

Title:
METHOD AND APPARATUS FOR IDENTIFYING A SPEAKER IN A CONFERENCING SYSTEM
Document Type and Number:
WIPO Patent Application WO/2004/014054
Kind Code:
A1
Abstract:
A method and apparatus are provided for identifying a speaker from among a plurality of participants in a meeting management system. The method comprises the steps of monitoring an audio signal associated with each of the participants; identifying one of said audio signals having a maximum signal strength; and identifying a speaker based on the maximum signal strength. The identifying step can optionally ensure that a signal has a maximum signal strength for at least a minimum predefined duration. The identified speaker can be used, for example, to tag audio events associated with the meeting.

Inventors:
FUJISAKI TETSUNOSUKE (US)
DAIJAVAD SHAHROKH (US)
Application Number:
PCT/US2003/023193
Publication Date:
February 12, 2004
Filing Date:
July 25, 2003
Assignee:
COLLABO TECHNOLOGY INC (JP)
FUJISAKI TETSUNOSUKE (US)
DAIJAVAD SHAHROKH (US)
International Classes:
H04L12/18; H04M3/56; (IPC1-7): H04M3/56; H04L12/18
Domestic Patent References:
WO2001078365A2 (2001-10-18)
WO2002028075A2 (2002-04-04)
Foreign References:
US6304648B1 (2001-10-16)
US5768263A (1998-06-16)
FR2799914A1 (2001-04-20)
Attorney, Agent or Firm:
Mason, Kevin M. (Mason & Lewis LLP, Suite 205, 1300 Post Roa, Fairfield CT, US)
Claims:
We claim:
1. A method for identifying a speaker in a meeting having a plurality of participants, comprising the steps of: monitoring an audio signal associated with each of said participants; identifying one of said audio signals having a maximum signal strength; and identifying a speaker based on said maximum signal strength.
2. The method of claim 1, wherein said identifying step ensures that a signal has a maximum signal strength for at least a minimum predefined duration.
3. The method of claim 1, wherein said maximum signal strength is a maximum volume.
4. The method of claim 1, wherein said maximum signal strength is a maximum energy intensity.
5. The method of claim 1, wherein at least one of said participants participates by means of a telephone connection.
6. The method of claim 1, wherein said identified speaker is used to tag audio events associated with said meeting.
7. The method of claim 1, further comprising the step of broadcasting said audio signal to said participants with an indication of the identified speaker.
8. The method of claim 1, further comprising the step of registering said participants.
9. The method of claim 8, wherein said registration step further comprises the step of associating each of said participants with a corresponding audio signal source.
10. A system for identifying a speaker in a meeting having a plurality of participants, comprising: a memory; and at least one processor, coupled to the memory, operative to: monitor an audio signal associated with each of said participants; identify one of said audio signals having a maximum signal strength; and identify a speaker based on said maximum signal strength.
11. The system of claim 10, wherein said processor is further configured to ensure that a signal has a maximum signal strength for at least a minimum predefined duration.
12. The system of claim 10, wherein said maximum signal strength is a maximum volume.
13. The system of claim 10, wherein said maximum signal strength is a maximum energy intensity.
14. The system of claim 10, wherein at least one of said participants participates by means of a telephone connection.
15. The system of claim 10, wherein said identified speaker is used to tag audio events associated with said meeting.
16. The system of claim 10, wherein said processor is further configured to broadcast said audio signal to said participants with an indication of the identified speaker.
17. The system of claim 10, wherein said processor is further configured to register said participants.
18. The system of claim 17, wherein said processor is further configured to associate each of said participants with a corresponding audio signal source.
Description:
METHOD AND APPARATUS FOR IDENTIFYING A SPEAKER IN A CONFERENCING SYSTEM

Cross Reference to Related Application This application claims the benefit of United States Provisional Application 60/400,746, filed August 2, 2002. This application is related to PCT Patent Application entitled "Method and Apparatus for Processing Image-Based Events in a Meeting Management System," (Attorney Docket No. 1008-3) filed contemporaneously herewith.

Field of the Invention The present invention relates generally to project management systems and, more particularly, to project management systems that facilitate the synchronous interaction of a number of individuals to create and modify documents and to perform other project tasks.

Background of the Invention Project management systems increase productivity and efficiency of members of a project team by automating the flow of information, including documents and files, among team members. Project management systems are often deployed to support collaborative work among a group of individuals, such as the members of a project team. Asynchronous collaboration systems allow team members to collaborate on one or more project tasks independently in time or space. Synchronous collaboration systems, on the other hand, allow team members to simultaneously collaborate on one or more project tasks in the same or a different location.

As the employees of an enterprise become more distributed in time and place, for example, due to flexible work hours, globalization and the distribution of enterprise employees to avoid the destruction of a centralized enterprise location, it becomes even more important to provide team members with an effective tool for asynchronous and synchronous collaboration. In today's enterprise environment, it is important for a project management system to permit distributed team members to initiate ad-hoc virtual meetings, for example, over the Internet. Generally, such project management systems must allow distributed team members to communicate and interact as if the team members were in the same place.

When there are multiple participants in a meeting, it is often difficult to automatically detect who is currently speaking. Speaker recognition systems exist that can identify a speaker from among a number of registered participants. Speaker recognition systems, however, are computationally expensive, especially in a distributed meeting environment. A need therefore exists for a meeting management system and method that provide improved techniques for identifying a speaker from among a plurality of participants.

Summary of the Invention The present invention provides a project management system that allows one or more team members to work on a project. Generally, a method and apparatus are provided for identifying a speaker from among a plurality of participants in a meeting management system. The method comprises the steps of monitoring an audio signal associated with each of the participants; identifying one of said audio signals having a maximum signal strength; and identifying a speaker based on the maximum signal strength. The identifying step can optionally ensure that a signal has a maximum signal strength for at least a minimum predefined duration. The identified speaker can be used, for example, to tag audio events associated with the meeting.

A more complete understanding of the present invention, as well as further features and advantages of the present invention, will be obtained by reference to the following detailed description and drawings.

Brief Description of the Drawings FIG. 1 illustrates the network environment 100 in which the present invention can operate; FIG. 2 is a schematic block diagram of an exemplary participant terminal of FIG. 1; FIG. 3 is a schematic block diagram of an exemplary meeting management system of FIG. 1;

FIG. 4 illustrates an exemplary meeting management interface 400 for managing past, present and future meetings that incorporates features of the present invention; FIG. 5 illustrates an exemplary session interface that allows a user to participate in an ongoing meeting; FIG. 6 illustrates an exemplary meeting review interface that incorporates features of the present invention that allows a user to review past meetings; FIG. 7 is a flow chart describing an exemplary speaker detection process incorporating features of the present invention; FIG. 8 illustrates an exemplary session interface that allows a user to share images of an application according to the present invention; and FIG. 9 is a flow chart describing an exemplary application sharing process incorporating features of the present invention.

Detailed Description FIG. 1 illustrates the network environment 100 in which the present invention can operate. As shown in FIG. 1, one or more meeting participants, each employing a participant terminal 200-1 through 200-N (hereinafter, collectively referred to as participant terminals 200 and discussed further below in conjunction with FIG. 2), are connected to a network 120. The meeting participants may be, for example, members of a project team. The network 120 may be embodied, for example, as any wired or wireless network, or a combination of the foregoing, including the Public Switched Telephone Network (PSTN), the cellular telephone network or the Internet.

According to one aspect of the invention, a meeting management system 300, as discussed further below in conjunction with FIG. 3, is provided to allow two or more participants to participate in a virtual meeting, or to allow one or more participants to review a previous meeting. In addition, the meeting management system 300 allows one or more meeting participants to establish an agenda for a meeting and to assign action items during a meeting, and provides a built-in teleconferencing component that automatically initiates an audio meeting among meeting participants. The data-sharing interactions in meetings and the synchronized audio streams associated with them are recorded and are available asynchronously for playback or exporting to new meetings (or both).

FIG. 2 is a schematic block diagram of an exemplary participant terminal 200. As shown in FIG. 2, the participant terminal 200 includes a personal computer (comprising, e.g., a memory and processor), as well as one or more of a screen projector 1195, a screen overlay tablet 1105, a speaker 1185, a camera 1175 for capturing images, a microphone 1165 for capturing audio, a memory 1140 for storing documents, and a user interface 1120 (such as one or more of a pen, keyboard or mouse). Components 1130, 1150, 1160 and 1170 transform signals as indicated by their corresponding function, in a known manner.

The screen overlay tablet 1105 captures x-y coordinate movements of a pen over a screen. According to a feature of the present invention, discussed further below, the coordinates of a given modification event that changes a document are recorded as part of the event. The coordinates of the pen markups on the screen overlay tablet 1105 are captured and transformed to an appropriate format by converter 1130. If a tablet 1105 is not employed, a mouse can simulate the pen and tablet, as would be apparent to a person of ordinary skill in the art. Input from the keyboard 1120, entered as markup text on the screen, is also captured and passed to converter 1130. Converter 1130 converts the series of inputs and provides them to a central service component 1200 in the meeting management system 300, via the network 120, for recording and propagation to other participants, if appropriate, as discussed further below.

All signals from the microphone 1165 and camera 1175 are processed by components 1160 and 1170, respectively, and then sent, for example, in a compressed format to the central service component 1200 in the meeting management system 300 for recording and propagation to other participants, if appropriate. Documents in the document storage 1140 can be converted to a compressed bit map by component 1150 and sent to the central service component 1200 for recording and propagation, as discussed further below in conjunction with FIGS. 8 and 9.

The main screen projected by projector 1195 receives propagation data from other participants from a sound board 1220 of the central service component 1200 in the meeting management system 300. For a detailed description of an exemplary sound board 1220, see PCT Patent Application Serial Number PCT/US 03/09876, entitled "Method and Apparatus for Synchronous Project Collaboration," filed March 31, 2003, and incorporated by reference. Audio signals are directed to the speaker 1185 and image data are passed to the projector 1195 for screen projection. In one implementation, an automatic log-in component 1180 is automatically started when the participant terminal 200 is powered up. The automatic log-in component 1180 is programmed with the address of a directory system 1300 in the meeting management system 300 and performs an automatic log-in procedure with the directory system 1300, in a known manner.

FIG. 3 is a schematic block diagram of an exemplary meeting management system 300. As shown in FIG. 3, the meeting management system 300 includes a central service component 1200 and a directory system 1300. In addition, the directory system 1300 provides a user management system in a remote configuration. The central service component 1200 is connected to the participant terminal 200 of each participant (or location) via the network 120. The meeting management system 300 may be embodied as one or more distributed or local servers. For example, the meeting management system 300 may include the AG2000 or AG4000 communication servers (providing audio conferencing and recording; the data event recording is a proprietary server function) commercially available from NMS Communications of Framingham, MA.

According to one aspect of the invention, shown in FIG. 3 and discussed further below, the central service component 1200 includes an event recorder 1205. The event recorder 1205 includes a time stamper 1210 and an event tagger 1215. As previously indicated, the present invention records each event in a meeting, such as a document modification, together with a time stamp indicating when the event occurred and a tag that annotates the event with information on the nature of the event, and the participants associated with the event. For example, for an overlay modification to a document, the event would identify the base document, overlay text, coordinates of the overlay and the participant who provided the overlay text. In this manner, the event recorder 1205 adds an ownership flag and time stamp to all images, audio, pen mark-ups, text mark-ups or other data and events received from a participant terminal 200 and captured as the result of activities of a local or remote participant.

Thus, the present invention provides for time stamping and event tagging of each event recorded as part of a meeting. For example, the ownership flag associated with pen markup data will be assigned based on the identification of the participant associated with the participant terminal 200 that has logged in to the directory system 1300. In this manner, the event recorder 1205 allows a number of different indexes to be created that will enhance the ability to replay and retrieve various meetings of interest. For example, the buddy list information indicating the current participants provides an indication of participants that come and go during a meeting. In addition, the information provided by components 1130, 1150, 1160 and 1170 permits indexing of the document mark-ups, page changes, speaker identity, and meeting content (e.g., what did a participant say at any point in the meeting).
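The time-stamping and ownership-tagging behavior described above can be sketched as follows. This is a minimal illustration only; the class and field names are assumptions for exposition, not taken from the application:

```python
import time

class EventRecorder:
    """Sketch of the event recorder 1205: each incoming event is
    annotated with a time stamp and an ownership flag before storage."""

    def __init__(self):
        self.events = []

    def record(self, event_type, payload, participant_id):
        # Tag the raw event with when it occurred and who produced it.
        tagged = {
            "type": event_type,       # e.g. "pen_markup", "audio", "image"
            "payload": payload,
            "owner": participant_id,  # ownership flag from the logged-in identity
            "timestamp": time.time(), # time stamp added by the time stamper 1210
        }
        self.events.append(tagged)
        return tagged

    def events_by_owner(self, participant_id):
        # One of the indexes the tagging enables: retrieve everything a
        # given participant contributed to the meeting.
        return [e for e in self.events if e["owner"] == participant_id]
```

The per-owner index shown here corresponds to replay queries such as "what did a participant say at any point in the meeting."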

The time-stamped data is stored in a meeting calendar and room repository 1230. The meeting calendar and room repository 1230 is essentially a calendar system that registers past and future meetings and holds meeting spaces 1240. For each meeting space 1240, meeting attributes, such as starting time, ending time, participants, presentations, conclusions, audio conversations, pictures of participants, mark-ups on presentations, and other data relating to a meeting are defined.

The time-stamped data is also sent to the sound board 1220, described in PCT Patent Application Serial Number PCT/US 03/09876, entitled "Method and Apparatus for Synchronous Project Collaboration," filed March 31, 2003, and incorporated by reference. Generally, the sound board 1220 makes actions by one team member visible to another team member. In other words, the sound board 1220 propagates all data associated with a given meeting to all registered participants.

The sound board 1220 intercepts an incremental change (addition or modification) to a base document of one team member and broadcasts such intercepted traffic to all other active client agents of other active team members (and also records such intercepted traffic in an addendum database). Thus, all the team members in a synchronous session will share changes to the documents by sharing addendum additions in real time. The manner in which the sound board 1220 serializes the various modification requests made by each team member and ensures that each team member is presented with a consistent view of the shared document is discussed in PCT Patent Application Serial Number PCT/US 03/09876.

The sound board 1220 consists of a serializer and a broadcaster. Each user can submit conflicting change requests for an object spontaneously and concurrently. For example, a first user might request that an object is moved to the left while another user might request that the same object is moved to the right. The serializer receives each of the change requests and serializes them, for example, based on an arrival time or a global clock. Serialized requests are then sent to the broadcaster, which broadcasts the requests to all users. The change requests can be broadcast to all currently active users in real-time, and can be stored in the meeting repository 1230 for subsequent access, e.g., by any late arriving users, as would be apparent to a person of ordinary skill in the art.
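The serializer/broadcaster pair can be sketched as follows, using arrival order as the stand-in for an arrival time or global clock. The class and method names are hypothetical, chosen only to mirror the description:

```python
import itertools

class SoundBoard:
    """Sketch of the serializer/broadcaster: conflicting change requests
    are placed into a single global order and then delivered identically
    to every active participant, so all terminals converge on one view."""

    def __init__(self):
        self._seq = itertools.count()  # stand-in for arrival time / global clock
        self.participants = {}         # participant_id -> log of received requests

    def join(self, participant_id):
        self.participants[participant_id] = []

    def submit(self, request):
        # Serializer: assign a global sequence number to the request.
        ordered = (next(self._seq), request)
        # Broadcaster: deliver the serialized request to all active users.
        for log in self.participants.values():
            log.append(ordered)
        return ordered
```

Because every terminal receives the same sequence, conflicting requests such as "move left" and "move right" are applied in the same order everywhere.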

Thus, one participant's action will be replicated on the corresponding participant terminal 200 of other participants. A connection from the sound board 1220 to each participant terminal 200 has a first-in-first-out (FIFO) queue. In other words, if a late arriving participant terminal 200 connects after a given meeting has begun, all data up to the point when the late arriving user joins the meeting is stored in the FIFO queue and sent to the participant terminal 200 in the appropriate order. Thus, a late arriving participant terminal 200 will not miss anything.
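The late-join behavior above can be sketched with a per-terminal FIFO that is preloaded with the meeting history at connect time (a simplified illustration; the names are assumptions):

```python
from collections import deque

class MeetingConnections:
    """Sketch of the per-connection FIFO: every broadcast event is queued
    per terminal, and a late-arriving terminal first receives the backlog
    recorded since the meeting began, in the original order."""

    def __init__(self):
        self.history = []   # everything broadcast so far in this meeting
        self.queues = {}    # terminal_id -> FIFO of pending events

    def broadcast(self, event):
        self.history.append(event)
        for q in self.queues.values():
            q.append(event)

    def connect(self, terminal_id):
        # Late joiner: preload the FIFO with all events up to this point,
        # so nothing is missed and ordering is preserved.
        self.queues[terminal_id] = deque(self.history)

    def drain(self, terminal_id):
        # Deliver and clear the pending events for one terminal.
        q = self.queues[terminal_id]
        out = list(q)
        q.clear()
        return out
```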

The directory system 1300 includes a directory 1310 of all potential meeting participants (and the corresponding participant terminal 200). In addition, the directory system 1300 maintains a table 1320 indicating the participants that are currently connected and available. The meeting participants identified in table 1320 are generally a subset of the people identified in the directory 1310. The directory system 1300 also includes a display 1330 indicating the active user list 1320 and a user directory management controller for management purposes. The information presented on display 1330 can also be superimposed on part of the main screen as projected by projector 1195 on the screen 1105 associated with each client interface 1120, 1105. For example, the presentation of the directory of available people in the system 1320 on the user display 1105 provides a buddy list.

Each user or participant should be registered with the directory 1310, for example, using a manual process or automated presence detection techniques. The directory 1310 can be managed by meeting participants with an appropriate privilege level, in a known manner.

Capturing Participant Activities Voice and other audio signals are captured by a microphone or a telephone handset 1165. In the case of a microphone, the signal can be digitized and compressed by an optional voice compression component 1160. In the case of a handset 1165, the signal can be sent as a telephone signal to the sound board 1220 for digitizing and compression. The digitized, compressed audio signal is then sent to the event recorder 1205 for annotation, in a manner discussed further below. The digitized, compressed audio signal is also passed to the sound board 1220 for broadcasting to the participant terminals 200 corresponding to each participant by the speaker 1185. At the same time, the annotated audio signal is stored in the appropriate meeting room 1240 for recording.

Images and video of, e.g., objects, documents and faces at each participant terminal 200 are optionally captured by video camera 1175. The captured images are processed by component 1170 and sent to the event recorder 1205 for annotation. The digitized, compressed images are also passed to the sound board 1220 for broadcasting to the participant terminals 200 corresponding to each participant. At the same time, the annotated image signal is stored in the appropriate meeting room 1240 for recording.

According to one aspect of the present invention, each image is recorded for subsequent playback, as well as all image processing commands, such as user commands to rotate, zoom, modify or manipulate an image.

Keyboard, mouse and stylus movements are captured by converter 1130 and are converted to overlay mark-ups. The markups are sent to the event recorder 1205 for annotation. The digitized, compressed mark-ups are also passed to the sound board 1220 for broadcasting to the participant terminals 200 corresponding to each participant.

At the same time, the annotated mark-ups are stored in the appropriate meeting room 1240 for recording.

Meeting Management Interface FIG. 4 illustrates an exemplary meeting management interface 400 for managing past, present and future meetings that incorporates features of the present invention. The meeting management interface 400 can be employed to create meetings, and to define associated agendas and participant lists. The exemplary management interface 400 includes a section 410 that provides a mechanism for selecting a meeting, for example, using a keyword search. A second section 420 indicates all meetings that satisfy any search criteria or filtering information that was entered in section 410, with each row corresponding to a different meeting. The summary information provided in section 420 may identify, for example, the name, project, organizer and start time associated with the corresponding meeting.

A third section 430 shows the details of one particular meeting selected from the meeting list 420. The exemplary meeting details presented in section 430 may indicate, for example, the meeting organizer, status (open or closed), creation date, last update date, a list of various sessions included in the meeting, a list of meeting participants, a meeting description and meeting contents (e.g., documents, video and audio).

As shown in FIG. 4, the exemplary meeting management interface 400 also includes a tool bar 440 that allows a participant to join, review, empty, export, import, close, edit or delete a selected meeting. If the participant clicks on the "join" icon, the participant will then be presented with a session interface 500 that coordinates an active (real-time) meeting, as discussed further below in conjunction with FIG. 5. Similarly, if the participant clicks on the "review" icon, the participant will then be presented with a meeting review interface 600 that allows a user to review a prior meeting, as discussed further below in conjunction with FIG. 6. The export icon allows the contents of a prior meeting to be provided to a new meeting. The close icon prevents additional information from being added to an existing meeting. The edit icon allows properties of a meeting to be modified. The delete icon removes the meeting from the meeting repository 1230.

FIG. 5 illustrates an exemplary session interface 500 that allows a user to participate in an ongoing meeting. As shown in FIG. 5, the exemplary session interface 500 includes three tabs 510, 520, 530 that can be selected to access various functions of the interface 500. As presented in FIG. 5, the presentation tab is selected and presents the participants with the primary presentation information, such as mark-ups or overlays on a document. The action tab 520 allows a participant to edit or define action items associated with the meeting. The agenda tab 530 allows the user to edit or define agenda items associated with the meeting. The session interface 500 also includes a window 540 for identifying any components or content associated with the meeting, such as images. A selected content item from window 540 will automatically be presented in the presentation window 510.

A meeting participant window 550 identifies all of the participants who have been defined for the current meeting. Individual names in the meeting participant window 550 can optionally be highlighted to indicate those participants that have joined the meeting. If an active participant clicks on a phone icon 560, all other participants who have logged into the directory system 1300 will automatically be contacted by telephone and invited to join the meeting. In one implementation, a participant database is maintained that includes a current telephone number for each participant. The participant database is accessed to retrieve the telephone number of each participant for a given meeting and an automated dialing system is employed to include the various participants in the meeting by automatically dialing the telephone numbers and providing a bridge. The participant database may also record a corporate affiliation, address, and role (e.g., administrator, manager or user) for each participant.

According to a further aspect of the present invention, whenever audio information is associated with a meeting, such as two participants speaking at the same location or via a telephone conference established among participants, an additional connection is optionally provided to an audio recorder for recording the audio component of the meeting. This recorder records the mixed or combined audio of all participants at the sound board 1220.

FIG. 6 illustrates an exemplary meeting review interface 600 incorporating features of the present invention that allows a user to review past meetings. As shown in FIG. 6, the exemplary meeting review interface 600 includes a first section 610 that provides a coarse time scale, such as a two hour window in the exemplary embodiment.

In an automatic replay mode, the locator moves from left to right at normal speed. If a user grabs and moves the locator to a desired location, the user can randomly access any portion of the meeting. The coarse time scale 610 allows a participant to select a desired portion of a selected meeting. A second section 620 provides a fine time scale, such as a 15 minute window in the exemplary embodiment, indicating each of the events in the selected 15 minute window. Events may include, for example, participant X joined, participant X left, page N of Presentation Y is shown, participant X wrote markups on presentation Y, and participant X inserted a bookmark. In one implementation, events are categorized into four exemplary categories: action, agenda, presentation and people, each identified by a corresponding icon. A third section 630 of the interface 600 allows a user to selectively include or exclude each category of event. In the window 640, a number of selected action items are presented. Area 650 is a presentation board that presents the display content (such as images or overlays) associated with the meeting in a replay mode, synchronized with the overall meeting presentation.
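The category filter of section 630 can be sketched as a simple include/exclude pass over the time-ordered event list. The event dictionary shape here is assumed for illustration:

```python
# The four exemplary categories named in the description.
CATEGORIES = {"action", "agenda", "presentation", "people"}

def filter_events(events, enabled):
    """Return only the events whose category is currently enabled on the
    fine time scale, preserving their time order."""
    enabled = set(enabled) & CATEGORIES
    return [e for e in events if e["category"] in enabled]
```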

FIG. 7 illustrates a speaker detection process 700 incorporating features of the present invention. As shown in FIG. 7, the speaker detection process 700 receives the audio signals from each of the N active participants in a meeting, in a manner discussed above. In an exemplary implementation, the speaker detection process 700 continuously monitors the signal strength (e.g., volume) of each of the N participants during step 710. A test is performed during step 720 to determine if the dominant channel changes (e.g., a different channel having the highest volume level for at least a predefined minimum time interval). If it is determined that there has been a change in the dominant channel, a speaker change is identified during step 720 and the participant associated with the dominant channel is thereafter used during step 730 for tagging each audio event. The name or user identifier of the speaker can be obtained, for example, from registration information that indicates a given speaker is associated with a given channel.
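The steps of the speaker detection process 700 can be sketched as follows, with the minimum predefined duration expressed in frames. This is an illustrative sketch under that assumption, not the application's actual implementation:

```python
def detect_speaker(frames, min_duration):
    """Sketch of speaker detection process 700.  `frames` is a sequence of
    {channel: signal_strength} readings (monitoring, step 710).  A channel
    becomes the identified speaker only after holding the maximum signal
    strength for at least `min_duration` consecutive frames (step 720).
    Returns the speaker tag assigned to each frame (tagging, step 730)."""
    current = None    # channel currently tagged as the speaker
    candidate = None  # channel contending to become the dominant channel
    held = 0          # consecutive frames the candidate has held the maximum
    tags = []
    for strengths in frames:
        loudest = max(strengths, key=strengths.get)
        if loudest == candidate:
            held += 1
        else:
            candidate, held = loudest, 1
        # Only switch speakers once the new channel has dominated long
        # enough; this suppresses coughs and brief interjections.
        if candidate != current and held >= min_duration:
            current = candidate
        tags.append(current)
    return tags
```

With `min_duration=2`, a single loud frame from a second channel does not change the tagged speaker; two consecutive dominant frames do.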

Another aspect of the present invention allows images of user applications to be shared. In one implementation, images of a single window containing an application or an entire desktop can be shared. The application sharing function provides a solution that captures any application running on the client as an image to be broadcast to all participants in the meeting. Image sharing allows a meeting participant to take a "snapshot" of an application running on their machine and broadcast it to the other participants in a meeting. This snapshot is a one-time event and must be repeated if updates are made to the presentation.

In one implementation, the application image sharing function of the present invention is initiated using one or more predefined "hot keys." A first hot key sequence, such as shift-control-c, can capture an image of a single window containing an application and a second hot key sequence, such as shift-control-d, can capture an image of the entire desktop. The specific hot key sequences can be predefined or configured by each user.
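The hot-key dispatch can be sketched as a table mapping each sequence to a capture mode. The two sequences come from the text; the capture functions are hypothetical placeholders standing in for the platform's actual screen-capture calls:

```python
def capture_window():
    # Placeholder for single-window capture (shift-control-c in the text).
    return "window image"

def capture_desktop():
    # Placeholder for full-desktop capture (shift-control-d in the text).
    return "desktop image"

# User-configurable mapping from hot key sequence to capture mode.
HOT_KEYS = {
    ("shift", "ctrl", "c"): capture_window,
    ("shift", "ctrl", "d"): capture_desktop,
}

def on_hot_key(sequence):
    """Dispatch a detected hot key sequence; unknown sequences are ignored."""
    handler = HOT_KEYS.get(tuple(sequence))
    return handler() if handler else None
```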

To share an image of an application, the mouse pointer should be over the application (or desktop) to be shared. The user types the hot key sequence (such as shift-ctrl-c). The selected application is brought to the foreground and the image is taken and uploaded to the meeting management system 300. A session interface 800, discussed below in conjunction with FIG. 8, is made active and the image is loaded into the panel.

The manner in which the application image is obtained and shared is discussed further below in conjunction with FIG. 9.

FIG. 8 illustrates an exemplary session interface 800 that allows a user to share an image of an application. As shown in FIG. 8, the exemplary session interface 800 is an extension of the session interface 500 discussed above in conjunction with FIG. 5. In addition, when configured for application image sharing, the exemplary session interface 800 includes a presentation snapshot window 810 that allows a participant to control the portion of the shared image that is presented in a presentation window 820. By adjusting a rectangular box within the presentation snapshot window 810, the user can pan through a desired portion of the entire image that is presented in window 820. In addition, the exemplary session interface 800 includes a set of icons 830 to activate a number of well known image processing functions, such as zoom in, zoom out, rotate, fit-to-page and crop. As previously indicated, whenever a user activates an image processing function, the resulting image manipulation is recorded as an event. The exemplary shared application shown in FIG. 8 is a browser presenting a web page from Collabo-Technology, Inc.

FIG. 9 is a flow chart describing an exemplary application image sharing process 900 incorporating features of the present invention. As shown in FIG. 9, the application image sharing process 900 is initiated during step 910 when a predefined hot key sequence is detected (such as shift-ctrl-c or shift-ctrl-d). Once a predefined hot key sequence is detected, a library function, for example, from a Java library, is invoked to create an image of the application within the selected window (if the application hot key sequence is detected) or of the desktop (if the desktop hot key sequence is detected) during step 920. In the case of a shared application, the image capture function uses a boundary of a window containing the application. In the case of a shared desktop, the image capture function uses the size of the user's display as the boundary.

Thereafter, the captured image is optionally converted to a bit map and then to a graphics file, such as a portable network graphics (png) file, during step 930.

Thereafter, the application image is annotated (i.e., the event is tagged with the appropriate time stamp, as well as information on the nature of the event, and the participant that invoked the application sharing), broadcast to all participants and recorded in the meeting repository 1230. Thereafter, the shared application image will be presented in the session interface 800 (FIG. 8) of each active participant.

It is to be understood that the embodiments and variations shown and described herein are merely illustrative of the principles of this invention and that various modifications may be implemented by those skilled in the art without departing from the scope and spirit of the invention.