

Title:
SYSTEM AND METHOD FOR SCALABLE MEDIA SWITCHING CONFERENCING
Document Type and Number:
WIPO Patent Application WO/2011/149359
Kind Code:
A1
Abstract:
A system and a method for exchanging audio, video, and data information between at least two clients in a communication network supported by a central unit. The method comprises the steps of connecting the at least two clients to the central unit using a call control protocol, the call control protocol negotiating video formats and connection information for sending and receiving media streams; transmitting information from a first client to the central unit, the information comprising meta-data describing different media streams the first client is capable of transmitting; transmitting the information received from the first client to the at least one other client; on receiving the information on the different media streams the first client is capable of transmitting, the at least one other client deciding which of the available media streams from the first client it will subscribe to; transmitting a subscribe message from the at least one other client to the central unit, subscribing to at least one available media stream from the first client; on receiving at least one subscribe message from the at least one other client, the central unit transmitting a message requesting the first client to start transmitting the media streams subscribed to by the at least one other client; transmitting the media streams subscribed to by the at least one other client from the first client to the central unit; and transmitting the media streams subscribed to by the at least one other client from the central unit to the at least one other client.

Inventors:
BERGER ESPEN (NO)
BUEHLER PASCAL (NO)
KROKNES JAN ASLE (NO)
Application Number:
PCT/NO2011/000158
Publication Date:
December 01, 2011
Filing Date:
May 25, 2011
Assignee:
TANDBERG TELECOM AS (NO)
BERGER ESPEN (NO)
BUEHLER PASCAL (NO)
KROKNES JAN ASLE (NO)
International Classes:
H04L29/06; H04N7/15
Domestic Patent References:
WO2005048600A1 (2005-05-26)
Foreign References:
EP1578129A1 (2005-09-21)
US20090207988A1 (2009-08-20)
JP2000050225A (2000-02-18)
US7561179B2 (2009-07-14)
EP1683356A1 (2006-07-26)
Attorney, Agent or Firm:
ONSAGERS AS et al. (Oslo, NO)
Claims:
CLAIMS

1. Method for exchanging audio, video, and data information between at least two clients in a communication network supported by a central unit, wherein the method comprises the steps of connecting the at least two clients to the central unit using a call control protocol, the call control protocol negotiating video formats and connection information for sending and receiving media streams; receiving, at the central unit, information from a first client, the information comprising meta-data describing different media streams the first client is capable of transmitting; transmitting the information received from the first client to the at least one other client; receiving, at the central unit, a subscribe message from the at least one other client subscribing to at least one available media stream from the first client; on receiving at least one subscribe message from the at least one other client, transmitting, by the central unit, a message requesting the first client to start transmitting media streams subscribed to by the at least one other client; receiving, at the central unit, the media streams subscribed to by the at least one other client from the first client; and transmitting, by the central unit, the media streams subscribed to by the at least one other client to the at least one other client.

2. Method according to claim 1, further comprising deciding, by the at least one other client, on receiving the information on the different media streams that the first client is capable of transmitting, which of the available media streams from the first client the at least one other client will subscribe to.

3. Method according to claim 1 or 2, wherein the meta-data describing the different media streams the first client is capable of transmitting comprises at least one of bandwidth threshold and video resolution.

4. Method according to claim 1 or 2, wherein the different media streams that the first client is capable of transmitting comprise data provided by at least one of a presenter camera, an audience camera and a document camera.

5. Method according to claim 2, wherein the step of deciding which of the available media streams from the first client the at least one other client will subscribe to is based on at least one of processing power of the client, bandwidth restrictions, and layout of a client display.

6. Method according to claim 1 or 2, wherein the method further comprises the steps of:

by the central unit, automatically requesting connected clients to transmit at least one audio stream,

by the central unit, automatically transmitting all received audio streams to all connected clients.

7. Method according to claim 2 or 5, wherein the at least one other client, upon a change in layout of a client display or a bandwidth restriction, decides to subscribe to at least one different available media stream from the first client, the method further comprising the steps of receiving, by the central unit, an unsubscribe message from the at least one other client, unsubscribing from at least one of the media streams subscribed to by the at least one other client; receiving, by the central unit, a subscribe message from the at least one other client, subscribing to at least one other available media stream from the first client; and on receiving the subscribe and unsubscribe messages from the at least one other client, transmitting, by the central unit, a message requesting the first client to start transmitting media streams subscribed to by the at least one other client, and to stop transmitting media streams unsubscribed by the at least one other client.

8. Method according to claim 1 or 2, wherein the method further comprises mixing, by the at least one other client, the received subscribed media streams locally.

9. Method according to claim 1 or 2, wherein the call control protocol is one of SIP, H.323 and Jingle.

10. Method according to any one of claims 1-9, wherein the central unit is a media switching conference server and each client is a video conference endpoint.

11. A communication system, comprising at least two clients and a central unit, the clients being interconnected in a communication network supported by the central unit, the system being configured to perform a method as set forth in any one of claims 1-9.

12. A communication system according to claim 11, wherein the central unit is a media switching conference server and each client is a video conference endpoint.

13. A media switching conference server, configured to perform a method as set forth in claim 1 , the server being the central unit.

Description:
SYSTEM AND METHOD FOR SCALABLE MEDIA SWITCHING CONFERENCING

TECHNICAL FIELD

The present invention relates to video conferencing and in particular to a system and a method for scalable media switching video conferencing.

BACKGROUND

Traditional multi-party videoconferences use a push model for sending video and audio to clients. The traditional approach, using a transcoding Multipoint Control Unit (MCU), implements audio mixing, video layout composition and conference control entirely on the centralized transcoding MCU. With this approach the MCU must implement the user experience rules, such as generating the layout seen by each individual user, and the push model makes it difficult for the clients to override the server. Video layout composition generally involves decoding each incoming stream, composing the video layout for each of the participating clients, and encoding the composed outgoing streams. This generally introduces unwanted delay (latency) in the communication between participating clients.

US 7561179/EP1683356 discloses a system and method using a non-transcoding MCU, or switching MCU, wherein the non-transcoding MCU receives capability information from the different clients participating in a multi-party videoconference. Based on the received capability information, the non-transcoding MCU instructs the different clients to transmit multimedia streams comprising partial frames adjusted to fit the capabilities of the receiving clients participating in the videoconference. Two main methods of transmitting multimedia streams comprising partial frames are disclosed in that patent. One is multicasting several video streams of different quality, e.g. resolution, size, etc., to the non-transcoding MCU; the other is using scalable video coding techniques such as SVC, wherein multiple levels of video quality are embedded within one stream. In both cases, the non-transcoding MCU can either pass on only those partial frames that, based on the previously received capability information, it knows the clients can handle, or function as a multicast router, passing on all received partial frames to all participating clients.

However, using a centralized unit to determine which video streams, and/or which resolution of the video streams, a receiving client should receive potentially limits the flexibility of the user experience and user interface of a client. There is a need in the art for a system and method that allows full, or at least increased, flexibility in user experience, low latency, and switching of high resolution video streams without reducing quality.

SUMMARY

The invention relates to a method and a system as set forth in the claims.

BRIEF DESCRIPTION OF THE DRAWINGS

In order to make the invention more readily understandable, the discussion that follows will refer to the accompanying drawings, wherein

Figure 1 illustrates an exemplary communication system according to the present invention, and

Figure 2 shows two exemplary layouts of a client display.

DETAILED DESCRIPTION

In the following, the present invention will be discussed by describing preferred embodiments, and by referring to the accompanying drawings. However, people skilled in the art will realize other applications and modifications within the scope of the invention as defined in the enclosed independent claims.

For clarity, a client is in the following description, depending on the context, also referred to interchangeably as an endpoint or a video conference endpoint. A client according to the present invention is typically implemented as a video conference application on a personal computer (PC), tablet computer, PDA (personal digital assistant), cell phone or similar, or as an integrated part of a standalone device.

Figure 1 is a block diagram showing an exemplary communication system according to the present invention. The exemplary system comprises a central unit, which may be a media switching conference server (MSCS) 4, and three clients 1, 2, 3. The MSCS and the clients communicate over a communication network, not shown, typically a packet switched network such as an IP (Internet Protocol) network. In another aspect the system may include two clients.

In yet another aspect the system may include at least two clients. In an exemplary peer-to-peer embodiment of the present invention a first client 1 (Client 1) initiates a video call to a second client 2 (Client 2) using a call control protocol. The call control protocol is preferably SIP (RFC 3261), but could also be ITU-T H.323 (12/09), Jingle (XEP-0166, 2009-12-23) or any other suitable call control protocol. Using the call control protocol, the clients negotiate video formats, i.e. codecs, and connection information such as port numbers. In a SIP implementation, SIP is used for call setup and SDP (Session Description Protocol, defined in RFC 2327) is used for the codec and port negotiation.
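As an illustration of the codec and port negotiation, the sketch below builds a minimal SDP offer and reads back the port announced for each media line. It uses only the basic line types (`v=`, `o=`, `s=`, `c=`, `t=`, `m=`, `a=rtpmap`) common to RFC 2327 and its successor RFC 4566. The codec choices (PCMU, H.264 on dynamic payload type 96), addresses, ports and the function names `build_sdp_offer` and `media_ports` are assumptions for the example; the description does not specify codecs.

```python
def build_sdp_offer(session_id: int, host_ip: str,
                    audio_port: int, video_port: int) -> str:
    """Build a minimal SDP offer with one audio and one video media description."""
    lines = [
        "v=0",
        f"o=- {session_id} 1 IN IP4 {host_ip}",  # origin: user, session id, version, address
        "s=-",
        f"c=IN IP4 {host_ip}",                   # connection address for the media
        "t=0 0",                                 # unbounded session time
        f"m=audio {audio_port} RTP/AVP 0",       # static payload type 0 = PCMU
        "a=rtpmap:0 PCMU/8000",
        f"m=video {video_port} RTP/AVP 96",      # dynamic payload type, mapped below
        "a=rtpmap:96 H264/90000",                # hypothetical codec choice
    ]
    return "\r\n".join(lines) + "\r\n"           # SDP lines end with CRLF


def media_ports(sdp: str) -> dict[str, int]:
    """Extract the port announced on each m= line (e.g. from the answerer's SDP)."""
    ports = {}
    for line in sdp.splitlines():
        if line.startswith("m="):
            media, port, *_ = line[2:].split()
            ports[media] = int(port)
    return ports


offer = build_sdp_offer(session_id=2890844526, host_ip="192.0.2.10",
                        audio_port=49170, video_port=51372)
print(media_ports(offer))  # {'audio': 49170, 'video': 51372}
```

In a real SIP exchange the answer would carry the remote side's ports and the subset of codecs it accepts; the same `media_ports` helper would then tell each side where to send its RTP.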

Using RTCP SDES (RFC 3550) messages, Client 1 transmits announce (a) messages to the central unit, or MSCS, 4. The announce messages comprise meta-data describing the different media streams Client 1 is capable of transmitting. Client 1 would typically announce that it is capable of transmitting video of different resolutions (e.g. high, medium and low), audio, and video from different positions or angles, such as a presenter camera, an audience camera or a document camera.
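The description does not say how the announce meta-data is encoded inside the SDES message. The sketch below assumes it is carried as a private (PRIV, type 8) SDES item with the hypothetical prefix `announce` and a hypothetical text format, next to the mandatory CNAME item. The packet layout itself (version 2, source count, packet type 202, length in 32-bit words minus one, null-terminated item list padded to 32 bits) follows RFC 3550. In a real session this SDES packet would be sent inside a compound RTCP packet that starts with an SR or RR; only the SDES part is built here.

```python
import struct

RTCP_SDES = 202   # RTCP packet type for source description (RFC 3550)
SDES_CNAME = 1    # canonical name item
SDES_PRIV = 8     # private extension item


def sdes_item(item_type: int, value: bytes) -> bytes:
    """Encode one SDES item: type octet, length octet, then the value."""
    if len(value) > 255:
        raise ValueError("SDES item value is limited to 255 octets")
    return struct.pack("!BB", item_type, len(value)) + value


def priv_item(prefix: bytes, value: bytes) -> bytes:
    """PRIV item payload = prefix length octet + prefix string + value string."""
    return sdes_item(SDES_PRIV, struct.pack("!B", len(prefix)) + prefix + value)


def build_announce_sdes(ssrc: int, cname: str, announce: str) -> bytes:
    """Build one SDES packet with a CNAME item and a hypothetical 'announce' PRIV item."""
    items = sdes_item(SDES_CNAME, cname.encode()) + priv_item(b"announce", announce.encode())
    chunk = struct.pack("!I", ssrc) + items + b"\x00"  # null octet ends the item list
    chunk += b"\x00" * (-len(chunk) % 4)                # pad the chunk to a 32-bit boundary
    total_words = (4 + len(chunk)) // 4                  # header + chunk, in 32-bit words
    header = struct.pack("!BBH", (2 << 6) | 1,           # V=2, P=0, source count=1
                         RTCP_SDES, total_words - 1)     # length field = words - 1
    return header + chunk


# Hypothetical announce string listing resolutions, cameras and audio.
packet = build_announce_sdes(
    ssrc=0x1111, cname="client1@example.com",
    announce="video=high:1280x720,medium:640x360,low:320x180;"
             "camera=presenter,audience,document;audio=1")
```

Because the MSCS only relays the announce to the other participants, it does not need to understand the PRIV value; only the receiving clients parse it.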

After receiving the announce message (a) from Client 1, the MSCS 4 relays the announce message to the other endpoints participating in the call, i.e. in this example only to Client 2. Client 2 receives the information on the different media streams Client 1 is capable of transmitting and decides which of the available media streams to subscribe to. Client 2 typically makes the decision based on its processing power, bandwidth restrictions between Client 2 and the MSCS 4, or the layout displayed on a screen connected to Client 2.
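A minimal sketch of how a receiving client might turn the announced streams into a subscription decision, assuming two of the criteria named above: the layout (the pixel height of the window each remote client is shown in) and processing power (modelled here as a hypothetical cap `max_pixels` on the total number of decoded pixels). The `StreamInfo` record and both functions are illustrative names, not part of the described system.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StreamInfo:
    stream_id: str
    width: int
    height: int


def choose_stream(announced: list[StreamInfo], window_height: int) -> StreamInfo:
    """Smallest announced stream that still fills the window, else the largest one."""
    ordered = sorted(announced, key=lambda s: s.height)
    for stream in ordered:
        if stream.height >= window_height:
            return stream
    return ordered[-1]


def decide_subscriptions(announced: dict[str, list[StreamInfo]],
                         layout: dict[str, int],
                         max_pixels: int) -> dict[str, StreamInfo]:
    """Pick one stream per remote client in the layout (sender -> window height),
    then step resolutions down until the total decoded pixel count fits max_pixels."""
    picks = {sender: choose_stream(announced[sender], h) for sender, h in layout.items()}

    def total_pixels() -> int:
        return sum(s.width * s.height for s in picks.values())

    def smaller(sender: str) -> list[StreamInfo]:
        return [s for s in announced[sender] if s.height < picks[sender].height]

    while total_pixels() > max_pixels:
        candidates = [k for k in picks if smaller(k)]
        if not candidates:
            break  # cannot go lower; the caller would have to drop a window
        # Downgrade the currently most expensive stream by one resolution step.
        sender = max(candidates, key=lambda k: picks[k].width * picks[k].height)
        picks[sender] = max(smaller(sender), key=lambda s: s.height)
    return picks
```

For example, with a large window for one sender and a thumbnail for another, the client subscribes to the high and low streams respectively; if the decoder cap is tight, the large window is stepped down first.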

After Client 2 has decided which media streams to subscribe to, Client 2 transmits a subscribe message (s), carried in an RTCP APP (RFC 3550) message, to the MSCS 4 indicating which of the available media streams Client 2 wants to subscribe to. Again, the MSCS 4 relays the subscribe message from Client 2 to Client 1, requesting Client 1 to start transmitting the media streams subscribed to by Client 2. Client 1 then starts to transmit the subscribed media streams to the MSCS 4, which finally relays the media streams to Client 2.
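The RTCP APP packet format from RFC 3550 (version 2, a 5-bit subtype, packet type 204, length, SSRC, a four-character ASCII name, then application data padded to 32 bits) leaves the payload to the application. The sketch below assumes a hypothetical encoding: name `SUBS`, subtype 0 for subscribe and 1 for unsubscribe, and data consisting of the publisher's SSRC followed by comma-separated stream identifiers. None of these choices come from the description. As with SDES, a real sender would place this packet inside a compound RTCP packet.

```python
import struct

RTCP_APP = 204            # RTCP packet type for application-defined packets (RFC 3550)
APP_NAME = b"SUBS"        # hypothetical four-character name for subscription messages
SUBTYPE_SUBSCRIBE = 0     # hypothetical subtype values
SUBTYPE_UNSUBSCRIBE = 1


def build_subscription_app(sender_ssrc: int, publisher_ssrc: int,
                           stream_ids: list[str], subscribe: bool = True) -> bytes:
    subtype = SUBTYPE_SUBSCRIBE if subscribe else SUBTYPE_UNSUBSCRIBE
    data = struct.pack("!I", publisher_ssrc) + ",".join(stream_ids).encode()
    data += b"\x00" * (-len(data) % 4)        # application data must be 32-bit aligned
    body = struct.pack("!I", sender_ssrc) + APP_NAME + data
    header = struct.pack("!BBH", (2 << 6) | subtype, RTCP_APP,
                         (4 + len(body)) // 4 - 1)   # length in 32-bit words minus one
    return header + body


def parse_subscription_app(packet: bytes) -> tuple[bool, int, int, list[str]]:
    """Return (is_subscribe, sender_ssrc, publisher_ssrc, stream_ids)."""
    first, packet_type, length = struct.unpack("!BBH", packet[:4])
    if first >> 6 != 2 or packet_type != RTCP_APP or packet[8:12] != APP_NAME:
        raise ValueError("not a subscription APP packet")
    end = 4 * (length + 1)
    sender_ssrc = struct.unpack("!I", packet[4:8])[0]
    publisher_ssrc = struct.unpack("!I", packet[12:16])[0]
    text = packet[16:end].rstrip(b"\x00").decode()   # drop the alignment padding
    stream_ids = text.split(",") if text else []
    return (first & 0x1F) == SUBTYPE_SUBSCRIBE, sender_ssrc, publisher_ssrc, stream_ids
```

Carrying the subscription in RTCP keeps it on the media path, so the MSCS can act on it without involving the call control protocol.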

In an exemplary multi-site embodiment of the present invention a first client 1 (Client 1) initiates a video call to a second client 2 (Client 2) and a third client 3 (Client 3) as described above. Client 1 again transmits an announce (a) message to the MSCS 4, and after receiving the announce message (a) from Client 1 the MSCS 4 now relays the announce message to both Client 2 and Client 3. Client 2 and Client 3 both receive the information on the different media streams Client 1 is capable of transmitting and decide which of the available media streams to subscribe to. After Client 2 and Client 3 have decided which media streams to subscribe to, both transmit a subscribe message (s) to the MSCS 4 indicating which of the available media streams each of them wants to subscribe to. The MSCS 4 aggregates the received subscription messages and transmits a subscribe message (s) to Client 1 requesting Client 1 to start transmitting the media streams subscribed to by Client 2 and Client 3. Client 1 then starts to transmit the subscribed media streams to the MSCS 4. If Client 2 and Client 3 subscribe to the same media stream, the MSCS 4 relays that media stream to both Client 2 and Client 3. If Client 2 and Client 3 subscribe to different media streams, Client 1 preferably multiplexes the different media streams and transmits the multiplexed media streams to the MSCS 4 on a single port. Upon receiving the multiplexed media streams, the MSCS 4 demultiplexes them and relays each media stream to the respective subscribing clients.
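The aggregation step can be sketched with a small table kept by the MSCS: for each (publisher, stream) pair it stores the set of subscribers, so a stream subscribed to by several clients is requested once and relayed to all of them. `SubscriptionTable` and its methods are hypothetical names; the "start" and "stop" sets returned by `aggregate_request` correspond to the request the MSCS sends to the publishing client.

```python
from collections import defaultdict


class SubscriptionTable:
    """Hypothetical MSCS bookkeeping: who subscribes to which stream of which publisher."""

    def __init__(self) -> None:
        self._subs: dict[tuple[str, str], set[str]] = defaultdict(set)
        self._requested: dict[str, set[str]] = defaultdict(set)  # last request per publisher

    def subscribe(self, subscriber: str, publisher: str, stream_id: str) -> None:
        self._subs[(publisher, stream_id)].add(subscriber)

    def unsubscribe(self, subscriber: str, publisher: str, stream_id: str) -> None:
        key = (publisher, stream_id)
        self._subs[key].discard(subscriber)
        if not self._subs[key]:
            del self._subs[key]          # nobody wants this stream any more

    def recipients(self, publisher: str, stream_id: str) -> set[str]:
        """Clients that get a copy when this stream is relayed."""
        return set(self._subs.get((publisher, stream_id), set()))

    def aggregate_request(self, publisher: str) -> tuple[set[str], set[str]]:
        """Return (start, stop) stream sets for the publisher since the last request."""
        wanted = {sid for (pub, sid) in self._subs if pub == publisher}
        previous = self._requested[publisher]
        self._requested[publisher] = wanted
        return wanted - previous, previous - wanted
```

For example, if Client 2 and Client 3 both subscribe to Client 1's high resolution video and Client 3 also to its audio, Client 1 is asked once for each of the two streams, and the video is relayed to both subscribers.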

Although not explicitly shown in Figure 1, Client 2 and Client 3 also transmit announce messages to the MSCS 4. Client 1, Client 2 and Client 3 each decide which of the available media streams from the other two clients to subscribe to, and transmit subscribe messages to the MSCS 4. The MSCS 4 aggregates the received subscription messages and transmits a subscribe message to each of the clients, requesting the clients to start transmitting the media streams subscribed to by the respective two other clients. The MSCS 4 then usually receives at least three different media streams, and each client subscribes to the streams of the two other clients (i.e. Client 2 -> Client 1 and Client 3, Client 3 -> Client 1 and Client 2, etc.). The MSCS 4 therefore multiplexes the media streams subscribed to by one client before transmitting them to that client on a single port.
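The description does not state how multiplexed streams on a single port are told apart. A common approach, assumed here, is that each RTP stream carries its own SSRC, which the MSCS reads from bytes 8 to 11 of the fixed RTP header and looks up in a routing table of subscribers. `rtp_ssrc`, `route` and the test helper `make_rtp` are hypothetical names.

```python
import struct


def rtp_ssrc(packet: bytes) -> int:
    """Read the SSRC from the fixed 12-byte RTP header (bytes 8-11)."""
    if len(packet) < 12 or packet[0] >> 6 != 2:
        raise ValueError("not an RTP version 2 packet")
    return struct.unpack("!I", packet[8:12])[0]


def route(packet: bytes, routes: dict[int, set[str]]) -> set[str]:
    """Subscribers that should get a copy of a packet received on the shared port."""
    return routes.get(rtp_ssrc(packet), set())


def make_rtp(ssrc: int, seq: int, payload_type: int = 96) -> bytes:
    """Test helper: minimal RTP packet (V=2, no padding, extension or CSRCs)."""
    return struct.pack("!BBHII", 0x80, payload_type, seq, 0, ssrc) + b"media"


# Hypothetical routing table: SSRC of each subscribed stream -> subscribing clients.
routes = {0xA1: {"client2", "client3"}, 0xA2: {"client3"}}
```

The MSCS only inspects the header, never the payload, which is what keeps switching cheap compared with transcoding.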

According to an exemplary embodiment of the present invention, the MSCS 4 automatically requests connected clients to transmit an audio stream, and automatically transmits all received audio streams to all connected clients. Thus the clients do not need to actively decide on subscribing to audio streams. This "forward to all" policy ensures the lowest possible latency of forwarded audio packets. Still, a client might stop its audio stream if its microphone is muted, overriding the automatic transmission of audio from the client.

The client is responsible for mixing all incoming audio packets before they are played out. The client lip-syncs audio and video streams from a transmitting client using available meta-information, e.g. by synchronizing audio and video packets from a client with matching RTCP SDES client names. In multi-site conferences several of the participants are commonly silent most of the time. However, the silent participants often introduce unwanted noise into the conference, such as coughing or the turning of pages, picked up by their microphones. Also, mixing audio from non-talking participants introduces unnecessary processing load on a client. In such circumstances it is preferable to stop the audio streams from those participants before they are transmitted to the client. The MSCS 4 can be configured to relay only the N loudest audio streams, based on the audio activity level in the RTP packets. Alternatively, the MSCS 4 transmits only audio packets whose energy level is above a predefined threshold.
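A sketch of the forward-to-all audio policy combined with the two filtering alternatives above. It assumes each source's activity level is available as a value in -dBov, where 0 is loudest and 127 is silence, as in the client-to-mixer audio level RTP header extension of RFC 6464; the description itself does not say how the level is carried. The function names are hypothetical, and a practical implementation would likely smooth levels over time to avoid rapid switching.

```python
def n_loudest(levels: dict[str, int], n: int) -> set[str]:
    """levels: source -> most recent audio level in -dBov (0 = loudest, 127 = silence)."""
    return set(sorted(levels, key=levels.__getitem__)[:n])


def above_threshold(levels: dict[str, int], max_dbov: int) -> set[str]:
    """Alternative policy: forward every source louder than a fixed threshold."""
    return {src for src, level in levels.items() if level <= max_dbov}


def audio_destinations(source: str, participants: list[str], selected: set[str]) -> set[str]:
    """Forward-to-all for selected sources, without echoing audio back to its sender."""
    if source not in selected:
        return set()
    return set(participants) - {source}
```

With levels `{"c1": 30, "c2": 90, "c3": 45, "c4": 127}` and N = 2, only c1 and c3 are forwarded, so each receiving client mixes two streams instead of three.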

Figure 2 shows two possible layouts of a client display. Figure 2B displays an equal view layout, in which video streams from two clients are shown. The equal view layout is typically used when three clients participate in a call, in a peer-to-peer call to display one distant client and a self view, or alternatively to display two media streams from one client, one being video and the other a presentation. Figure 2A displays an active speaker layout, which shows one large video stream and three smaller video streams. Active speaker layouts and methodologies are well known to the person skilled in the art.

According to a preferred embodiment of the present invention, a client receiving multiple subscribed streams mixes the subscribed video streams locally. In particular, during a conference, a user of a client might want to change the layout in the video client. The local mixing capabilities of the client make that easy. The client can subscribe to a new media stream if the new layout suggests that other media streams are needed, and/or unsubscribe from media streams that are no longer needed. Similarly, the client might change layout automatically when another client leaves or enters the conference, or, as described below, when the active speaker changes.

Now referring to Figure 2A, video streams from four different participants are displayed in the active speaker layout, where the active speaker, or current speaker, window is larger than the three other participants' windows. The three smaller windows occupy a smaller area, so the video streams displayed in these windows can be of a lower resolution than the video displayed in the large window and still have the same visual quality. The client therefore does not need to receive a video stream of the highest possible quality for these windows and decides to subscribe to low quality video streams, while at the same time subscribing to a high quality stream for the large window. Then, if a user in one of the smaller windows becomes the active speaker, the client will decide to display the video stream of that user in the large window and the video stream of the previous speaker in a small window. The client then transmits an unsubscribe message and a subscribe message to the MSCS 4, unsubscribing from the high resolution video stream of the previous speaker and subscribing to the low resolution video stream of the previous speaker. The client also transmits an unsubscribe message unsubscribing from the low resolution video stream of the new active speaker and a subscribe message subscribing to the high resolution video stream of the new active speaker. The MSCS 4 then relays the subscribe and unsubscribe messages to the relevant transmitting clients, requesting the transmitting endpoints to stop transmitting the now unsubscribed video streams and to start transmitting the now subscribed media streams to the MSCS 4.
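The subscription changes on an active speaker switch are the set difference between the subscriptions required by the old and the new layout. The sketch below models a subscription as a hypothetical `(participant, quality)` pair, with "high" for the large window and "low" for the small ones.

```python
def active_speaker_layout(active: str, participants: list[str]) -> set[tuple[str, str]]:
    """(participant, quality) subscriptions: high for the large window, low for the rest."""
    return {(p, "high" if p == active else "low") for p in participants}


def speaker_change(old_active: str, new_active: str,
                   participants: list[str]) -> tuple[set, set]:
    """Return (to_subscribe, to_unsubscribe) when the active speaker changes."""
    old = active_speaker_layout(old_active, participants)
    new = active_speaker_layout(new_active, participants)
    return new - old, old - new


subscribe, unsubscribe = speaker_change("A", "C", ["A", "B", "C", "D"])
# subscribe   == {("C", "high"), ("A", "low")}
# unsubscribe == {("A", "high"), ("C", "low")}
```

This yields exactly the four messages described above, while the streams of B and D are left untouched.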

In another exemplary embodiment, when using a video conference application in a PC, the decision on changing subscription of video streams may be made based on the current screen size, e.g. full screen or small screen. In yet another exemplary embodiment, the decision on which media streams to subscribe to may be made based on bandwidth restrictions. In this case, the client cannot subscribe to an amount of media data larger than the client can decode, and the client must split the available bit rate between the different media streams to obtain the best overall visual quality for a client user.
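One simple way to split an available bit rate between subscribed streams is a greedy heuristic, assumed here rather than taken from the description: start every window at its lowest announced bit rate, then repeatedly upgrade the most important window that still fits in the budget. The tier values and window priorities are hypothetical; truly maximizing "overall visual quality" would require a quality model that this sketch does not attempt.

```python
def allocate_bitrate(tiers: dict[str, list[int]], priority: list[str],
                     budget_kbps: int) -> dict[str, int]:
    """tiers: window -> ascending bit rates (kbps) announced for the stream shown there.
    priority: windows in order of importance (e.g. active speaker first)."""
    level = {w: 0 for w in tiers}                      # start every window at its lowest tier
    spent = sum(rates[0] for rates in tiers.values())
    if spent > budget_kbps:
        raise ValueError("budget does not cover the lowest tier of every window")
    upgraded = True
    while upgraded:
        upgraded = False
        for w in priority:
            i = level[w]
            if i + 1 < len(tiers[w]):
                extra = tiers[w][i + 1] - tiers[w][i]
                if spent + extra <= budget_kbps:
                    level[w] = i + 1
                    spent += extra
                    upgraded = True
                    break                              # rescan from the most important window
    return {w: tiers[w][level[w]] for w in tiers}


tiers = {"main": [256, 1024, 2048], "s1": [128, 512], "s2": [128, 512]}
print(allocate_bitrate(tiers, ["main", "s1", "s2"], budget_kbps=1500))
# {'main': 1024, 's1': 128, 's2': 128}
```

With a 1500 kbps budget the large window is upgraded once (1280 kbps in total) and no further upgrade fits; with 3000 kbps the large window reaches 2048 kbps and one small window 512 kbps.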

In an aspect, a method for exchanging audio, video, and data information between at least two clients in a communication network, supported by a central unit, comprises the steps of connecting the at least two clients to the central unit using a call control protocol, the call control protocol negotiating video formats and connection information for sending and receiving media streams.

Such a method further includes the step of transmitting information from a first client to the central unit, or correspondingly, receiving the information at the central unit from the first client, the information comprising meta-data describing different media streams that the first client is capable of transmitting. Further, the information received by the central unit from the first client may be transmitted by the central unit to the at least one other client. On receiving the information on the different media streams that the first client is capable of transmitting, the at least one other client may decide which of the available media streams from the first client it will subscribe to. Then, a subscribe message may be transmitted from the at least one other client to the central unit, or correspondingly, the subscribe message may be received from the at least one other client at the central unit, the subscribe message subscribing to at least one available media stream from the first client. On receiving at least one subscribe message from the at least one other client, the central unit transmits a message requesting the first client to start transmitting media streams subscribed to by the at least one other client. Further, the media streams subscribed to by the at least one other client are transmitted from the first client to the central unit, or correspondingly, they are received by the central unit from the first client, and the media streams subscribed to by the at least one other client are transmitted from the central unit to the at least one other client.

In an aspect, the meta-data describing the different media streams that the first client is capable of transmitting may comprise at least one of bandwidth threshold and video resolution. In another aspect, the different media streams that the first client is capable of transmitting may comprise data provided by at least one of a presenter camera, an audience camera and a document camera.

In yet another aspect, the step of deciding on which of the available media streams from the first client the at least one other client will subscribe to is based on at least one of processing power of the client, bandwidth restrictions, and layout of a client display.

The method may further comprise the central unit automatically requesting connected clients to transmit at least one audio stream and the central unit automatically transmitting all received audio streams to all connected clients.

In certain aspects, the at least one other client may, upon a change in layout of a client display or a bandwidth restriction, decide to subscribe to at least one different available media stream from the first client. In such cases the method may further comprise the steps of transmitting an unsubscribe message from the at least one other client to the central unit, unsubscribing from at least one of the media streams subscribed to by the at least one other client, and transmitting a subscribe message from the at least one other client to the central unit, subscribing to at least one other available media stream from the first client. On receiving the subscribe and unsubscribe messages from the at least one other client, the central unit may transmit a message requesting the first client to start transmitting media streams subscribed to by the at least one other client and to stop transmitting media streams unsubscribed by the at least one other client.

The method may further comprise the at least one other client mixing the received subscribed media streams locally. The call control protocol may, e.g., be one of SIP, H.323 and Jingle.

A system comprising at least two clients in a communication network supported by a central unit may be adapted to perform the disclosed method.

The skilled person will realize that numerous alternatives and variations are possible in light of the above teaching and common general knowledge. Hence, the scope of the invention is set forth by the appended claims and their equivalents.