


Title:
REAL-TIME AUDIO MIXING OF DIFFERENTLY CODED DATA STREAMS IMPLEMENTED BY A DATA PROCESSOR TECHNIQUES
Document Type and Number:
WIPO Patent Application WO/2007/045902
Kind Code:
A2
Abstract:
A method of mixing a plurality of input data streams together in real time is described. The method comprises: fetching a sample from one of the plurality of input data streams; determining, for a first sample of each different input data stream, a conversion code for placing the first sample into a common format; converting a current sample into the common format using the conversion code specific to the type of input data stream of that current sample; adding the converted sample to a cumulative output sample; and repeating the fetching, converting and adding steps for a plurality of samples of the plurality of input data streams to form a plurality of cumulative output samples representing an output stream of mixed data samples; wherein the repeating step comprises using the determined conversion code for converting each of the samples in the corresponding input data stream subsequent to the first sample.

Inventors:
FOOT ERIC (GB)
Application Number:
PCT/GB2006/003921
Publication Date:
April 26, 2007
Filing Date:
October 20, 2006
Assignee:
AMTEUS SECURE COMM LTD (GB)
FOOT ERIC (GB)
International Classes:
H04M3/56; H04L12/18; H04N7/15
Foreign References:
US6792092B1, 2004-09-14
US20020126626A1, 2002-09-12
Other References:
PINGTEL CORP.; SIPFOUNDRY INC.: "MpConnection.cpp"[Online] 2004, XP002419733 Retrieved from the Internet: URL:http://www.koders.com/> [retrieved on 2007-02-09]
PINGTEL CORP.; SIPFOUNDRY INC.: "MprDecode.cpp"[Online] 2004, XP002419734 Retrieved from the Internet: URL:http://www.koders.com/> [retrieved on 2007-02-09]
PINGTEL CORP.; SIPFOUNDRY INC.: "MprBridge.cpp"[Online] 2004, XP002419735 Retrieved from the Internet: URL:http://www.koders.com/> [retrieved on 2007-02-09]
IPSE: "sipXmedialLib"[Online] 30 January 2007 (2007-01-30), XP002419736 sipX Wiki Retrieved from the Internet: URL:http://sipx-wiki.calivia.com/index.php /SipXmediaLib> [retrieved on 2007-02-09]
HAWWA S: "Audio mixing for centralized conferences in a sip environment" MULTIMEDIA AND EXPO, 2002. ICME '02. PROCEEDINGS. 2002 IEEE INTERNATIONAL CONFERENCE ON LAUSANNE, SWITZERLAND 26-29 AUG. 2002, PISCATAWAY, NJ, USA,IEEE, US, vol. 2, 26 August 2002 (2002-08-26), pages 269-272, XP010604749 ISBN: 0-7803-7304-9
Attorney, Agent or Firm:
AHMAD, Sheikh, Shakeel et al. (Fleet Place House 2 Fleet Place, London EC4M 7ET, GB)
Claims:


1. A method of mixing a plurality of input data streams together in real time, the method comprising:

Fetching a sample from one of the plurality of input data streams;

Determining, for a first sample of each different input data stream, a conversion code for placing the first sample into a common format;

Converting a current sample into the common format using the conversion code specific to the type of input data stream of that current sample;

Adding the converted sample to a cumulative output sample; and

Repeating the fetching, converting and adding steps for a plurality of samples of the plurality of input data streams to form a plurality of cumulative output samples representing an output stream of mixed data samples;

Wherein the repeating step comprises using the determined conversion code for converting each of the samples in the corresponding input data stream subsequent to the first sample.

2. A method according to Claim 1, further comprising dynamically storing consecutive samples of each input data stream in a local data store.

3. A method as claimed in Claim 2, wherein the fetching step comprises fetching the sample from one of the stored input data streams in the local input data store.

4. A method as claimed in any preceding claim, further comprising storing each conversion code locally.

5. A method as claimed in any of Claims 2 to 4, further comprising updating a pointer to the next sample in each one of the plurality of input data streams.

6. A method as claimed in any preceding claim, further comprising scaling each of the converted samples to a desired output proportion prior to the adding step.

7. A method as claimed in any of Claims 1 to 5, further comprising scaling each of the cumulative output samples to a desired output proportion.

8. A method as claimed in any preceding claim, further comprising translating each of the cumulative samples into a desired output format.

9. A method as claimed in any preceding claim, further comprising reading out the cumulative samples as a mixed output data stream.

10. A method as claimed in any of Claims 1 to 6, further comprising calculating an inverse conversion code for each generated conversion code, for use in creating a plurality of different format mixed output data streams.

11. A method as claimed in Claim 10, further comprising applying each inverse conversion code to the cumulative output sample to generate a respective sample of a corresponding mixed output data stream.

12. A method according to Claim 10 or 11, wherein each mixed output data stream is compatible with the format of the input data stream to which the inverse conversion code relates.

13. A method as claimed in any preceding claim, further comprising storing each of the cumulative samples for subsequent output as a data output stream.

14. A method as claimed in Claim 13, further comprising controlling a time delay between generating the cumulative sample for output and outputting the same as an output stream.

15. A method as claimed in any preceding claim, further comprising building an executable machine code segment for implementing at least some of the steps of the method.

16. A method as claimed in Claim 15, wherein the building step comprises building an executable machine code segment, which requires no time-consuming decision making in its implementation.

17. A method of mixing a plurality of data streams together in real time, the method comprising:

Fetching a sample from a first one of the plurality of data streams;

Converting the sample into a common format using a conversion code specific to the type of data stream of that sample;

Adding the converted sample to a cumulative output sample; and

Repeating the fetching, converting and adding steps to repeatedly fetch, convert and add a plurality of samples together to form a plurality of cumulative output samples representing the mixed individual samples;

Wherein the method further comprises determining the conversion code for the first sample of each stream; and using the conversion code to convert each of the other plurality of samples in the same stream subsequent to the first sample.

18. A method according to any preceding claim, wherein the method comprises a real-time VoIP mixing method.

19. A method according to any preceding claim, wherein the method comprises mixing an audio data stream with a video data stream in real time.

20. A method according to any preceding claim, wherein the method comprises mixing streams of packet data received over the Internet.

21. A data sample mixer for mixing a plurality of input data streams together in real time, the mixer comprising:

Fetching means for fetching a sample from one of the plurality of input data streams;

Determining means for determining for a first sample of each different input data stream, a conversion code for placing the first sample into a common format;

Converting means for converting a current sample into the common format using the conversion code specific to the type of input data stream of that current sample;

Summation means for summing the converted sample to a cumulative output sample; and

Control means for controlling the operation of the fetching, converting and adding means to repeatedly fetch, convert and sum procedures for a plurality of samples of the plurality of input data streams to form a plurality of cumulative output samples representing an output stream of mixed data samples;

Wherein the conversion means is arranged to repeatedly use the determined conversion code for converting each of the samples in the corresponding input data stream subsequent to the first sample.

22. A mixer according to Claim 21, further comprising a local data store for dynamically storing consecutive samples of each input data stream.

23. A mixer as claimed in Claim 22, wherein the fetching means is arranged to fetch the sample from one of the stored input data streams in the local input data store.

24. A mixer as claimed in any of Claims 21 to 23, further comprising a local data store arranged to store each conversion code.

25. A mixer as claimed in any of Claims 22 to 24, further comprising means for updating a pointer to the next sample in each one of the stored plurality of input data streams.

26. A mixer as claimed in any of Claims 21 to 25, further comprising scaling means for scaling each of the converted samples to a desired output proportion prior to the operation of the summation means.

27. A mixer as claimed in any of Claims 21 to 25, further comprising scaling means for scaling each of the cumulative output samples to a desired output proportion.

28. A mixer as claimed in any of Claims 21 to 27, further comprising translating means for translating each of the cumulative samples into a desired output format.

29. A mixer as claimed in any of Claims 21 to 28, further comprising output means for reading out the cumulative samples as a mixed output data stream.

30. A mixer as claimed in any of Claims 21 to 26, further comprising means for calculating an inverse conversion code for each generated conversion code, for use in creating a plurality of different format mixed output data streams.

31. A mixer as claimed in Claim 30, further comprising applying means for applying each inverse conversion code to the cumulative output sample to generate a respective sample of a corresponding mixed output data stream.

32. A mixer according to Claim 30 or 31, wherein the applying means is arranged to generate mixed output data streams which are compatible with the format of the input data stream to which the inverse conversion code relates.

33. A mixer as claimed in any of Claims 21 to 32, further comprising data storing means for storing each of the cumulative samples for subsequent output as a data output stream.

34. A mixer as claimed in Claim 33, further comprising timing means for controlling a time delay between generating the cumulative sample for output and outputting the same as an output stream.

35. A mixer as claimed in any of Claims 21 to 34, further comprising building means for building an executable machine code segment for implementing at least some of the procedures of the mixer.

36. A mixer as claimed in Claim 35, wherein the building means is arranged to build an executable machine code segment, which requires no time-consuming decision making in its implementation.

37. A data sample mixer for mixing a plurality of data streams together in real time, the mixer comprising:

Fetching means arranged to fetch a sample from a first one of the plurality of data streams;

Converting means arranged to convert the sample into a common format using a conversion code specific to the type of data stream of that sample;

Adding means for adding the converted sample to a cumulative output sample;

Control means for controlling the operation of the fetching, converting and adding means to repeatedly fetch, convert and add procedures for a plurality of samples to form a plurality of cumulative output samples representing the mixed individual samples; and

Conversion code determining means for determining the conversion code for the first sample of each stream and using the same with the converting means to convert each of the other plurality of samples in the same stream subsequent to the first sample.

Description:

Improvements Relating to Real-Time Data Mixing Techniques

Field of the Invention

The present invention concerns improvements relating to real-time data mixing techniques and more particularly, though not exclusively, to a method and apparatus for implementing improved audio data mixing techniques used in mixing together multiple audio data samples, possibly encoded by different algorithms, to provide increased levels of efficiency. A significant area of application of such a new method is in the field of communications using Voice over Internet Protocol (VoIP).

Background of the Invention

The Internet provides a well-understood mechanism for data communications between a server storing that information and a client seeking that information. It is also used to support end-to-end communications such as e-mail between two different clients via a server. More recently, so-called 'peer-to-peer' communications between different clients have been available and in particular VoIP communications which enable people to talk to each other by converting their voice signals into data and then utilising a data channel between clients provided by the Internet.

The clear advantage of using the Internet to support the peer-to-peer voice communication is its low cost due to use of a distributed packet-transmission communication environment as compared to a relatively high-cost conventional telephone call that is conveyed over dedicated voice telecommunications lines. In fact, users typically pay a fixed fee for broadband data communications to facilitate Internet access to their computer. In these situations for peer-to-peer communications, calls can be made at no extra cost to the user, which is very appealing. This fact alone has fuelled the rapid growth of VoIP communications in recent years.

If a person wishes to have a conference call using VoIP communications, the data packets representing the sound samples from different sources (people's computers) need to be mixed together in real time, in order to enable simultaneous speech from different clients to be conveyed to all people participating in the call. It is to be appreciated that conference calls occur in real-time and as such this places a severe requirement on any mixer to operate in real-time for all parties.

In such a situation, there are several vectors of sound samples to be mixed together. An audio mixer, which is used to carry out this procedure, needs first to convert the samples into a common form because different sound samples can be arriving over different pathways with different capabilities, which can mean they have different formats. This is common when one client is speaking over a broadband connection and the other over a conventional dial-up connection. The converted individual samples then need to be added together from each of the input sets, the results need to be scaled and converted to the required output format, and finally the results need to be stored.

Pictorially, this prior art situation is represented in Figure 1, where three vector data streams V1, V2 and V3 are shown, each comprising a discrete set of samples S1a, S1b, S1c, S2a, S2b, S2c, ... etc. These are summed to form a results stream RV123 comprising a plurality of results samples RV123A, RV123B, RV123C. The results stream can be sent to all participants in the conference call and all participants can be heard simultaneously.
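As a minimal sketch of the summation represented in Figure 1, the following Python fragment adds the n-th samples of three streams to form the n-th results sample. The sample values and the names v1, v2, v3 and mix are illustrative assumptions, and the samples are assumed to be in a common format already.

```python
# Three input streams, assumed to be in a common format already.
# The values are arbitrary and chosen only to make the sums easy to read.
v1 = [1, 2, 3]
v2 = [10, 20, 30]
v3 = [100, 200, 300]


def mix(streams):
    """Sum the n-th samples of every stream to give the n-th results sample."""
    return [sum(samples) for samples in zip(*streams)]


# Results stream formed from V1, V2 and V3.
r_v123 = mix([v1, v2, v3])
```

Format conversion, scaling and output translation are deliberately omitted here; only the sample-by-sample addition is shown.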

Figures 2 and 3 show alternative prior art methods of performing the mixing process of Figure 1 in greater detail.

In both methods shown, the first samples from each of the unmixed input data sets (streams) need to be added together to form the first mixed sample in the result. A similar operation is performed on the second and subsequent samples to form the second and subsequent mixed samples in the result.

However, each sample needs to be converted into a common form before it can be added to another sample. This leads to two broad "intuitive" prior art algorithms represented respectively in Figures 2 and 3:

Referring to Figure 2, where the first method 30 is shown, generally speaking this method 30 involves visiting each sample in each input set in turn, converting it to the common format "on the fly", and performing the addition. In Figure 3, the second method 60 is shown, which generally involves converting each sample set (each of the samples in the whole data stream) into the common format before commencing the addition.

More specifically referring to Figure 2, the first method commences with initialisation at Step 32 in which the parameter 'n' is set to 1 and the Results Store at location 'n' is set to zero. A first part of the two-part method 30 commences with fetching at Step 34 the next sample from the next data stream. The method determines at Step 36 what conversion is required from the sample's format. Next, a format conversion code is created at Step 38. The format conversion code is specific to the current sample's data format. The format conversion code is then used to convert at Step 40 the current sample into the common format. Following this the converted sample is added to the results store at position 'n'.

The method 30 marks the end of the first part by checking whether the current sample is a last sample across a set of audio data input streams, namely is the sample from stream V3? If it is not, then the first part of the method 30 described above is repeated for the next sample from the next stream. The first part of the method is thus repeated for different samples across the different audio data streams until it is determined at Step 44 that the last sample across all the data streams has been added to the results store. At this point, the second part of the method commences with the results for Sample 'n' being scaled at Step 46, then translated at Step 48 into the desired output format and finally stored at Step 50 for subsequent output as results output sample 'n' of the results stream RV123. Parameter 'n' is subsequently incremented at Step 52 and a check is carried out at Step 54 to determine if the current sample is the last sample. If it is not, the whole of the above described process, namely the first and second parts, is repeated. Otherwise, the method 30 ends at Step 56 and the results output samples can be streamed to all parties to the conference call as a mixed output audio data stream.
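The on-the-fly method 30 may be sketched in Python as follows. This is an illustration under stated assumptions rather than the code of Figure 2: the input formats "u8" (unsigned 8-bit PCM) and "s16" (signed 16-bit PCM), the floating-point common format, scaling by the number of streams and a signed 16-bit output are all hypothetical choices. The counter shows that a conversion decision is made for every sample of every stream.

```python
def make_conversion_code(fmt):
    """Decide which conversion a sample needs and return it as a callable.

    The formats "u8" and "s16" are assumptions made for this sketch.
    """
    if fmt == "u8":
        return lambda raw: (raw - 128) / 128.0
    if fmt == "s16":
        return lambda raw: raw / 32768.0
    raise ValueError("unknown format: " + fmt)


def mix_on_the_fly(streams):
    """Mix (format, samples) pairs; return (output samples, decision count)."""
    decisions = 0
    results = []
    count = min(len(samples) for _, samples in streams)
    for n in range(count):
        store = 0.0                               # results store at location n
        for fmt, samples in streams:
            raw = samples[n]                      # Step 34: fetch
            convert = make_conversion_code(fmt)   # Steps 36 and 38, repeated per sample
            decisions += 1
            store += convert(raw)                 # Step 40: convert, then add
        scaled = store / len(streams)             # Step 46: scale (assumed: average)
        results.append(int(scaled * 32767))       # Steps 48 and 50: translate and store
    return results, decisions


mixed, decisions = mix_on_the_fly([("u8", [128, 192]), ("s16", [0, 0])])
```

With two streams of two samples each, four conversion decisions are taken, one per sample.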

The alternative prior art method 60 is shown in detail in Figure 3. This method 60 commences with an initialisation phase at Step 62 in which the parameter 'n' is set to 0, parameter S is set to 1, a cumulative array and a results array are both set to zero and the Results Store at location 'n' is set to zero. A repetitive core part of the method 60 commences with increment of the parameter 'n' at Step 64, followed by fetching at Step 66 the next sample from the data stream 'S'. The method 60 then determines at Step 68 what conversion is required, from the sample's format, and a format conversion code is created at Step 70. The format conversion code is specific to the current sample's data format. The format conversion code is then used to translate at Step 72 the current sample into a common format. The converted sample is then stored at Step 74 to the results array at location 'n'.

This marks the end of the repetitive core part of the method by checking at Step 76 whether the current sample is a last sample across a set of audio data input streams, namely is the sample from stream V3? If it is not, then the repetitive core part of the method 60 described above is repeated for the next sample from stream S, namely the same stream. Thus if the first sample was SV1A the next sample will be SV1B (using the example set out in Figure 1). The repetitive core part of the method 60 is thus repeated for each of the samples in one stream until it is determined at Step 76 that the last sample in the present data stream has been added to the results array. Then the contents of the results array are added at Step 78 to the cumulative array.

The method 60 then checks at Step 80 whether the current audio data stream is a last audio data stream across the set of audio data input streams, namely is the current stream V3 (using the example set out in Figure 1)? If it is not, then the parameter 'S' is incremented at Step 82 and the parameter 'n' and the results array are set at Step 82 to zero. Then the repetitive core part of the method 60, described above, is repeated for the samples of the next stream S. Thus if the stream was V1 the next stream will be V2 (using the example set out in Figure 1). The repetitive core part of the method 60 is thus repeated for each of the samples in this next stream and the contents of the results array added to the cumulative array until it is determined at Step 80 that the samples for the last stream have been added to the cumulative array.

When this point has been reached, the method concludes by scaling at Step 84 each element of the cumulative array. Then each element of the cumulative array is translated at Step 86 into the desired output format. Thereafter each translated element is stored at Step 88 in the output array for subsequent output, the output array samples being streamed to all parties to the conference call as a mixed output audio data stream.
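The block conversion method 60 may be sketched in the same way. Again this is only an illustration: the formats "u8" and "s16", the floating-point common format, scaling by the number of streams and the signed 16-bit output are assumptions, not details taken from Figure 3. The temporary results array and the second pass that adds it to the cumulative array are the features of interest.

```python
def make_conversion_code(fmt):
    """Decide which conversion a sample needs (formats are assumed for this sketch)."""
    if fmt == "u8":
        return lambda raw: (raw - 128) / 128.0
    if fmt == "s16":
        return lambda raw: raw / 32768.0
    raise ValueError("unknown format: " + fmt)


def mix_by_blocks(streams):
    """Convert each whole stream first, then accumulate; return (output, decisions)."""
    count = min(len(samples) for _, samples in streams)
    cumulative = [0.0] * count
    decisions = 0
    for fmt, samples in streams:                   # one whole stream at a time
        results = [0.0] * count                    # temporary results array
        for n in range(count):
            raw = samples[n]                       # Step 66: fetch
            convert = make_conversion_code(fmt)    # Steps 68 and 70, repeated per sample
            decisions += 1
            results[n] = convert(raw)              # Steps 72 and 74: convert and store
        for n in range(count):                     # Step 78: fetch again and add
            cumulative[n] += results[n]
    scaled = [total / len(streams) for total in cumulative]        # Step 84
    return [int(value * 32767) for value in scaled], decisions     # Steps 86 and 88


mixed, decisions = mix_by_blocks([("u8", [128, 192]), ("s16", [0, 0])])
```

The output matches that of an on-the-fly mix of the same inputs, but every converted sample is written to, and read back from, temporary storage before it is added.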

The present inventor has realised that both methods have a significant disadvantage in that they have a high performance overhead, namely they are inefficient. This is for several reasons, which are discussed below. Firstly, temporary storage areas need to be allocated and freed to hold the converted input sets before their individual samples are added to individual samples in other converted input sets. The movement of data into and out of these temporary storage sets creates time delay for real-time processing and disadvantageously requires additional memory.

Secondly, the processing required for the addition of each input sample is a five-stage process of the following steps: FETCH raw sample, CONVERT it into the common format, STORE the converted sample, FETCH the stored sample, and ADD it to a results store. This number of operations takes up significant processor time (many clock cycles of the Central Processing Unit), which only serves to make the process less efficient. In this regard, it is to be appreciated that any small saving in the number of steps to be executed for each sample (namely a reduction in the number of CPU execution clock cycles for processing each sample) is scaled into significant time savings in the real-time mixing operation.

Thirdly, the processor needs to decide what conversion is required for each sample in each input set in real-time. This is considered to be essential in implementation of the 'on-the-fly' conversion of the first prior art method. Processor decision-making is known to be computationally very expensive. This multiple decision-making is considered to be particularly computationally expensive and contributes significantly to the inefficiency of the process.

Summary of the Present Invention

It is desired to overcome at least some of the problems outlined above and to improve the efficiency of existing digital audio data mixing techniques.

The present invention resides in the appreciation that the problems and limitations of the existing prior art methods can be overcome by effectively pre-processing the data to be mixed using knowledge of the characteristics of the data prior to its combination (mixing). More specifically, by storing the conversion code of a first sample in a stream of samples, use of that code repeatedly for other elements in the same stream enables a marked improvement in efficiency to be achieved.
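A minimal sketch of this idea, assuming hypothetical input formats "u8" and "s16" and a floating-point common format, determines one conversion code per stream before mixing and then reuses it for every subsequent sample of that stream. The function names are illustrative and do not come from the specification.

```python
def make_conversion_code(fmt):
    """Decide which conversion a stream needs (formats are assumed for this sketch)."""
    if fmt == "u8":
        return lambda raw: (raw - 128) / 128.0
    if fmt == "s16":
        return lambda raw: raw / 32768.0
    raise ValueError("unknown format: " + fmt)


def mix_with_stored_codes(streams):
    """Mix (format, samples) pairs using one stored conversion code per stream."""
    # Single decision per stream, made when its first sample is examined.
    codes = [make_conversion_code(fmt) for fmt, _ in streams]
    decisions = len(codes)
    count = min(len(samples) for _, samples in streams)
    output = []
    for n in range(count):
        cumulative = 0.0                           # cumulative output sample n
        for code, (_, samples) in zip(codes, streams):
            cumulative += code(samples[n])         # fetch, convert with stored code, add
        output.append(cumulative)
    return output, decisions


mixed, decisions = mix_with_stored_codes(
    [("u8", [128, 192, 64]), ("s16", [0, 16384, -16384])]
)
```

Here the number of conversion decisions equals the number of streams (two), however many samples each stream contains.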

Another way of looking at the present invention's contribution is that it enables the prior art problems to be overcome by generating an efficient machine code segment containing instructions for the mixing operation. A single analysis phase stores the machine instructions necessary for loading and converting the input samples (in whatever type each is actually presented), performing the addition and scaling operations, converting the result to whatever type or types are actually required, and then storing these results. Once generated, that code segment may be used for any number of samples, until either an input or output format changes, resulting in marked improvement in efficiency.

In other words, the overheads of the two prior art "intuitive" approaches described above can be avoided using the above-described pre-compilation technique. Thereafter, a single decision-making phase generates code to perform the mix operation for all subsequent samples.
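The pre-compilation technique can be illustrated, by loose analogy only, in Python. The specification concerns a machine code segment; the sketch below instead generates Python source text once, compiles it with the built-in exec, and reuses the resulting function for all samples. The formats "u8" and "s16", the expressions in LOAD_AND_CONVERT and the name build_mixer are assumptions made for the example.

```python
# Assumed load-and-convert expressions, one per hypothetical input format.
LOAD_AND_CONVERT = {
    "u8": "(s{i}[n] - 128) / 128.0",
    "s16": "s{i}[n] / 32768.0",
}


def build_mixer(formats):
    """Analysis phase: generate and compile a mixer for a fixed list of formats."""
    args = ", ".join("s%d" % i for i in range(len(formats)))
    terms = " + ".join(
        LOAD_AND_CONVERT[fmt].format(i=i) for i, fmt in enumerate(formats)
    )
    source = (
        "def mixer(%s, count):\n"
        "    return [(%s) / %d for n in range(count)]\n"
        % (args, terms, len(formats))
    )
    namespace = {}
    exec(source, namespace)        # compile the generated segment once
    return namespace["mixer"], source


# All format decisions are taken here, before any sample is mixed.
mixer, source = build_mixer(["u8", "s16"])
mixed = mixer([128, 192], [0, 16384], 2)
```

The generated function contains no branch on the input format; a new function would be built only if an input or output format changed.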

According to one aspect of the present invention there is provided a method of mixing a plurality of input data streams together in real time, the method comprising: fetching a sample from one of the plurality of input data streams; determining, for a first sample of each different input data stream, a conversion code for placing the first sample into a common format; converting a current sample into the common format using the conversion code specific to the type of input data stream of that current sample; adding the converted sample to a cumulative output sample; and repeating the fetching, converting and adding steps for a plurality of samples of the plurality of input data streams to form a plurality of cumulative output samples representing an output stream of mixed data samples; wherein the repeating step comprises using the determined conversion code for converting each of the samples in the corresponding input data stream subsequent to the first sample.

The present invention provides significant improvements in speed of operation as are described in detail elsewhere in this application. The increase in speed is due to a reduction in the computationally expensive procedures which have been prevalent in the prior art methods. The present invention provides a simple but elegant solution to the issue of real-time data mixing and real-time mixed data output.

The method may further comprise dynamically storing consecutive samples of each input data stream in a local data store. In this case the fetching step may advantageously comprise fetching the sample from one of the stored input data streams in the local input data store.

The method may further comprise storing each conversion code locally. This also speeds up processing as look up of a locally stored conversion code, for example in cache memory or even local processor memory, is a fast way of accessing the required conversion code without delay.

Preferably the method further comprises updating a pointer to the next sample in each one of the plurality of input data streams. Again the use of pointers to index the current sample in a dynamically changing method is a convenient way of implementing the fast look up to current sample position in the input data stream.

The method may also comprise scaling each of the converted samples to a desired output proportion prior to the adding step. This scaling step allows for an equalisation of the size of the data sample to be carried out before addition which is advantageous when going to an executable file implementation as has been described elsewhere.

Preferably the method further comprises translating each of the cumulative samples into a desired output format. This enables, in an audio data example, a common output to be supplied to all parties subject to a conference call. Also the method may further comprise reading out the cumulative samples as a mixed output data stream.

As an alternative, the method may further comprise calculating an inverse conversion code for each generated conversion code, for use in creating a plurality of different format mixed output data streams. The advantages of this have been described in the detailed description.

The method may in this case further comprise applying each inverse conversion code to the cumulative output sample to generate a respective sample of a corresponding mixed output data stream. Also each mixed output data stream is compatible with the format of the input data stream to which the inverse conversion code relates.
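The use of inverse conversion codes may be sketched as follows. The formats "u8" and "s16", the floating-point common format, the clamping of out-of-range values and scaling by the number of streams are assumptions made for illustration; the specification does not prescribe them.

```python
def make_codes(fmt):
    """Return (conversion code, inverse conversion code) for an assumed format."""
    if fmt == "u8":
        return (
            lambda raw: (raw - 128) / 128.0,
            lambda x: max(0, min(255, int(round(x * 128.0)) + 128)),
        )
    if fmt == "s16":
        return (
            lambda raw: raw / 32768.0,
            lambda x: max(-32768, min(32767, int(round(x * 32768.0)))),
        )
    raise ValueError("unknown format: " + fmt)


def mix_to_all_formats(streams):
    """Return one mixed output stream per input format, keyed by format name."""
    codes = {fmt: make_codes(fmt) for fmt, _ in streams}   # determined once per format
    count = min(len(samples) for _, samples in streams)
    outputs = {fmt: [] for fmt in codes}
    for n in range(count):
        cumulative = sum(codes[fmt][0](samples[n]) for fmt, samples in streams)
        scaled = cumulative / len(streams)                 # assumed scaling: average
        for fmt, (_, inverse) in codes.items():
            outputs[fmt].append(inverse(scaled))           # apply each inverse code
    return outputs


outputs = mix_to_all_formats([("u8", [128, 192]), ("s16", [0, 16384])])
```

Each party can then be sent the mixed stream in the same format as the stream it supplied.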

The method may further comprise storing each of the cumulative samples for subsequent output as a data output stream. In this case, the method may further comprise controlling a time delay between generating the cumulative sample for output and outputting the same as an output stream. This advantageously enables the mixer processor to control the timing of the output procedure.

Preferably the method further comprises building an executable machine code segment for implementing at least some of the steps of the method. Preferably this machine code segment requires no time-consuming decision making in its implementation. The advantages of faster implementation have been described elsewhere.

According to another aspect of the present invention there is provided a data sample mixer for mixing a plurality of input data streams together in real time, the mixer comprising: fetching means for fetching a sample from one of the plurality of input data streams; determining means for determining for a first sample of each different input data stream, a conversion code for placing the first sample into a common format; converting means for converting a current sample into the common format using the conversion code specific to the type of input data stream of that current sample; summation means for summing the converted sample to a cumulative output sample; and control means for controlling the operation of the fetching, converting and adding means to repeatedly fetch, convert and sum procedures for a plurality of samples of the plurality of input data streams to form a plurality of cumulative output samples representing an output stream of mixed data samples; wherein the conversion means is arranged to repeatedly use the determined conversion code for converting each of the samples in the corresponding input data stream subsequent to the first sample.

According to another aspect of the present invention there is provided a data sample mixer for mixing a plurality of data streams together in real time, the mixer comprising: fetching means arranged to fetch a sample from a first one of the plurality of data streams; converting means arranged to convert the sample into a common format using a conversion code specific to the type of data stream of that sample; adding means for adding the converted sample to a cumulative output sample; control means for controlling the operation of the fetching, converting and adding means to repeatedly fetch, convert and add procedures for a plurality of samples to form a plurality of cumulative output samples representing the mixed individual samples; and conversion code determining means for determining the conversion code for the first sample of each stream and using the same with the converting means to convert each of the other plurality of samples in the same stream subsequent to the first sample.

Brief Description of the Drawings

Methods and apparatus according to a presently preferred embodiment of the present invention for mixing data streams in real time will now be described by way of example, with reference to the accompanying drawings in which:

Figure 1 is a schematic block diagram showing the processing operation of the existing prior art audio data mixing techniques;

Figure 2 is a flow diagram showing the detailed processing steps involved in the operation of an existing on-the-fly conversion and mixing method;

Figure 3 is a flow diagram showing the processing steps involved in the operation of an existing block conversion and mixing method;

Figure 4 is a schematic block diagram showing an audio data mixer according to an embodiment of the present invention;

Figure 5 is a flow diagram showing an audio data mixing technique implemented on the audio data mixer of Figure 4 according to a first embodiment of the present invention; and

Figure 6 is a flow diagram showing an audio data mixing technique implemented on the audio data mixer of Figure 4 according to a third embodiment of the present invention.

Detailed Description of the Presently Preferred Embodiment

The embodiment of the present invention can be realised as a dedicated piece of hardware or as an electronic programmable device configured by machine code segments to implement a new audio mixing technique. Using either implementation, the benefits of the present invention are still realised. It is to be appreciated that whilst the present invention could be implemented purely in software for use on a general-purpose computer, this is not presently preferred, as it would slow down the overall performance of the method as compared to a hardware solution. For ease of understanding, a dedicated hardware implementation is described herein.

Referring now to Figure 4, a VoIP audio data mixer 100 according to the embodiment of the present invention is shown. The audio data mixer 100 comprises an input module 102 for receiving a plurality of audio data packets, which together make up a plurality of different audio data streams 104. In the present embodiment, these audio data streams 104 are being generated in real-time and represent an audio conference call going on between a plurality of different parties. However, in alternative uses, the data streams 104 can be of any type such as music, audio and even video data streams which are being mixed in real time, for example.

The input module 102 is arranged to store the received data packets continually in a local data store 106. The data store 106 links together audio data packets from the same source to form a plurality of linked audio data samples 108 that make up a serial file 110 of audio data 104. Each serial file has a pointer (P) 112 which points to the latest sample 108 of audio data, and the pointer 112 is used in the efficient mixing of the audio data samples 108 together as is described later.

It is to be appreciated that as the audio data mixer 100 is designed to operate efficiently, the data store 106 is not designed to store the complete data stream 104. Rather it acts as a buffer, storing what has been received and overwriting that part of the serial file 110 which represents the portion of the audio data stream 104 that has already been processed. In this way, the data store 106 is kept to a minimal size with consequential benefits in cost and time saving. The data store 106 can store simultaneously audio data samples 108 making up a plurality of different serial files 110 representing a plurality of audio data streams.
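
As a rough illustration of the buffering behaviour described above, the following Python sketch models one serial file as a fixed-size buffer in which slots holding processed samples are overwritten by newly received ones. The class name `SerialFile`, its capacity and its method names are hypothetical and do not appear in the specification; the sketch only shows one way such a buffer could behave.

```python
class SerialFile:
    """Hypothetical fixed-size buffer for the samples of one stream."""

    def __init__(self, capacity):
        self.slots = [None] * capacity
        self.write_pos = 0   # where the next received sample is stored
        self.pointer = 0     # pointer (P): next sample to be processed
        self.pending = 0     # samples received but not yet processed

    def store(self, sample):
        if self.pending == len(self.slots):
            raise OverflowError("buffer full: unprocessed sample would be lost")
        self.slots[self.write_pos] = sample  # overwrites a processed slot
        self.write_pos = (self.write_pos + 1) % len(self.slots)
        self.pending += 1

    def fetch(self):
        if self.pending == 0:
            raise LookupError("no sample waiting")
        sample = self.slots[self.pointer]
        self.pointer = (self.pointer + 1) % len(self.slots)
        self.pending -= 1
        return sample
```

With a capacity of two, a third sample can be stored only after the first has been fetched, at which point it reuses the first sample's slot.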

The VoIP audio data mixer 100 also comprises a data processor 114, which is arranged to implement the audio data mixing technique of the present embodiment. The data processor 114 is connected to the data store 106 and processes the received audio data samples 108 in real time. It is also responsible for moving the pointers 112 to point to the next sample 108 of a serial file 110 to be processed.

The data processor 114 is responsible for retrieving a sample 108, translating it into a desired format and adding it to an output file 116 provided in a local temporal data store 118. The data processor 114 is also configured to control how the mixed sample is then output. Translation from a received format to a common processing format is determined by format conversion codes 120, which are also provided locally (stored in the local temporal data store 118) to the data processor 114. These format conversion codes 120 are either created or specified on the first sample 108 of a given data stream 104 that is to be converted. Accordingly, there is one conversion code 120 determined for each data input stream 104. Thereafter, the same conversion code 120 is used for translating all samples 108 from the same audio data stream 104, thereby providing huge efficiency improvements over the known methods.

Once all of the equivalent samples 108 of the data streams 104 have been summed, the resultant sample 122 in the output file 116 is scaled and thereafter converted to the desired output format for broadcasting via the output module 124 to the users' computing devices (not shown) participating in the conference call. This output sample forms part of a results stream 126 which is output to all participants connected on the conference call. The output of the samples is preferably immediate when real-time operation is desired. However, a slight delay, which is programmably selectable by the audio data processor, can also be accommodated by buffering the output from the output file 116 to the output module 124 using an output data store 128. In this embodiment, a delay of four output samples 130 has been selected by the audio data processor 114, though this can be varied in real time by the audio data processor 114 as is appropriate for any given situation.
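
The scaling and optional output delay described above might be sketched in Python as follows. This is an illustration only: the helper names `DelayedOutput` and `scale` are hypothetical, and the specification does not say how scaling is performed, so averaging the sum over the number of streams is an assumption made here.

```python
from collections import deque


class DelayedOutput:
    """Hold back each output sample by `delay` samples (hypothetical helper)."""

    def __init__(self, delay=4):
        self.delay = delay
        self.queue = deque()

    def push(self, sample):
        """Queue a sample; return the sample now due for output, or None."""
        self.queue.append(sample)
        if len(self.queue) > self.delay:
            return self.queue.popleft()
        return None

    def set_delay(self, delay):
        """Change the delay; return any samples released by shortening it."""
        self.delay = delay
        released = []
        while len(self.queue) > self.delay:
            released.append(self.queue.popleft())
        return released


def scale(total, stream_count):
    # Assumption: scaling averages the sum (floor division) so that the
    # result fits the output range.
    return total // stream_count
```

With the default delay of four, the first four calls to `push` return nothing and the fifth returns the first sample. A delay of zero returns each sample immediately, which corresponds to the preferred real-time case.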

Referring now to Figure 5, a method 140 of operating the above-described mixer 100 to perform a method of mixing multiple data streams 104 in real time is described. The method 140 comprises a first set of steps 142 which is repeated for each equivalent sample 108 across the different data streams (V1, V2, V3) 104 being added together and a second set of steps 144, which is repeated for each resultant sample 122 of the eventual results data stream 126 (RV123). The first set of steps 142 also comprises a key conversion determining step 152 which is only carried out once per stream 104, namely for the first sample 108 of each stream 104, and it is this step which enables the efficiency of the present method to be achieved.

More specifically, the method commences with an initialisation of parameters at Step 146 where the parameter 'n' is set to 1 and the Results Store (output file 116) at location 'n' is set to zero. The parameter 'n' in this embodiment reflects the position of the pointer 112 within a serial file 110 in the data store 106. The first stage 142 commences with fetching at Step 148 the next sample 108 from the next data stream 104. A check is then made at Step 150 as to the value of the parameter 'n'. If the parameter 'n' = 1, namely the pointer 112 is pointing to the first sample in a serial file 110, then a two-step sub procedure 152 is carried out. This sub procedure 152 involves determining at Step 154 what conversion is required to convert the sample 108 into a common format, which enables it to be added to the other samples in the same common format. Next, a format conversion code (translation code) 120 is created at Step 156 by the data processor 114 or obtained from the local temporal data store 118 if it is available. The format conversion code 120 is specific to the current sample's data format. It is stored for use with all subsequent audio data samples 108 received from the same input audio data stream 104. The process described above is termed a pre-compilation technique. The next step of the method 140 is to convert at Step 158 the current sample 108 into the common format using the stored format conversion code 120 for the current sample stream 104.

If the parameter 'n' is greater than one at the check step 150 (mentioned earlier), namely the pointer 112 for the current stream is pointing to a sample other than the first sample in the serial file 110, then the sub procedure 152 described above is skipped and the next step in the method 140 is the conversion step 158. Then, regardless of whether the sub procedure 152 is carried out, the translated sample 122 is added at Step 160 to the results store (output file 116) at location 'n'.

A check is then carried out at Step 162 to determine if the current sample 108 is the last one across the data sample streams 104 (namely, taking the example set out in Figure 1, whether the sample is from stream V3). If it is not, then the first set of steps 142 described above is repeated for the next sample 108. The first set of steps 142 is repeated for different samples 108 across the different streams 104 until the last sample 108 across all the data streams 104 has been added at Step 160 to the results store 116. At this point, the results 122 for sample 'n' are scaled at Step 164, then translated at Step 166 into the desired output format and finally stored at Step 168 for subsequent output as results output sample 'n' of the results stream 126 (namely RV123 when looking at the Figure 1 example). Parameter 'n' is subsequently incremented at Step 170 and a check is carried out at Step 172 to determine if the current sample 108 is the last sample. If it is not, the whole of the above described process, namely the first and second set of steps 142, 144, is repeated. Otherwise, the method 140 ends at Step 174 and the results output samples 126 can be streamed to all parties to the conference call as a mixed output audio data stream.
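
The flow of Figure 5 as described above can be sketched in Python roughly as follows. This is a software illustration under stated assumptions, not the hardware implementation described herein: the format names `u8` and `s16`, the choice of a signed 16-bit range as the common format, the averaging used for scaling, and the function names `analyse` and `mix` are all hypothetical.

```python
def analyse(fmt):
    """Steps 154-156: build the conversion code for a (hypothetical) format."""
    if fmt == "u8":
        return lambda s: (s - 128) * 256   # unsigned 8-bit to 16-bit range
    if fmt == "s16":
        return lambda s: s                 # already in the common format
    raise ValueError(f"unknown format: {fmt}")


def mix(streams):
    """streams: list of (format, samples) pairs, all of equal length."""
    codes = {}                             # one conversion code per stream
    analyses = 0
    output = []
    length = len(streams[0][1])
    for n in range(1, length + 1):         # Step 146, then Step 170
        total = 0                          # results store at location n
        for index, (fmt, samples) in enumerate(streams):
            sample = samples[n - 1]        # Step 148: fetch
            if n == 1:                     # Step 150: first sample only
                codes[index] = analyse(fmt)
                analyses += 1
            total += codes[index](sample)  # Steps 158 and 160
        scaled = total // len(streams)     # Step 164 (assumed: average)
        output.append(scaled)              # Steps 166-168 (output left as s16)
    return output, analyses
```

For two streams of three samples each, `analyse` is called twice in total, once per stream, however many samples follow.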

In a second embodiment, a very similar approach is taken to the above described first embodiment. Accordingly, to avoid unnecessary repetition only the differences between the embodiments are described hereinafter. From the outline description given below and taking into consideration the description provided of the first embodiment, the skilled addressee will be able to construct the second embodiment without difficulty.

The second embodiment carries out the processing in a slightly different manner to the first embodiment. In the second embodiment, there are completely separate Analysis and Compilation phases, followed by an Execution phase for all equivalent samples across all streams. In this case, each first sample in a stream would be obtained, analysed to determine the translation machine code segment required for the samples in each different stream, and stored ready for subsequent use on all subsequent samples from the same stream. One translation machine code segment would be determined for each different audio data stream. This would complete the Analysis phase.

In the next phase (the Compilation phase), the required output for each of the equivalent samples across the different audio data input streams would be compiled by obtaining and translating each equivalent sample from each audio data stream (e.g. in the example set out in Figure 1 it could be samples SV1A, SV2A, SV3A and SV4A). Each of the translated samples would be added to the sample results store to form an output sample ready for execution.

In the Execution phase, each of the output samples would be scaled and translated into the desired format for output. Here also the previously mentioned programmable optional delay in the output of the mixed output sample could be employed using output data store 128. This would create the delayed output samples 130.

Subsequently, for the remainder of the samples in each of the audio input data lists 110, only the Compilation and Execution phases need be carried out. This results in faster execution than the mixing technique of the first embodiment, as fewer decisions are required of the audio data processor 114.

A third embodiment is now described with reference to Figure 6. This embodiment is arguably the fastest of all the embodiments as it is designed to build a purely executable machine code segment which requires no time-consuming decision making in its implementation. The third embodiment is executed on the mixer 100 as shown in Figure 4 but with a different algorithm as is described below. Also there is a need to use the local temporal data store 118 to store the executable (not shown in Figure 4) as it is being compiled.

More specifically, referring to Figure 6, a method 200 according to a third embodiment of mixing audio data efficiently in real time is described. The method commences with the implementation at Step 202 of a set up stage. The set up stage comprises initialising all audio data stream lists 110, and all the necessary pointers 112 and data stores 120. This stage also generates machine code for implementing the initialisation. Then at Step 206 the machine code is stored in the local temporal data store 118, as a first part of the cumulative output machine code executable (not shown).

The method 200 continues with getting the next sample and then determining at Step 208 what conversion code is required to convert the current sample 108 into a common format, which enables it to be added to the other samples also in the same common format. Next, a format conversion code (translation code) 120 is created at Step 210 by the data processor 114 or obtained at Step 210 from the local temporal data store 118 if it is available. The format conversion code 120 is specific to the current sample's data format and is to be used as the format conversion code 120 for all subsequent audio data samples 108 received from the same input audio data stream 104. This format conversion code 120 (or stream conversion code) is stored at Step 212 in the local temporal data store 118 and the machine code instruction for using the conversion code to effect the conversion of the current sample is added at Step 212 to the cumulative output machine code executable (not shown).

Next, the method 200 comprises determining at Step 214 the required scaling parameter for the current sample of the current stream to get the sample into the correct size for output. This scaling conversion parameter (not shown) is stored at Step 216, and the processor machine code instructions for using this scaling parameter to effect the scaling on a sample and for adding the scaled sample to the results store are added at Step 216 to the cumulative output machine code executable. A 'get next stream' machine code is determined at Step 218 and is also added at Step 218 to the cumulative output machine code executable.

The method 200 then determines at Step 220 whether the current sample 108 is from the last input stream across all the data streams (in a similar way to that described in Step 162 of the first embodiment). If it is not, then Steps 208 to 218 described above are repeated for the next sample 108. These steps are repeated for different samples 108 across the different streams 104 until the last sample 108 across all the data streams 104 has been processed. At this point, a translation code (output stream code) is determined at Step 222 for translating the current cumulative results store into the desired output format. This translation code is added at Step 224 to the cumulative output machine code executable. This completes the analysis of the data input streams and the compilation of the machine code executable.

Having created the machine code executable, the processing phase commences with execution at Step 226 of the machine code executable. The code segment (machine code executable) is used in this processing phase for actually carrying out the loading, conversion, addition, scaling and storage steps for any number of samples in the same stream, until either an input or output format changes. This clearly results in a marked improvement in efficiency in the process of mixing data streams. The great advantage of running the executable to carry out the real-time audio mixing is the high speed of the processing. As there are no decisions to be made by the executable, the audio data processor 114 can process the commands at an optimal speed. Each time the machine code executable is run, an output packet representing a mixed audio sample across all streams is generated for output. If there are further samples to be processed in any of the audio streams, as determined at Step 228, then the executable is run again, generating a further mixed audio output sample. This continues until there are no further samples to be processed, and then the method 200 ends at Step 230.
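
The idea of compiling a decision-free executable once and then running it repeatedly can be illustrated in Python as below. The sketch generates Python source text rather than processor machine code, so it is only an analogy to the third embodiment. The format names, the conversion expressions in `CONVERSIONS`, the averaging used for scaling and the function name `build_executable` are all hypothetical.

```python
CONVERSIONS = {                 # hypothetical formats and conversion expressions
    "u8": "({x} - 128) * 256",  # unsigned 8-bit to signed 16-bit range
    "s16": "{x}",               # already in the assumed common format
}


def build_executable(formats):
    """Compile a branch-free mixing function for the given stream formats."""
    lines = ["def run(streams, n):", "    total = 0"]
    for i, fmt in enumerate(formats):
        # Analysis happens here, once per stream, while the code is built.
        expr = CONVERSIONS[fmt].format(x=f"streams[{i}][n]")
        lines.append(f"    total += {expr}")
    # Assumed scaling: average over the number of streams.
    lines.append(f"    return total // {len(formats)}")
    source = "\n".join(lines)
    namespace = {}
    exec(compile(source, "<mixer>", "exec"), namespace)
    return namespace["run"], source


run, source = build_executable(["u8", "s16"])
streams = [[129, 128, 127], [256, 100, -256]]
# Processing phase: the same compiled function is run once per output sample.
mixed = [run(streams, n) for n in range(3)]
```

The generated function body contains only fetch, convert and add statements followed by the scaling, with no conditional on the stream format. If a format changed, the function would have to be rebuilt.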

It is perhaps useful to consider the following example, which illustrates the effective speed up that the present invention can provide with respect to the prior art methods.

Example

Suppose, for example, that each of the input sets (serial files 110 of audio data) contained 1000 samples. The second prior art method, described in the introduction, would require analysis to determine what conversion was required 1000 times for each input set. Instead of this, the efficient method of the present embodiments performs the analysis once for each input set (sample stream 104) and stores a conversion code 120 to perform the required conversion for subsequent samples 108 of the same input sample stream 104. The processing required by the audio data processor 114 to generate and save the code 120 is of the same order of magnitude as that required to perform a single analysis and conversion. Hence, something of the order of 998 analysis times are saved by the present embodiments for each input set (serial file 110) in this example.
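
The arithmetic behind the figure of 998 can be written out as a short Python sketch. The function name `analyses_saved` is hypothetical, and treating the generation and saving of the code as costing exactly one analysis is a simplification of the "same order of magnitude" statement above.

```python
def analyses_saved(samples_per_set, code_generation_cost=1):
    """Analysis-equivalents saved per input set, under the stated assumption."""
    prior_art = samples_per_set             # one analysis per sample
    present = 1 + code_generation_cost      # one analysis plus saving the code
    return prior_art - present


saved = analyses_saved(1000)
```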

The code generated has the form:

Fetch and Convert the next sample from Input Set 1
Update Input Set 1 Pointer

Fetch, Convert and Add the next sample from Input Set 2
Update Input Set 2 Pointer

Fetch, Convert and Add the next sample from Input Set 3
Update Input Set 3 Pointer

Fetch, Convert and Add the next sample from Input Set n
Update Input Set n Pointer

Scale, Convert and Store the Result

Repeat until all 1000 samples of each Input Set have been processed.

It is to be appreciated that the FETCH, DECIDE, CONVERT, STORE, FETCH, ADD steps of the prior art intuitive approach are replaced by a FETCH, CONVERT, ADD process of the present embodiments and that the analysis as to which type of conversion is required is performed only once. Clearly the number of processor machine cycles required is advantageously significantly reduced, resulting in a faster, more efficient real-time mixing process.

It is to be appreciated that the same principles of the mixing technique described in the first, second and third embodiments could be used to produce multiple output formats by building a set of conversion codes in the same way as the input conversion codes are created. More specifically, in the above described embodiments it has been assumed that the different parties to the conference call can all receive the output of the audio data mixer in the single desired format in which it is output. However, just as the input streams 104 can be in different formats, so can the desired output from the mixer 100. In this case, the following can be applied to any of the above three embodiments without difficulty and results in a huge benefit in terms of performance.

The method of converting the output of the above mixer 100 as described in the first to third embodiments into a plurality of different-format audio data output streams commences with determining the inverse of each of the stored conversion codes 120 stored in the local temporal data store 118. These inverse conversion codes (not shown) are stored in the local temporal data store 118 for use in a final output stage. They simply reverse the conversion from original format to common format, namely they enable the resultant sample 122 in the common internal format of the mixer 100 to be converted back into the original sample input format. Having calculated and stored these inverse conversion codes (not shown), they are used in place of the final translation of the results store stage described in the above embodiments (Stages 164 and 222). Thus the cumulative results sample 122 is simply converted into a plurality of mixed audio data output streams (each being in a format suited to the audio data source from where the audio was generated), in real time by the mixer 100. These output streams are then individually handled by the output module 124 for output from the mixer 100 in a similar manner as the single cumulative output of the previous embodiments.
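
A minimal Python sketch of the inverse conversion codes described above is given below. The format names, the paired conversion functions in `FORMATS`, the stream identifiers and the function names `build_inverse_codes` and `fan_out` are hypothetical; the sketch assumes a signed 16-bit range as the common format.

```python
FORMATS = {
    # format: (conversion to common format, inverse conversion) -- hypothetical
    "u8": (lambda s: (s - 128) * 256, lambda c: c // 256 + 128),
    "s16": (lambda s: s, lambda c: c),
}


def build_inverse_codes(stream_formats):
    """Determine one inverse conversion code per stream, once."""
    return {sid: FORMATS[fmt][1] for sid, fmt in stream_formats.items()}


def fan_out(result, inverse_codes):
    """Convert one common-format result into every stream's own format."""
    return {sid: inverse(result) for sid, inverse in inverse_codes.items()}
```

The inverse codes are built once and then applied to every cumulative result sample, so each participant receives the mix in the format of their own input stream without any per-sample analysis.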

Thus a person wishing to join a VoIP conference call over a very limited bandwidth connection, where other participants have better quality connections, can not only have his audio conversation mixed effectively into the conference call for others to listen to over their high quality connections, but also can listen to the conference call himself via his poor-quality connection. The significant advantage of using the above described multiple output formats, is that the quality of the connection for each participant in the conference call, is the quality of the mixed audio which will be presented back to the user for the conference call, all in real time. Otherwise, if this aspect of the embodiments were not used, then the users with the higher quality connections would end up having to listen to the audio data in the lowest common format for all participants. Accordingly this aspect of the embodiments enables each participant in the conference call to maximise the quality of their reception.

Furthermore, as the inverse conversion codes are pre-calculated and stored, they can be used repetitively for subsequently generated cumulative output samples without requiring calculation. This provides a significant speed up in the processing of such different output audio data streams. Also, machine code used to execute the inverse conversion codes can be determined and used as an addition to the third embodiment machine code executable to implement a very efficient multiple format input and multiple format output real-time data mixer.

Having described a particular preferred embodiment of the present invention, it is to be appreciated that the embodiment in question is exemplary only and that variations and modifications, such as will occur to those possessed of the appropriate knowledge and skills, may be made without departure from the spirit and scope of the invention as set forth in the appended claims. For example, any combination of video and/or audio mixing can be achieved by the present invention in real time such that if a video of a live event is being generated and supplied to the mixer, one or more voiceover commentary audio stream(s) can be mixed thereto efficiently. Using the Internet the commentator and the video source can be in different locations.

It is also to be appreciated that the different features of the embodiments described above can be combined in different ways without departure from the scope of the present invention to result in new embodiments. The skilled addressee will be highly competent in determining how such new combinations would work.