

Title:
AN APPARATUS AND METHOD TO ASSIST THE SYNCHRONISATION OF AUDIO OR VIDEO SIGNALS FROM MULTIPLE SOURCES
Document Type and Number:
WIPO Patent Application WO/2016/139392
Kind Code:
A1
Abstract:
Apparatus comprising: an audio analyser configured to determine a spectral flatness value associated with a captured audio signal associated with an audio scene and compare the spectral flatness value against a threshold value; a detectable audio signal generator configured to generate a detectable audio signal when the spectral flatness value is less than the threshold value; an audio output configured to output the detectable audio signal when the spectral flatness value is less than the threshold value.

Inventors:
MATE SUJEET SHYAMSUNDAR (FI)
LEPPÄNEN JUSSI (FI)
Application Number:
PCT/FI2016/050103
Publication Date:
September 09, 2016
Filing Date:
February 18, 2016
Assignee:
NOKIA TECHNOLOGIES OY (FI)
International Classes:
H04N21/2368; G11B27/10; G11B27/28; G11B27/30; H04N21/218; H04N21/242; H04N21/2665; H04N21/2743; H04N21/422; H04N21/4223; H04N21/434; H04N21/482; H04N21/8547
Foreign References:
US20070136053A12007-06-14
US20130121662A12013-05-16
US20140081987A12014-03-20
US20120265859A12012-10-18
US20140192200A12014-07-10
Attorney, Agent or Firm:
NOKIA TECHNOLOGIES OY et al. (IPR Department, Karakaari 7, Espoo, FI)
Claims:
CLAIMS:

1. Apparatus comprising:

an audio analyser configured to determine a spectral flatness value associated with a captured audio signal associated with an audio scene and compare the spectral flatness value against a threshold value;

a detectable audio signal generator configured to generate a detectable audio signal when the spectral flatness value is less than the threshold value; and

an audio output configured to output the detectable audio signal when the spectral flatness value is less than the threshold value.

2. The apparatus as claimed in claim 1, wherein the detectable audio signal is an ultrasound signal.

3. The apparatus as claimed in any of claims 1 or 2, further comprising an apparatus analyser configured to determine at least one parameter associated with a further apparatus, and wherein the detectable audio signal generator is further configured to generate a detectable audio signal based on the at least one parameter.

4. The apparatus as claimed in claim 3, wherein the at least one parameter comprises a further apparatus microphone sensitivity, and the detectable audio signal generator is further configured to control the intensity of the detectable audio signal based on the further apparatus microphone sensitivity.

5. The apparatus as claimed in any of claims 3 or 4, wherein the at least one parameter comprises a further apparatus microphone frequency sensitivity, and the detectable audio signal generator is further configured to control the frequency range of the detectable audio signal based on the further apparatus microphone frequency sensitivity.

6. The apparatus as claimed in any of claims 3 to 5, wherein the at least one parameter comprises a further apparatus distance, and the detectable audio signal generator is further configured to control the intensity of the detectable audio signal based on the further apparatus distance.

7. The apparatus as claimed in any of claims 3 to 6, wherein the at least one parameter comprises a camera pose estimate, and the detectable audio signal generator is further configured to control the intensity of the detectable audio signal based on the camera pose estimate.

8. The apparatus as claimed in any of claims 1 to 7, further comprising an audio recorder configured to record and/or encode a captured audio signal associated with an audio scene.

9. The apparatus as claimed in claim 8, further comprising:

a receiver configured to receive at least one further audio signal captured by a further apparatus configured to monitor the audio scene;

an alignment generator configured to determine at least one time indicator value for each of the at least one further audio signal and the audio signal; and

a synchronizer configured to synchronize at least one of the at least one further audio signal and the audio signal stream to another signal stream based on the at least one time indicator value.

10. The apparatus as claimed in claim 9, further comprising a filter configured to filter the detectable audio signal component from the at least one further audio signal and the audio signal.

11. The apparatus as claimed in any of claims 9 and 10, wherein the alignment generator is configured to generate the at least one time indicator for the at least one further audio signal and the audio signal based on the correlation between the at least one further audio signal and the audio signal.

12. The apparatus as claimed in claim 11, wherein the at least one time indicator comprises the ratio of the variance and mean values of the correlation between the at least one further audio signal and the audio signal.

13. A method comprising:

determining at an apparatus a spectral flatness value associated with a captured audio signal associated with an audio scene and comparing the spectral flatness value against a threshold value;

generating a detectable audio signal when the spectral flatness value is less than the threshold value; and

outputting from the apparatus the detectable audio signal when the spectral flatness value is less than the threshold value.

14. The method as claimed in claim 13, wherein the detectable audio signal is an ultrasound signal.

15. The method as claimed in any of claims 13 or 14, further comprising determining at least one parameter associated with a further apparatus, and wherein the generating a detectable audio signal further comprises generating a detectable audio signal based on the at least one parameter.

16. The method as claimed in claim 15, wherein the at least one parameter comprises a further apparatus microphone sensitivity, and generating a detectable audio signal further comprises controlling the intensity of the detectable audio signal based on the further apparatus microphone sensitivity.

17. The method as claimed in claim 15, wherein the at least one parameter comprises a further apparatus microphone frequency sensitivity, and generating a detectable audio signal further comprises controlling the intensity of the detectable audio signal based on the further apparatus microphone frequency sensitivity.

18. The method as claimed in claim 15, wherein the at least one parameter comprises a distance between the apparatus and further apparatus, and generating a detectable audio signal further comprises controlling the intensity of the detectable audio signal based on the distance.

19. The method as claimed in claim 15, wherein the at least one parameter comprises a camera pose estimate, and generating a detectable audio signal further comprises controlling the intensity of the detectable audio signal based on the camera pose estimate.

20. The method as claimed in any of claims 15 to 19, further comprising recording and/or encoding a captured audio signal associated with an audio scene.

21. The method as claimed in any of claims 15 to 20, further comprising:

receiving at least one further audio signal captured by the further apparatus configured to monitor the audio scene; determining at least one time indicator value for each of the at least one further audio signal and the audio signal; and synchronizing at least one of the at least one further audio signal and the audio signal stream to another signal stream based on the at least one time indicator value.

22. The method as claimed in claim 21, further comprising filtering the detectable audio signal component from the at least one further audio signal and the audio signal.

23. The method as claimed in any of claims 21 or 22, wherein determining at least one time indicator value comprises generating the at least one time indicator for the at least one further audio signal and the audio signal based on the correlation between the at least one further audio signal and the audio signal.

24. The method as claimed in claim 23, wherein the at least one time indicator comprises the ratio of the variance and mean values of the correlation between the at least one further audio signal and the audio signal.

25. An apparatus comprising at least one processor and at least one memory including computer program code for one or more programs, the at least one memory and the computer program code configured to, with the at least one processor, cause the apparatus at least to:

determine a spectral flatness value associated with a captured audio signal associated with an audio scene and compare the spectral flatness value against a threshold value;

generate a detectable audio signal when the spectral flatness value is less than the threshold value; and

output the detectable audio signal when the spectral flatness value is less than the threshold value.

26. The apparatus as claimed in claim 25, further caused to determine at least one parameter associated with a further apparatus, and wherein the generating a detectable audio signal further causes the apparatus to generate a detectable audio signal based on the at least one parameter.

27. An apparatus comprising:

means for determining a spectral flatness value associated with a captured audio signal associated with an audio scene and comparing the spectral flatness value against a threshold value;

means for generating a detectable audio signal when the spectral flatness value is less than the threshold value; and

means for outputting the detectable audio signal when the spectral flatness value is less than the threshold value.

28. The apparatus as claimed in claim 27, further comprising means for determining at least one parameter associated with a further apparatus, and wherein the means for generating a detectable audio signal further comprise means for generating a detectable audio signal based on the at least one parameter.

Description:
AN APPARATUS AND METHOD TO ASSIST THE SYNCHRONISATION OF AUDIO OR VIDEO SIGNALS FROM MULTIPLE SOURCES

The present invention relates to apparatus to assist the synchronisation of audio or video signal processing from multiple sources. The invention further relates to, but is not limited to, apparatus in mobile devices to assist the synchronisation of audio or video signal processing from multiple sources.

Viewing recorded or streamed audio-video or audio content is well known. Commercial broadcasters covering an event often have more than one capture apparatus (video-camera/microphone) and a programme director will select a 'mix' where an output from one or more capture apparatus is selected for transmission. Such systems are problematic and lack flexibility: to reduce transmission resources, even 'interactive' services offer only a very limited selection of possible feeds or recording positions.

User generated content recorded and uploaded (or up-streamed) to a server relies on members of the public recording and uploading (or up-streaming) a recording of an event using the recording facilities at hand. This may typically be in the form of the camera and microphone arrangement of a mobile device such as a mobile phone. The mobile device or mobile phone may be considered to be an example of a capture apparatus or device. Often an event may be attended and recorded from more than one position by different recording users.

The director or end user may then select one of the up-streamed or uploaded data streams to view or listen to. Furthermore where there are multiple recordings of the same event it may be possible to improve the quality of a single recording. However directing and editing user generated content can be difficult as the user generated content recordings are typically made in an unsynchronised manner. In other words each user may be recording using different sample frequencies, and/or encoding the recording at different bit rates, and/or even using different encoding formats. Furthermore even in 'real-time' streaming situations different users may be up-streaming over different parts of the network, or using different network parameters, resulting in differing latency.

The effect is that there is typically a time delay associated with each recording process, and this delay is not constant across recording users.

It has been proposed, for example in PCT published application WO 2010/131105, that the captured audio signals from multiple recordings are analysed to determine a time delay such that the content data may be synchronised. This synchronised content data may then be processed and downloaded by an end user. This analysis and synchronisation can be achieved by a server apparatus separate from the capture apparatus or may be implemented within a capture apparatus (configured to operate as a server). However this type of audio signal analysis and processing is very sensitive to audio signal quality and can produce poor quality synchronisation. For example where the audio signal levels are too low or the background noise is too high the analysis may fail to determine a sufficient number of detectable audio transients or other characteristics which can be used to identify the audio signal delay between the recordings.

There is provided according to the invention an apparatus comprising: an audio analyser configured to determine a spectral flatness value associated with a captured audio signal associated with an audio scene and compare the spectral flatness value against a threshold value; a detectable audio signal generator configured to generate a detectable audio signal when the spectral flatness value is less than the threshold value; and an audio output configured to output the detectable audio signal when the spectral flatness value is less than the threshold value.

The detectable audio signal may be an ultrasound signal. The apparatus may further comprise an apparatus analyser configured to determine at least one parameter associated with a further apparatus, and wherein the detectable audio signal generator may be further configured to generate a detectable audio signal based on the at least one parameter.

The at least one parameter may comprise a further apparatus microphone sensitivity, and the detectable audio signal generator may be further configured to control the intensity of the detectable audio signal based on the further apparatus microphone sensitivity.

The at least one parameter may comprise a further apparatus microphone frequency sensitivity, and the detectable audio signal generator may be further configured to control the frequency range of the detectable audio signal based on the further apparatus microphone frequency sensitivity.

The at least one parameter may comprise a further apparatus distance, and the detectable audio signal generator may be further configured to control the intensity of the detectable audio signal based on the further apparatus distance.

The at least one parameter may comprise a camera pose estimate, and the detectable audio signal generator may be further configured to control the intensity of the detectable audio signal based on the camera pose estimate.

The apparatus may further comprise an audio recorder configured to record and/or encode a captured audio signal associated with an audio scene.

The apparatus may further comprise: a receiver configured to receive at least one further audio signal captured by a further apparatus configured to monitor the audio scene; an alignment generator configured to determine at least one time indicator value for each of the at least one further audio signal and the audio signal; and a synchronizer configured to synchronize at least one of the at least one further audio signal and the audio signal stream to another signal stream based on the at least one time indicator value.

The apparatus may further comprise a filter configured to filter the detectable audio signal component from the at least one further audio signal and the audio signal.

The alignment generator may be configured to generate the at least one time indicator for the at least one further audio signal and the audio signal based on the correlation between the at least one further audio signal and the audio signal.

The at least one time indicator may comprise the ratio of the variance and mean values of the correlation between the at least one further audio signal and the audio signal.

According to a second aspect there is provided a method comprising: determining at an apparatus a spectral flatness value associated with a captured audio signal associated with an audio scene and comparing the spectral flatness value against a threshold value; generating a detectable audio signal when the spectral flatness value is less than the threshold value; and outputting from the apparatus the detectable audio signal when the spectral flatness value is less than the threshold value.

The detectable audio signal may be an ultrasound signal.

The method may further comprise determining at least one parameter associated with a further apparatus, and wherein the generating a detectable audio signal may further comprise generating a detectable audio signal based on the at least one parameter.

The at least one parameter may comprise a further apparatus microphone sensitivity, and generating a detectable audio signal further may comprise controlling the intensity of the detectable audio signal based on the further apparatus microphone sensitivity.

The at least one parameter may comprise a further apparatus microphone frequency sensitivity, and generating a detectable audio signal further may comprise controlling the intensity of the detectable audio signal based on the further apparatus microphone frequency sensitivity.

The at least one parameter may comprise a distance between the apparatus and further apparatus, and generating a detectable audio signal further may comprise controlling the intensity of the detectable audio signal based on the distance.

The at least one parameter may comprise a camera pose estimate, and generating a detectable audio signal further may comprise controlling the intensity of the detectable audio signal based on the camera pose estimate.

The method may further comprise recording and/or encoding a captured audio signal associated with an audio scene.

The method may further comprise: receiving at least one further audio signal captured by the further apparatus configured to monitor the audio scene; determining at least one time indicator value for each of the at least one further audio signal and the audio signal; and synchronizing at least one of the at least one further audio signal and the audio signal stream to another signal stream based on the at least one time indicator value.

The method may further comprise filtering the detectable audio signal component from the at least one further audio signal and the audio signal.

The determining at least one time indicator value may comprise generating the at least one time indicator for the at least one further audio signal and the audio signal based on the correlation between the at least one further audio signal and the audio signal.

The at least one time indicator may comprise the ratio of the variance and mean values of the correlation between the at least one further audio signal and the audio signal.

According to a third aspect there is provided an apparatus comprising at least one processor and at least one memory including computer program code for one or more programs, the at least one memory and the computer program code configured to, with the at least one processor, cause the apparatus at least to: determine a spectral flatness value associated with a captured audio signal associated with an audio scene and compare the spectral flatness value against a threshold value; generate a detectable audio signal when the spectral flatness value is less than the threshold value; and output the detectable audio signal when the spectral flatness value is less than the threshold value.

The apparatus may be further caused to determine at least one parameter associated with a further apparatus, and wherein the generating a detectable audio signal may further cause the apparatus to generate a detectable audio signal based on the at least one parameter.

The at least one parameter may comprise a further apparatus microphone sensitivity, and generating a detectable audio signal further may cause the apparatus to control the intensity of the detectable audio signal based on the further apparatus microphone sensitivity.

The at least one parameter may comprise a further apparatus microphone frequency sensitivity, and generating a detectable audio signal further may cause the apparatus to control the frequency range of the detectable audio signal based on the further apparatus microphone frequency sensitivity.

The at least one parameter may comprise a distance between the apparatus and further apparatus, and generating a detectable audio signal further may cause the apparatus to control the intensity of the detectable audio signal based on the distance.

The at least one parameter may comprise a camera pose estimate, and generating a detectable audio signal further may cause the apparatus to control the intensity of the detectable audio signal based on the camera pose estimate.

The apparatus may be further caused to record and/or encode a captured audio signal associated with an audio scene.

The apparatus may further be caused to: receive at least one further audio signal captured by the further apparatus configured to monitor the audio scene; determine at least one time indicator value for each of the at least one further audio signal and the audio signal; and synchronize at least one of the at least one further audio signal and the audio signal stream to another signal stream based on the at least one time indicator value.

The apparatus may further be caused to filter the detectable audio signal component from the at least one further audio signal and the audio signal.

The determining at least one time indicator value may cause the apparatus to generate the at least one time indicator for the at least one further audio signal and the audio signal based on the correlation between the at least one further audio signal and the audio signal.

The at least one time indicator may comprise the ratio of the variance and mean values of the correlation between the at least one further audio signal and the audio signal.

According to a fourth aspect there is provided an apparatus comprising: means for determining a spectral flatness value associated with a captured audio signal associated with an audio scene and comparing the spectral flatness value against a threshold value; means for generating a detectable audio signal when the spectral flatness value is less than the threshold value; and means for outputting the detectable audio signal when the spectral flatness value is less than the threshold value.

The apparatus may further comprise means for determining at least one parameter associated with a further apparatus, and wherein the means for generating a detectable audio signal may further comprise means for generating a detectable audio signal based on the at least one parameter.

The at least one parameter may comprise a further apparatus microphone sensitivity, and the means for generating a detectable audio signal further may comprise means for controlling the intensity of the detectable audio signal based on the further apparatus microphone sensitivity.

The at least one parameter may comprise a further apparatus microphone frequency sensitivity, and the means for generating a detectable audio signal further may comprise means for controlling the intensity of the detectable audio signal based on the further apparatus microphone frequency sensitivity.

The at least one parameter may comprise a distance between the apparatus and further apparatus, and the means for generating a detectable audio signal further may comprise means for controlling the intensity of the detectable audio signal based on the distance.

The at least one parameter may comprise a camera pose estimate, and the means for generating a detectable audio signal further may comprise means for controlling the intensity of the detectable audio signal based on the camera pose estimate.

The apparatus may further comprise means for recording and/or encoding a captured audio signal associated with an audio scene.

The apparatus may further comprise: means for receiving at least one further audio signal captured by the further apparatus configured to monitor the audio scene; means for determining at least one time indicator value for each of the at least one further audio signal and the audio signal; and means for synchronizing at least one of the at least one further audio signal and the audio signal stream to another signal stream based on the at least one time indicator value.

The apparatus may further comprise means for filtering the detectable audio signal component from the at least one further audio signal and the audio signal.

The means for determining at least one time indicator value may comprise means for generating the at least one time indicator for the at least one further audio signal and the audio signal based on the correlation between the at least one further audio signal and the audio signal.

The at least one time indicator may comprise the ratio of the variance and mean values of the correlation between the at least one further audio signal and the audio signal.

An electronic device may comprise apparatus as described herein.

A chipset may comprise apparatus as described herein.

Embodiments of the present invention aim to address the above problems.

For a better understanding of the present application, reference will now be made by way of example to the accompanying drawings, in which:

Figure 1 shows schematically an electronic device suitable for being employed in embodiments of the application;

Figure 2 shows schematically a multi-user free-viewpoint sharing services system which may encompass embodiments of the application;

Figure 3 shows schematically a network orientated view of the system shown in Figure 2 within which embodiments of the application may be implemented;

Figure 4 shows schematically a method of operation of the system shown in Figure 2 within which embodiments of the application may be implemented;

Figure 5 shows a schematic view of the capture apparatus shown in Figure 3 in further detail;

Figures 6a and 6b show schematic views of the passive and active audio sensing operations;

Figure 7 shows a schematic view of an alignment mode selection example;

Figure 8 shows a schematic view of a location based alignment mode selection example;

Figure 9 shows a schematic view of the server shown in Figure 3 in further detail;

Figure 10 shows schematically a method of operation of the server shown in Figure 9 according to embodiments of the application;

Figure 11 shows schematically the synchronisation of signals in embodiments of the application; and

Figure 12 shows schematically a method of operation of the server shown in Figure 9 according to further embodiments of the application.

The following describes in further detail suitable apparatus and possible mechanisms for the provision of effective synchronisation for audio signals and similarly audio-visual images and data.

The concept as described with regard to embodiments shown herein is that at least one of the content capture apparatus within the monitored audio scene is configured to emit or insert a detectable audio signal. The detectable audio signal may comprise audio transients detectable by the other capture apparatus within the audio scene. Furthermore the detectable audio signal is configured to be detectable by the other content capture apparatus but not to significantly disturb the audio scene. For example in some embodiments the detectable audio signal is an ultrasound audio signal which has a frequency band or range above the typical hearing range.

In some embodiments the content capture apparatus may be configured to analyse the audio scene and generate the detectable audio signal based on the analysis of the audio scene. In other words the content capture apparatus may be configured to capture and analyse audio signals associated with the audio scene and determine when to insert the detectable audio signal. For example the audio scene may be analysed and determined to be unsuitable for performing audio alignment, causing the content capture apparatus to insert the detectable audio signal into the audio scene such that the combined audio scene and detectable audio signal can be synchronised.

Furthermore the content capture apparatus may be configured to analyse the number and configuration of other capture apparatus within the audio scene and modify the detectable audio signal based on this analysis. For example the content capture apparatus may be configured to modify or change the intensity of the detectable audio signal based on the distances between the content capture apparatus in the audio scene. In some embodiments the content capture apparatus may be configured to similarly change the intensity of the detectable audio signal based on the microphone characteristics of the other content capture apparatus within the audio scene. Furthermore in some embodiments the distances between devices and/or any potential acoustic obstructions between devices may be determined and the detectable audio signal insertion controlled based on these factors.

The audio signal recorded by the other content capture apparatus in the environment would thus comprise a combination of the audio sources forming the original audio scene and the detectable audio signals (the active ultrasound signal). Thus when audio content time delay alignment is attempted the inserted ultrasound signal may be used to determine the delay between the captured audio signals.
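By way of illustration only, the delay determination described above may be sketched in Python as follows. The function names, the use of plain cross-correlation and the variance-to-mean indicator are illustrative assumptions rather than a definitive implementation of the embodiments (the variance/mean ratio corresponds to the time indicator discussed later herein):

import numpy as np
from scipy.signal import correlate

def estimate_delay(reference, other, sample_rate):
    # Cross-correlate two captured audio signals; the lag of the
    # correlation peak gives the delay of 'other' relative to 'reference'.
    corr = correlate(other, reference, mode="full")
    lag = int(np.argmax(np.abs(corr))) - (len(reference) - 1)
    # One possible time indicator quality measure: the ratio of the
    # variance and mean values of the correlation magnitude.
    indicator = float(np.var(corr) / np.mean(np.abs(corr)))
    return lag / sample_rate, indicator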

In this regard reference is first made to Figure 1 which shows a schematic block diagram of an exemplary apparatus or electronic device 10, which may be used to record or listen to the audio signals and similarly to record or view the audio-visual images and data.

The electronic device 10 may for example be a mobile terminal or user equipment of a wireless communication system.

The electronic device 10 may comprise an audio subsystem 11. The audio subsystem may comprise a microphone(s) or inputs for microphones for audio signal capture and a loudspeaker(s) or outputs for loudspeaker(s) or headphones for audio signal output. The audio subsystem 11 may be linked via an audio analogue-to-digital converter (ADC) and digital-to-analogue converter (DAC) 14 to a processor 21. The electronic device 10 may further comprise a video subsystem 33. The video subsystem 33 may comprise a camera or input for a camera for image or moving image capture and a display or output for a display for video signal output. The video subsystem 33 may also be linked via a video analogue-to-digital converter (ADC) and digital-to-analogue converter (DAC) 32 to the processor 21.

The processor 21 may be further linked to a transceiver (TX/RX) 13, to a user interface (UI) 15 and to a memory 22.

The processor 21 may be configured to execute various program codes. The implemented program code may comprise audio and/or video encoding code routines. The implemented program code 23 may further comprise an audio and/or video decoding code. The implemented program code 23 may further comprise a detectable audio signal insertion and control code. The implemented program code 23 may be stored for example in the memory 22 for retrieval by the processor 21 whenever needed. The memory 22 may further provide a section 24 for storing data, for example data identifying and quantifying other audio capture devices within range of the inserted audio signal. The code may in embodiments of the invention be implemented in hardware or firmware.

The user interface 15 may enable a user to input commands to the electronic device 10, for example via a touch interface or keypad, and/or to obtain information from the electronic device 10, for example via the display. The transceiver 13 enables a communication with other electronic devices, for example via a wireless communication network. The transceiver 13 may in some embodiments of the invention be configured to communicate to other electronic devices by a wired connection.

It is to be understood again that the structure of the electronic device 10 could be supplemented and varied in many ways.

A user of the electronic device 10 may use the microphone 11 for audio signal capture. The captured audio signal may further be transmitted to some other electronic device or apparatus or be stored in the data section 24 of the memory 22. A corresponding application may be activated to perform the transmission or storage of the audio signal by the user via the user interface 15. This application, which may be run by the processor 21, causes the processor 21 to execute the encoding code stored in the memory 22.

Similarly the user of the device may use the camera or video sub-system input for video signal capture of video images that are to be transmitted to some other electronic device or apparatus or to be stored in the data section 24 of the memory 22. A corresponding application may similarly be activated to perform the transmission or storage of the video signal by the user via the user interface 15. This application, which may be run by the processor 21, causes the processor 21 to execute the encoding code stored in the memory 22.

The audio analogue-to-digital converter 14 may convert the input analogue audio signal into a digital audio signal and provide the digital audio signal to the processor 21. Similarly the video analogue-to-digital converter may convert an input analogue video signal into a digital signal format and provide the digital video signal to the processor 21. The processor 21 may then process the digital audio signal and/or digital video signal as described hereafter.

The resulting audio and/or video bit stream is provided to the transceiver 13 for transmission to another electronic device. Alternatively, the coded data could be stored in the data section 24 of the memory 22, for instance for a later transmission or for a later presentation by the same electronic device 10.

The electronic device 10 may also receive a bit stream with correspondingly encoded data from another electronic device via the transceiver 13. In this case, the processor 21 may execute decoding program code stored in the memory 22. The processor 21 may therefore decode the received data, and provide the decoded data to either of the audio or video sub systems such as the audio DAC 14 or the video digital-to-analogue converter 32. The audio and/or video digital-to-analogue converter 14, 32 may convert the digital decoded data into analogue data and output the analogue audio signal to the loudspeakers 11, or the analogue video signal to the display 33. It would be appreciated that in some embodiments the display and/or loudspeakers are themselves digital in operation, in which case the digital audio signal may be passed directly to the loudspeakers 11 and the digital video signal may be passed directly to the display 33. Execution of the decoding program code for audio and/or video signals may be triggered as well by an application that has been called by the user via the user interface 15. The received encoded data could also be stored in the data section 24 of the memory 22 instead of being presented immediately via the loudspeakers 11 and display 33, for instance to enable a later presentation or forwarding to a further electronic device (not shown).

In some embodiments of the invention the loudspeakers 11 may be supplemented with or replaced by a headphone set which may communicate with the electronic device 10 or apparatus wirelessly, for example by a Bluetooth profile to communicate via the transceiver 13, or using a conventional wired connection.

Although the above apparatus has been described with respect to being suitable for both the upstream operations, in other words recording the event and transmitting the recording, and the downstream operations, in other words receiving the recording and playing the received recording, it would be understood that in some embodiments of the application separate apparatus may perform the upstream and downstream operations.

Figure 2 shows a schematic overview of the system which may incorporate embodiments of the application. Figure 2 shows a plurality of capture apparatus (also known as recording electronic devices or recording apparatus) 210, which may be apparatus 10 such as shown in Figure 1, configured to record or capture an activity 171 from various angles or directions as shown in Figure 2 by an associated beam 121. The recording apparatus or capture apparatus 210 closest to the activity 171 are shown in Figure 2 as recording apparatus 210a to 210g. Each of the closest recording apparatus 210a to 210g has an associated beam 121a to 121g.

Each of these capture apparatus 210 may then upload or upstream the recorded signals. Figure 2 shows an arrow 191 representing the recorded signals which may be sent over a transmission channel 101 to a server 103. The server 103 may then process the received recorded signals and transmit signal data associated with a 'selected viewpoint', which may be a single recorded or synthesized signal, via a second transmission channel 105 to an end user or viewing apparatus or device 201a. As indicated above the capture apparatus 210 configured to transmit the recording may be only a capture apparatus and the end user apparatus 201 configured to receive the recorded or synthesized signals associated with the selected viewpoint may be a viewing or listening apparatus only. However in other embodiments the capture apparatus 210 and/or end user apparatus 201 may each have both recording and viewing/listening capability.

With respect to Figures 3 and 4, the system shown in Figure 2 is described in further detail. Figure 3 shows schematically a system suitable for implementing the embodiments of the application and Figure 4 shows a flow diagram of the operations of the system shown in Figure 3. In the following examples the content is purely audio content. However it is understood that the same methods may be applied to audio-video content.

The system within which embodiments may operate may comprise capture apparatus 210, an uploading or upstreaming network/transmission channel 101, a server or network apparatus 103, a downloading or downstreaming network/transmission channel 105, and end user apparatus 201.

The example shown in Figure 3 shows two capture apparatus 210, a first capture apparatus 210a and an n'th capture apparatus 210n. It is understood that the two capture apparatus are shown only as an example of the possible number of capture devices within the audio scene. The capture apparatus 210 may be connected to the server 103 via the uplink network/transmission channel 101.

With respect to Figure 5 an example content capture apparatus 210 is shown in further detail. The content capture apparatus in some embodiments thus comprises the audio subsystem 11 in the form of microphone(s) or inputs for microphones for audio signal capture. The output of the audio subsystem may be passed to an audio analyser 215 and further to an encoder/recorder 211.

The content capture apparatus may comprise an encoder/recorder 211. The encoder/recorder 211 may be configured to record and encode content in the form of a recorded signal. The recorded signal may be the audio signal captured by the audio subsystem microphone or microphone array. The encoder/recorder 211 may also perform encoding on the recorded signal data according to any suitable encoding methodology to generate an encoded audio (or video, or audio-video) signal.

The operation of recording the content to form a recorded signal is shown in Figure 4 by step 401. Figure 3 shows that more than one capture apparatus 210 may be within the audio scene. Individual capture apparatus may be configured to capture the event and generate audio signals associated with the capture apparatus's position and recording capability. Thus with respect to Figure 4 the first capture apparatus 210a carries out the recording as step 401a, a second capture apparatus carries out the recording operation as step 401b, and the n'th capture apparatus 210n carries out the recording operation as step 401n.

The capture apparatus 210 may also comprise an up-loader or transmitter 213 which formats the audio signal for transmission over the network/transmission channel 101. Furthermore in some embodiments the transmitter 213 may encode positional information to assist the server in locating the captured audio signal. This audio signal and positional data 191 may be transmitted over the uplink transmission channel 101. The uploading (transmission) of the content data and optionally the positional data is shown in Figure 4 by step 403. Figure 3 furthermore shows more than one capture apparatus 210 may upload the audio signals. Thus with respect to Figure 4 the first capture apparatus 210a carries out the uploading of first capture apparatus audio signal (and possibly positional) data 191a as step 403a, a second capture apparatus carries out the uploading operation of second capture apparatus audio signal (and possibly positional) data 191b as step 403b, and the n'th capture apparatus 210n carries out the uploading operation of n'th capture apparatus audio signal (and possibly positional) data 191n as step 403n. It is appreciated that any number of capture apparatus 210 may be connected to the server 103. Furthermore in some embodiments the uplink network/transmission channel 101 may be a single network, for example a cellular communications link between the capture apparatus and the server. In some embodiments the communications channel may operate or span across multiple networks, for example the data may pass over a wireless communications link to an internet gateway in the wireless communications system and then pass over an internet protocol related physical link to the server. The uplink network/transmission channel 101 may be a simplex network or part of a duplex or half-duplex network.

The uplink network/communications channel 101 may comprise any one of a cellular communication network such as a third generation cellular communication system, a Wi-Fi communications network, or any suitable wireless or wired communication link. In some embodiments the recording and uploading operations may occur concurrently or substantially concurrently so that the information received at the server may be considered to be real time or streamed data. In other embodiments the uploading operation may be carried out at a time substantially later than the recording operation and the information may be considered to be uploaded data.

Referring back to Figure 5 the content capture apparatus 210 may furthermore comprise an audio analyser 215. The audio analyser 215 may be configured to receive the captured audio signals and divide the audio signals into frames. In some embodiments the frames are equal length frames. For example each frame may be 10 seconds long, although frames may be shorter or longer provided they contain enough information for the analysis described herein.

The audio analyser 215 may furthermore determine for each frame the spectral flatness of the signal. The spectral flatness of the signal can be determined by dividing the geometric mean of the power spectrum by the arithmetic mean of the power spectrum. For example by using the following equation:

\mathrm{Flatness} = \frac{\left( \prod_{n=0}^{N-1} x(n) \right)^{1/N}}{\frac{1}{N} \sum_{n=0}^{N-1} x(n)}

where x(n) represents the magnitude of frequency spectrum bin number n and N is the number of bins. The flatness value may determine the noise-like nature of the captured audio signal. A high value (a value close to 1.0) means that the signal is noise-like and likely to be difficult to align. A low value (a value close to 0) means that the signal has information in specific spectral bands which can be used to align the audio signals according to the methods described herein.
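A minimal sketch of this flatness computation, assuming a single-channel signal and magnitude spectra obtained with an FFT (the frame length and names are illustrative only), is:

import numpy as np

def spectral_flatness(frame):
    # Geometric mean of the magnitude spectrum x(n) divided by its
    # arithmetic mean; the result lies between 0 (tonal) and 1 (noise-like).
    x = np.abs(np.fft.rfft(frame))
    x = np.maximum(x, 1e-12)  # guard against log(0) in silent bins
    return float(np.exp(np.mean(np.log(x))) / np.mean(x))

def frame_flatness(signal, sample_rate, frame_seconds=10.0):
    # Divide the captured audio signal into equal length frames and
    # compute one flatness value per frame.
    n = int(frame_seconds * sample_rate)
    return [spectral_flatness(signal[i:i + n])
            for i in range(0, len(signal) - n + 1, n)]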

The audio analyser 215 may furthermore be configured to compare the spectral flatness value to a threshold value. When the flatness value is above a threshold value then a signal can be passed to an alignment signal generator 219 indicating that the detectable audio signal is to be generated and inserted into the audio scene. Otherwise when the audio analyser 215 determines that the spectral flatness value is less than the threshold then a signal or indicator can be passed to the alignment signal generator 219 indicating that no detectable audio signal is to be generated and inserted into the audio scene.

The operation of analysing the captured audio signal, and determining the spectral flatness of the audio signal and whether it is greater/less than a 'synchronisation analysis' threshold value, is shown in Figure 4 by step 451.

In some embodiments the content capture apparatus 210 comprises an apparatus analyser 217. The apparatus analyser 217 may be configured to receive information about the other capture apparatus in the audio scene. This information may be received from a server configured to store information on the capture apparatus in the audio scene. For example as described above the encoder/recorder may transmit with the audio signal positional information identifying the position of the capture apparatus to the server. The server may then pass this information to a capture apparatus which requests the positional information of other capture apparatus.

Furthermore in some embodiments the capture apparatus information may be obtained on a peer-to-peer basis from other capture apparatus. For example the apparatus analyser 217 may be configured to receive directly positional information from other capture apparatus also within the audio scene.

The capture apparatus information may as described herein be positional information and/or may be information about the capacity of the other apparatus. For example in some embodiments the apparatus analyser 217 may be configured to receive information such as the number of microphones, the spatial sensitivity of the microphones, or the spectral frequency sensitivity of the microphones. This information may in some embodiments be passed to the alignment signal generator 219.

The content and audio signal alignment may be performed more efficiently if the location information (for example using GPS, indoor positioning, digital compass, gyroscope etc.) about the capture apparatus location is known while performing the alignment processing. For example where the capture apparatus in the audio scene are known to be separated by more than a threshold distance, either the captured audio scene is likely to be very different for the different capture apparatus or the microphone sensitivity of the capture apparatus is likely to result in poor signal capture quality, resulting in a failure to achieve audio based alignment. In such embodiments, when the separation is greater than the threshold the apparatus analyser 217 may be configured to control the alignment signal generator 219 to not generate the detectable audio signal (even when the audio analyser 215 determines that the audio scene is 'flat'). Furthermore in some further embodiments the apparatus analyser 217 may be configured to generate an indicator which is passed to the uploader/transmitter 213 to indicate to the server that the uploaded content is not suitable for audio alignment when there are no other capture apparatus within the threshold distance. Thus in such embodiments the computing resources for attempting audio alignment for such temporal segments are saved, resulting in improved efficiency. A sketch of such a distance check is given below.
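The following sketch illustrates the distance check; the threshold value and all names are hypothetical:

from math import dist  # Euclidean distance, Python 3.8+

ALIGNMENT_DISTANCE_THRESHOLD_M = 50.0  # hypothetical threshold value

def audio_alignment_feasible(own_position, other_positions,
                             threshold=ALIGNMENT_DISTANCE_THRESHOLD_M):
    # Audio based alignment is worth attempting only when at least one
    # further capture apparatus lies within the threshold distance;
    # otherwise the detectable audio signal may be suppressed and the
    # upload flagged as unsuitable for audio alignment.
    return any(dist(own_position, p) <= threshold for p in other_positions)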

In some embodiments the location and/or positioning of the capture apparatus may be determined by deriving camera pose estimate values (CPE) from the camera images. Camera pose estimate values are external or extrinsic parameters associated with the camera. For example the pose parameters may be the camera position and orientation estimate values and may be determined from multiple images after determining any intrinsic or internal camera parameters. Intrinsic or internal camera parameters may for example be parameters such as the focal length, the optical centre, and the aspect ratio associated with the camera.

The pose estimate may be determined in some embodiments by applying an analytic or geometric method where, given that the image sensor (camera) is calibrated, the mapping from 3D points in the scene to 2D points in the image is known. If the geometry of the object is also known, the projected image of the object on the camera image is a well-known function of the object's pose. Once a set of control points on the object, typically corners or other feature points, has been identified it is then possible to solve the pose transformation from a set of equations which relate the 3D coordinates of the points with their 2D image coordinates. Algorithms that determine the pose of a point cloud with respect to another point cloud, where the correspondences between points are not already known, are known as point set registration algorithms.

In some embodiments the pose estimates may be determined by applying genetic algorithms where the pose provides the genetic representation and the error between the projection of the object control points and the image is the fitness function. Furthermore in some embodiments the pose estimate may be determined using learning-based methods where the system learns the mapping from 2D image features to the pose transformation.
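As one possible realisation of the analytic method described above, the extrinsic pose of a calibrated camera may be solved from known 3D control points and their 2D image coordinates, for example with OpenCV (a sketch assuming the correspondences and the intrinsic calibration are already available):

import numpy as np
import cv2

def estimate_camera_pose(object_points, image_points, camera_matrix, dist_coeffs):
    # Solve the pose transformation relating the 3D scene coordinates of
    # the control points to their 2D image coordinates.
    ok, rvec, tvec = cv2.solvePnP(
        np.asarray(object_points, dtype=np.float64),  # N x 3 scene points
        np.asarray(image_points, dtype=np.float64),   # N x 2 image points
        camera_matrix, dist_coeffs)
    if not ok:
        raise RuntimeError("camera pose estimation failed")
    rotation, _ = cv2.Rodrigues(rvec)  # rotation vector to 3x3 matrix
    return rotation, tvec              # orientation and position estimate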

In such embodiments an approximate positional estimate may be obtained by comparing the visual content in the image or video with a globally registered image database.

Furthermore by comparing the visual content in an image against other images it can be determined whether the capture apparatus are within the same audio scene region. For example, two capture apparatus may be within the same room and therefore, although separated by some distance, capable of capturing the same audio scene; or they may be separated by a shorter distance but located in different rooms and therefore not capable of capturing the same audio scene.

Furthermore the operation of analysing the other apparatus in the audio scene and generating suitable control signals based on the analysis is shown in Figure 4 by step 453.

In some embodiments the capture apparatus 210 may comprise an alignment signal generator 219. The alignment signal generator 219 may be configured to receive the capture apparatus information with respect to other apparatus in the audio scene from the apparatus analyser 217 and furthermore the output of the audio analyser 215. The alignment signal generator 219 may then be configured to selectively generate a suitable alignment signal. The alignment signal as described herein may be a predetermined or known audio signal. The audio signal may in some embodiments furthermore be an ultrasound audio signal. In other words in some embodiments the audio signal may have a spectral frequency range above the typical hearing range. For example the detectable audio signal may be a predetermined audio signal within the frequency range between 16.5 kHz and 19.5 kHz. In some embodiments the alignment signal generator 219 may be configured to control the intensity or the power of the detectable audio signal based on the output from the apparatus analyser 217. For example in some embodiments the amplitude or power of the detectable audio signal may be dependent on the distance between the capture apparatus. Furthermore the frequency range of the detectable audio signal may be dependent on the spectral sensitivity of microphones in the other capture apparatus such that the generated audio signal may be within the frequency range detected by the microphones of the other capture apparatus but above the typical listener's hearing range.
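A sketch of such a generator follows. The linear chirp waveform, the 48 kHz output rate and the simple distance-based amplitude rule are illustrative assumptions; the embodiments require only a predetermined signal in the 16.5 kHz to 19.5 kHz band:

import numpy as np

def alignment_signal(distance_m, sample_rate=48000, duration_s=0.5,
                     f_low=16500.0, f_high=19500.0):
    # Linear chirp sweeping the 16.5-19.5 kHz band: above the typical
    # hearing range yet within the passband of typical device microphones.
    t = np.arange(int(sample_rate * duration_s)) / sample_rate
    phase = 2.0 * np.pi * (f_low * t
                           + (f_high - f_low) * t ** 2 / (2.0 * duration_s))
    # Crude intensity control: louder for more distant further apparatus,
    # clipped to full scale.
    amplitude = min(1.0, 0.1 * max(distance_m, 1.0))
    return amplitude * np.sin(phase)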

Furthermore in some further embodiments the alignment signal generator 219 may be configured to generate an indicator which is passed to the uploader/transmitter 213 to indicate to the server whether the detectable audio signal has been generated. In such embodiments the server, on receiving this indicator, may be configured to actively filter and process the detectable audio signal part of the captured audio signal rather than processing all of the captured audio signal. Furthermore the operation of generating a detectable audio signal controlled by the apparatus information and the analysis of the captured audio signal is shown in Figure 4 by step 455.

The alignment signal generator 219 may furthermore be configured to output the detectable audio signal to the audio subsystem and in particular the loudspeaker transducer and thus broadcast the detectable audio signal to any capture apparatus within the audio scene.

The operation of transmitting the detectable audio signal to the other devices within the audio scene is shown in Figure 4 by step 457.

With respect to Figure 6a a scenario is shown where only the ambient audio scene sources are captured by the capture apparatus or devices. In this example the ambient audio scene source 171 is shown generating an audio signal 1601 which is detected and captured by the capture apparatus (Device 1) 210a and a further capture apparatus (Device 2) 210b. This scenario may represent the situation where the captured ambient audio scene source is determined to have a spectral flatness value greater than a determined threshold and thus no detectable audio signal is added to the audio scene.

With respect to Figure 6b a scenario is shown where the detectable audio signal is added to the ambient audio scene source. In this example the ambient audio scene source 171 is shown generating an audio signal 1601 which is detected and captured by the capture apparatus (Device 1) 210a and a further capture apparatus (Device 2) 210b. This scenario may represent the situation where the captured ambient audio scene source is determined by the further capture apparatus to have a spectral flatness value less than a determined threshold and thus would produce poor quality audio synchronisation data. The further capture apparatus (Device 2) 210b is then configured to generate a detectable audio signal in the form of an ultrasound signal which is emitted by the further capture apparatus (the emission of the ultrasound signal by the further capture apparatus is shown in Figure 6b by reference 1605). The emitted ultrasound signals 1603 can then be captured along with the ambient audio scene audio signals at the capture apparatus (Device 1) 210a.

As shown in Figures 6a and 6b and described herein, in some embodiments the insertion of the detectable audio signal may be controlled based on the analysis of the audio scene source audio signal. This control may be performed over time, in other words the detectable audio signal may be switched on and off based on the audio scene source audio signal. For example Figure 7 shows an audio signal 1700 timeline and an associated suitability determination 1701 and alignment mode determination 1702. This example shows three sections or periods during which the detectable audio signal is switched on and off. During the first period 1711, based on the analysis of the audio signal, the suitability determination indicates that the audio signal is suitable for audio alignment and a passive audio alignment mode is selected (in other words no detectable audio signal is generated or inserted). Following the first period 1711 is a second period 1713, during which, based on the analysis of the audio signal, the suitability determination indicates that the audio signal is not suitable for audio alignment and an active audio alignment mode is selected (in other words a detectable audio signal is generated and inserted). Finally in this example following the second period 1713 is a third period 1715, during which the suitability determination indicates that the audio signal is again suitable for audio alignment and a passive audio alignment mode is selected (in other words no detectable audio signal is generated or inserted). A sketch of this per-frame mode selection is given below.
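The per-frame selection between the passive and active alignment modes shown in Figure 7 reduces to a simple comparison per analysis frame; a sketch with a hypothetical threshold value is:

FLATNESS_THRESHOLD = 0.5  # hypothetical 'synchronisation analysis' threshold

def select_alignment_modes(flatness_per_frame, threshold=FLATNESS_THRESHOLD):
    # Noise-like frames (high flatness) are difficult to align passively,
    # so the active mode inserts the detectable audio signal for them.
    return ["active" if flatness > threshold else "passive"
            for flatness in flatness_per_frame]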

In some embodiments the control of the generation and transmission of the detectable audio signal is determined based on positioning information (for example capture apparatus GPS positioning information, indoor positioning information, etc.). In some embodiments, when the capture apparatus are separated by more than a threshold distance, the audio scenes captured by the separated capture apparatus are likely to be significantly different, and therefore to result in poor audio signal synchronisation even when a detectable audio signal is generated and broadcast.

This is shown for example in Figure 8, wherein the locations or positions of a first capture apparatus 210a and a second capture apparatus 210b are shown. Furthermore the regions within which the capture apparatus are separated by less than and more than a threshold distance are shown in Figure 8.

Thus for example a first region or temporal segment 1801 is shown, during which the first capture apparatus 210a and the second capture apparatus 210b are positioned with a separation distance less than a threshold distance, and as such are suitable for audio alignment using either the passive or active modes of alignment. The first region or temporal segment 1801 is then followed by a second region or temporal segment 1803, during which the first capture apparatus 210a and the second capture apparatus 210b are positioned with a separation distance greater than the threshold distance, and as such are not suitable for audio alignment using either the passive or active modes of alignment. In some embodiments the server may receive an indicator which indicates that the audio synchronisation method will not be successful and therefore is not attempted.
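A minimal sketch of this positioning-based gate, assuming the capture apparatus report GPS latitude/longitude coordinates, is given below. The function names and the 50 m example threshold are illustrative assumptions; the embodiments leave the threshold and positioning technology open.

```python
import math

def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance between two lat/lon points, in metres."""
    r = 6371000.0  # mean Earth radius
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = math.radians(lat2 - lat1)
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def audio_alignment_feasible(pos_a, pos_b, threshold_m: float = 50.0) -> bool:
    # Beyond the threshold the two devices capture significantly different
    # audio scenes, so neither passive nor active alignment is attempted
    # and the server may instead be sent a "will not succeed" indicator.
    return haversine_m(*pos_a, *pos_b) <= threshold_m
```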

The server 103 in Figure 3 is shown in further detail in Figure 9 and the operation of the server described with reference to Figure 4 is described in further detail in Figure 10. Where the same (or similar) components or operations are described the same reference number may be used.

The server 103 may comprise a receiver or buffer 221 which may receive the recorded signal data (and in some embodiments the positioning data) 191 from the uplink network/communications channel. The receiver/buffer 221 may be any suitable receiver for receiving the recorded signal data (and in some embodiments the positioning data) according to the format used on the uplink network/communications channel 101. The receiver/buffer 221 may be configured to output the received recorded signal data, and in some embodiments the positioning data, to the synchronizer 223.

The buffering may enable the server 103 to receive the recorded signal data from the capture apparatus 210 for the time reference required. This buffering may therefore be short term buffering; for example, in real-time or near real-time streaming of the recorded signal data, the buffering or storage of the recorded signal data may be in the order of seconds, and the receiver/buffer may use solid state memory to buffer the recorded signal data. However where the capture apparatus 210 themselves store and upload the recorded signal data at a later time, the receiver/buffer 221 may store the recorded signal data in a long term storage device, for example using magnetic media such as a RAID (Redundant Array of Independent Disks) storage. This long term storage may thus store the recorded signal data for an amount of time sufficient to enable several different capture devices to upload the recorded signal data at their convenience.

Furthermore in some embodiments the buffering may further comprise filtering of the audio signals. The filtering may for example be performed to select the detectable audio signal components from the captured audio signal. In other words the filtering may focus the analysis on the detectable audio signal components introduced during the 'active audio alignment' modes of operation. In some embodiments the filtering is configured to reduce the effect of audio signals which are determined likely to produce a poor result. For example where the captured audio signal is determined, because of the positional information, not to be within the audio scene region, then the captured audio signal may be 'removed' or filtered from the synchronisation operation. For the following example we may define the vector b_i as the received and buffered i'th recorded signal data for a time period of length T seconds. Furthermore where the sample rate of the i'th recorded signal data is S Hz, the number of time samples N within b_i may then be defined by the following equation:

$$ N = S \cdot T $$
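A sketch of such a filtering step, assuming the detectable signal was inserted in the ultrasound band, is given below: a high-pass filter retains only the band where the detectable components lie. The 18 kHz cut-off, the filter order and the function name are illustrative assumptions.

```python
import numpy as np
from scipy.signal import butter, sosfilt

def extract_detectable_band(b_i: np.ndarray, sample_rate_hz: int,
                            cutoff_hz: float = 18000.0) -> np.ndarray:
    """Keep only components above cutoff_hz for the alignment analysis."""
    sos = butter(8, cutoff_hz, btype="highpass", fs=sample_rate_hz, output="sos")
    return sosfilt(sos, b_i)

# Buffering T seconds at S Hz gives N = S * T samples, as in the equation
# above; e.g. T = 2.5 s at S = 48 kHz buffers N = 120000 samples.
```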

The receiving/buffering operation is shown as step 405 in both the system operation of Figure 4 and the server operation of Figure 10.

The server 103 further comprises a synchronizer 223. The synchronizer 223 receives at least two independently recorded signal data 191a, 191b, ..., 191n from the receiver/buffer and outputs at least two synchronized recorded signal data signals.

The synchronizer 223 does so by variable length framing of the recorded signal data, selecting a base recorded signal data and then aligning the remainder of the recorded signal data with the base recorded signal. The at least two synchronized recorded signal data are then passed to the processor/transmitter 227 for further processing.

The synchronization operation is shown in Figure 4 by step 407.

With respect to Figures 9 and 10 the configuration and operation of the synchronizer 223 may be described in further detail.

The synchronizer 223 may comprise a variable length framer 301. The variable length framer may receive the at least two recorded signal data values 191 from the receiver/buffer 221. The variable length framer 301 may generate framed recorded signal values, by generating a single sample value from a first number of recorded signal data sample values. An example of the variable length framer 301 carrying out variable length framing may be according to the following equation:

$$
vlf_{i,j}(k) =
\begin{cases}
\displaystyle\sum_{h=0}^{f_j-1} \left| b_i(k \cdot f_j + h) \right|, & vlfIdx = 1 \\[6pt]
sgn_{i,j}(k) \cdot \displaystyle\sum_{h=0}^{f_j-1} b_i(k \cdot f_j + h)^2, & \text{otherwise}
\end{cases}
\qquad
sgn_{i,j}(k) =
\begin{cases}
1, & \displaystyle\sum_{h=0}^{f_j-1} b_i(k \cdot f_j + h) \ge 0 \\[6pt]
-1, & \text{otherwise}
\end{cases}
$$

where vlf_{i,j}(k) is the output sample value for the first number of recorded signal data samples for the i'th recorded signal data, f_j the first number (otherwise known as the input mapping size), and b_i(k·f_j+h) the input sample value for the (k·f_j+h)'th sample. For each mapping or frame, k·f_j defines the first input sample index and k·f_j+f_j−1 the last input sample index. The index k defines the output sample or variable frame index. Thus as described previously, for a time period T where there are N input sample values, the variable length framer 301 outputs N/f_j output sample values, each of which is formed dependent on f_j adjacent input sample values.

The index vlfIdx indicates the run-time mode for the variable length framing. In some embodiments the value of vlfIdx is set to 1 where f_j/S < 2 ms, otherwise the value of vlfIdx is set to 0. The run-time mode may indicate the calculation path for the variable length framing operation, that is, whether the output value of vlf_{i,j}(k) is calculated from the amplitude envelope directly (vlfIdx == 1) or from the sign-adjusted energy envelope (vlfIdx != 1). The decision which mode is to be used depends on the duration of f_j: if the duration of f_j is less than 2 milliseconds the amplitude envelope calculation path may be selected, otherwise the energy envelope calculation path may be used. In other words, for small input mapping sizes it is more advantageous to track the amplitude envelope than the energy envelope. This may improve the resilience to false synchronization results.

The variable length framer 301 may then repeat the operation of variable length framing for each of the number of signals identified for the selected space, to generate an output for each of the recorded signals so that the output samples for each of the recorded signals have the same number of sample values for the same time period. The operation of the variable length framer 301 may be such that in embodiments all of the recorded signal data are variable length framed in a serial format, in other words one after another. In some embodiments the operation of the variable length framer 301 may be such that more than one of the recorded signal data may be processed at the same time, or substantially at the same time, to speed up the variable length processing for the time period in question. The output of the variable length framer may be passed to the indicator selector 303.

The operation of variable length framing is shown in Figure 10 by step 4071.
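A non-limiting sketch of the framing equation above follows: each output sample summarises f_j input samples, using the amplitude envelope for short mappings (under 2 ms) and the sign-adjusted energy envelope otherwise. The function name is an assumption for illustration.

```python
import numpy as np

def variable_length_frame(b_i: np.ndarray, f_j: int, sample_rate_hz: int) -> np.ndarray:
    n_frames = len(b_i) // f_j                      # N / f_j output samples
    frames = b_i[: n_frames * f_j].reshape(n_frames, f_j)
    vlf_idx = 1 if f_j / sample_rate_hz < 0.002 else 0
    if vlf_idx == 1:
        # Amplitude envelope path for small input mapping sizes.
        return np.sum(np.abs(frames), axis=1)
    # Sign-adjusted energy envelope path.
    signs = np.where(np.sum(frames, axis=1) >= 0.0, 1.0, -1.0)
    return signs * np.sum(frames ** 2, axis=1)
```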

The synchronizer 223 may also comprise an indicator selector 303 configured to receive the variable length framed sample values for each of the selected space of recorded signal data and generate a time alignment indicator for each recorded data signal.

The indicator selector 303 may for example generate the time alignment indicator tInd for the i'th signal and for all variable time frame sample values j from 0 to M using the following equation:

$$ tInd_{i,j}(k) = \max_{\tau}\left\{ vlf_{i,j},\, vlf_{k,j} \right\}, \quad 0 \le i < U,\; 0 \le k < U,\; 0 \le j < M $$

where max_τ maximises the correlation between the given signals with respect to the delay τ. This maximisation function locates the delay τ where the signals are best time aligned. The function may in embodiments of the invention be defined as

$$ \max_{\tau}\{x, y\} = \max_{lag}\left( xCorr_{lag} \right), \quad 0 \le lag < T_{upper} $$

where xCorr_{lag} is the correlation between the signals x and y at delay lag, and T_upper defines the upper limit for the delay in seconds. In suitable embodiments, the upper limit may be set to two seconds as this has been found to be a fair value for the delay in practical recording and networking conditions. Furthermore, wSize_j describes the number of items used in the maximum calculation for each f_j. In some embodiments, the number of items used in the maximisation calculation may be about T_window = 2.5 s, which corresponds to wSize_j = T_window in samples for each f_j. The above equation as performed in embodiments therefore returns the value "lag" which maximises the correlation between the signals. Furthermore the equation:

$$ tCorr_{i,j}(k) = xCorr_{\tau}, \quad 0 \le i < U,\; 0 \le k < U,\; 0 \le j < M $$

may provide the correlation value.
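A minimal sketch of the max_τ operation above, assuming the delay bound has already been converted to an integer number of framed samples, is given below. It returns both the best lag (the tInd value for the signal pair) and its correlation (the tCorr value); the function name is an assumption.

```python
import numpy as np

def max_tau(x: np.ndarray, y: np.ndarray, max_lag: int):
    """Find the lag in [0, max_lag) maximising the correlation of x and y."""
    best_lag, best_corr = 0, -np.inf
    for lag in range(max_lag):
        n = min(len(x), len(y) - lag)           # overlapping region length
        if n <= 0:
            break
        corr = float(np.dot(x[:n], y[lag:lag + n]))
        if corr > best_corr:
            best_lag, best_corr = lag, corr
    return best_lag, best_corr                  # tInd and tCorr for the pair
```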

The indicator selector 303 may then pass the generated time alignment indicator (tInd) values to the base signal determiner 305.

The calculation of time alignment indicator values is shown in Figure 10 by step 4073.

The synchronizer 223 may also comprise a base signal determiner 305 which may be configured to receive the time alignment indicator values from the indicator selector 303 and indicate which of the received recorded signal data is suitable to synchronize the remainder of the recorded signal data to.

The base signal determiner 305 may first generate a series of time aligned indicators from the time alignment indicator values. For example the time aligned indicators may be a time aligned index average, a time aligned index variance and a time aligned index ratio, which may be generated by the base signal determiner 305 according to the following three equations:

$$ tIndAve_{i,k} = \frac{1}{M} \sum_{j=0}^{M-1} tInd_{i,j}(k), \quad 0 \le i < U,\; 0 \le k < U $$

$$ tIndVar_{i,k} = \frac{1}{M} \sum_{j=0}^{M-1} \left( tInd_{i,j}(k) - tIndAve_{i,k} \right)^2, \quad 0 \le i < U,\; 0 \le k < U $$

$$ tIndRatio(i) = \sum_{k=0}^{U-1} \frac{tIndVar_{i,k}}{tIndAve_{i,k}}, \quad 0 \le i < U $$

The base signal determiner 305 may sort the indicator tIndRatio in increasing order of importance. For example the base signal determiner 305 may sort the indicator tIndRatio so that the ratio value having the smallest value appears first, the ratio value having the second smallest value appears second, and so on. The base signal determiner 305 may output the sorted indicator as the ratio vector tIndRatioSorted. The base signal determiner 305 may also record the order of the time indicator values tIndRatio by generating an index tIndRatioSortedIndex which contains the corresponding original position indices for the sorted result. Thus if the smallest ratio value was found at index 2, the next smallest at index 5, and so on, the base signal determiner 305 may generate a vector with the values [2, 5, ...]. The base signal determiner 305 may then use the generated indicators to determine the base signal according to the following equations:

$$ base\_signal\_idx = tIndRatioSortedIndices(0) $$

$$ time\_align(base\_signal\_idx) = 0 $$

The determination of the base signal is shown in Figure 10 by step 4075.
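The sorting and selection just described may be sketched as follows; the signal with the smallest variance-to-average ratio is taken as the base. The function name is an assumption for illustration.

```python
import numpy as np

def determine_base_signal(t_ind_ratio: np.ndarray):
    """Sort tIndRatio ascending and pick the best-behaved signal as base."""
    sorted_indices = np.argsort(t_ind_ratio)   # tIndRatioSortedIndex
    base_signal_idx = int(sorted_indices[0])   # smallest ratio wins
    return base_signal_idx, sorted_indices

# Example: ratios [0.9, 0.2, 0.5] give base_signal_idx == 1 and
# sorted_indices == [1, 2, 0].
```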

The base signal determiner 305 may also determine the time alignment factors for the other recorded signal data from the average time alignment indicator values according to the following equation:

$$ time\_align(i) = tIndAve_{base\_signal\_idx,\, i}, \quad 0 \le i < U,\; i \ne base\_signal\_idx $$

The determination of the time alignment values for the remaining signals is shown in Figure 10 by step 4077.

The base signal determiner 305 may then pass the base signal indicator value base_signal_idx and also the time alignment factor values time_align for the remaining recorded signals to the signal synchronizer 307.

The synchronizer 223 may also comprise a signal synchronizer 307 configured to receive the recorded signals via the receiver/buffer 221 and the base signal indicator value and the time alignment factor values for the remaining recorded signals. The signal synchroniser 307 may then synchronize the recorded signals by adding the time alignment value to the current time indices of each of the signals.

Furthermore in some embodiments the synchronizer 223 may be configured to receive from the capture apparatus an indicator of the quality of the captured audio signal, and be able to bias the alignment of the content based on the quality indicator. For example the synchronizer 223 may determine that the current audio signal being analysed has an associated indicator indicating the audio signal is of poor quality, and use previously determined delay values until the audio signal quality improves. The quality of the audio signal may for example be determined by the capture apparatus audio analyser, and be based on the spectral flatness and/or the power level of the audio signal.

This synchronisation operation may be shown with respect to Figure 11. Figure 11 shows four recorded signals. These recorded signals may be a first signal (signal 1) 501, a second signal (signal 2) 503, a third signal (signal 3) 505 and a fourth signal (signal 4) 507. After being processed by the variable length framer 301, the indicator selector 303 and the base signal determiner 305, the signal synchronizer 307 may receive a base signal indicator value base_signal_idx with a value of 3 (shown as 561), and furthermore receive time_align values for the first signal Time_align(1) 551, second signal Time_align(2), third signal Time_align(3) which is equal to zero, and fourth signal Time_align(4).

As the third signal is the base signal and therefore has no time alignment value, no time delay is added to the signal sample values and the synchronized third signal 515 is output. The signal synchronizer 307 may delay the first signal 501 by the Time_align(1) 551 value to generate a synchronized first signal 511. The signal synchronizer 307 may delay the second signal 503 by the Time_align(2) 553 value to generate a synchronized second signal 513. The signal synchronizer 307 may also delay the fourth signal 507 by the Time_align(4) 557 value to generate a synchronized fourth signal 517.
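The Figure 11 behaviour may be sketched as below, assuming the time_align values have been converted to non-negative sample counts relative to the base signal; the function name is an assumption.

```python
import numpy as np

def synchronize(signals: list[np.ndarray], time_align: list[int]) -> list[np.ndarray]:
    """Delay each signal by its time_align value; the base passes through."""
    out = []
    for sig, delay in zip(signals, time_align):
        if delay == 0:                          # the base signal (delay zero)
            out.append(sig)
        else:                                   # prepend 'delay' samples of silence
            out.append(np.concatenate([np.zeros(delay), sig]))
    return out
```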

The synchronized recorded data signals may then be output to the processor/transmitter 227.

Thus in summary the apparatus of the server may be considered to comprise a frame value generator which may generate for each of at least two signal streams, at least one signal value for a frame of audio signal values from the signal stream.

The same server apparatus may also comprise an alignment generator to determine at least one indicator value for each of the at least two signal streams dependent on the at least one signal value for a frame of audio signal values for the signal stream. Furthermore the server apparatus may comprise a synchronizer to synchronize at least one signal stream to another signal stream dependent on the indicator values.

The operation of synchronising the signals is shown in Figure 10 by operation 4079.

The server 103 may comprise a viewpoint receiver/buffer 225. The viewpoint receiver/buffer 225 may be configured to receive from the end user apparatus 201 data in the form of a positional or recording viewpoint information signal - in other words the apparatus may communicate a request to hear or view the event from a specific capture apparatus or from a specified position. Although this is discussed hereafter as the viewpoint, it would be understood that this applies to audio only as well as audio-visual data. Thus in embodiments the data may indicate for selection or synthesis a specific capture apparatus from which audio or audio-visual recorded signal data is to be selected, or a position such as a longitude and latitude or other geographical co-ordinate system.

The viewpoint selection data may be received from the end user apparatus via the downlink network/transmission channel 105. It would be appreciated that in embodiments of the application the downlink network/transmission channel 105 may be a single network, for example a cellular communications link between the end user apparatus 201 and the server 103, or may be a channel operating across multiple channels; for example the data may pass over a wireless communications link to an internet gateway in the wireless communications system and then pass over an internet protocol related physical link to the server 103.

The viewpoint selection is shown in Figure 4 by step 408.

The downlink network/communications channel 105 may also comprise any one of a cellular communication network such as a third generation cellular communication system, a Wi-Fi communications network, or any suitable wireless or wired communication link. In some embodiments the uplink network/communications channel 101 and the downlink network/communications channel 105 are the same network/communications channel. In other embodiments the uplink network/communications channel 101 and the downlink network/communications channel 105 share parts of the same network/communications channel. Furthermore in embodiments the downlink network/communication channel 105 may be a pair of simplex channels, or a duplex or half-duplex channel, configured to carry information to and from the server either at the same time or substantially at the same time.

The processor/transmitter 227 may comprise a viewpoint synthesizer or selector signal processor 309. The viewpoint synthesizer or selector signal processor 309 may receive the viewpoint selection information from any end user apparatus and then select or synthesize suitable audio or audio-visual data to be sent to the end user apparatus, to provide the end user apparatus 201 with the content experience desired.

Thus in some embodiments, where the viewpoint selection information may identify a specific recording apparatus or device 210, the signal processor 309 selects the synchronized recorded signal data from the recording apparatus indicated.

In other embodiments, where the viewpoint selection information may identify a specific location or direction, the signal processor 309 selects the synchronized recorded signal data which is positioned and/or directed closest to the desired position/direction. In other embodiments where a specific location/direction is specified, a synthesis of more than one nearby synchronized recorded signal data may be generated. For example the signal processor 309 may generate a weighted average of the synchronized recorded signal data near the specific location/direction, to provide an estimate of the audio or audio-visual data which may have been recorded at the specified position.
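A sketch of such a weighted-average synthesis follows, assuming inverse-distance weighting of equal-length synchronized streams; the weighting scheme and function name are illustrative assumptions, as the embodiments only require some weighted average of nearby recordings.

```python
import numpy as np

def synthesize_viewpoint(streams, positions, target_pos):
    """Inverse-distance weighted average of synchronized streams near target_pos."""
    dists = np.array([np.linalg.norm(np.asarray(p) - np.asarray(target_pos))
                      for p in positions])
    weights = 1.0 / (dists + 1e-6)          # nearer streams dominate the mix
    weights /= weights.sum()
    return sum(w * s for w, s in zip(weights, streams))
```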

Furthermore in other embodiments where a capture apparatus 210 suffers a failure of a recording component, or recorded signal data is missing or corrupted, the signal processor 309 may compensate for the missing or corrupted recorded signal data by synthesizing the recorded signal data from the synchronized recorded signal data from neighbouring recording apparatus 210. The signal processor 309 may in some embodiments determine the nearby and neighbouring recording apparatus 210, and further identify the closest recording apparatus to the desired position, by using the positional data provided by the capture apparatus.

The output of the signal processor 309 in the form of desired (in other words selected recorded or synthesized) signal data 195 may be passed to the transmitter/buffer 311.

The selection/processing of the recorded signal data is shown in Figure 4 by step 409.

The processor/transmitter 227 may further comprise a transmitter/buffer 311 configured to transmit the desired signal data 195 via the downlink network/transmission channel 105, which has been described previously.

The server 103 may therefore be connected via the downlink network/transmission channel 105 to end user apparatus (or devices) 201 configured to generate viewpoint or selection information and receive the desired signal data associated with the viewpoint or selection information.

Although two end user devices 201 are shown in Figure 3, a first end user apparatus 201a and a second end user apparatus 201m, it would be appreciated that any suitable number of end user apparatus may receive signal data from the server 103 and transmit data to the server 103 via the downlink network/transmission channel 105.

The end user apparatus 201, such as the first end user apparatus 201a, may comprise a viewpoint selector and transmitter 231a. The viewpoint selector and transmitter 231a may use the user interface 15, where the end user apparatus is the apparatus shown in Figure 1, to allow the user to specify the desired viewing position and/or desired capture apparatus. The viewpoint selector and transmitter 231a may then encode this information in order that it may be transmitted via the downlink network/communications channel 105 to the server 103.

The end user apparatus 201, such as the first end user apparatus 201a, may also comprise a receiver 233a configured to receive the desired signal data as described above via the downlink network/communications channel 105. The receiver may decode the transmitted desired signal (in other words a selected synchronized recorded signal, or a signal synthesized from the synchronized recorded signals) to generate content data in a format suitable for viewing.

The reception of the desired signal data is shown in Figure 4 by step 413.

Furthermore the end user apparatus 201, such as the first end user apparatus 201a, may also comprise a viewer 235a configured to display or output the desired signal data as described above. For example where the end user apparatus 201 is the apparatus shown in Figure 1, the audio stream may be processed by the audio ADC/DAC 14 and then passed to the loudspeaker 11, and the video stream may be processed by the video ADC/DAC 32 and output via the display 33.

The viewing/listening of the desired signal data is shown in Figure 4 by step 415.

Therefore in summary the apparatus in the form of the end user apparatus may be considered to comprise an input selector configured to select a display variable. As described above the display variable may be an indication of at least one of a recording apparatus, a recording location, which may or may not be marked as a recording apparatus location, and a recording direction or orientation.

The apparatus in the form of the end user apparatus may furthermore be considered to comprise a transmitter configured to transmit the display variable to a further apparatus, wherein the further apparatus may be the server as described previously. Furthermore the same apparatus may be considered to comprise a receiver configured to receive a signal stream from the server apparatus, wherein the signal stream comprises at least one signal stream received from a recording apparatus synchronized with respect to a further signal stream received from a further recording apparatus. The same end user apparatus may also be summarized as comprising a display for displaying the signal stream.

Thus in embodiments the ability to mix between capture apparatus is made possible. End users may in embodiments select between recorded signal data from different capture apparatus with improved timing and cueing performance, as the recorded signal data is synchronized. Furthermore the generation of synthesized signal data using the synchronized recorded signal data allows the end user to experience the content from locations not originally recorded, or to improve on the recorded data from a single source to allow for deficiencies in the original signal data - such as loss of recorded signal data due to network issues, failure to record due to partial or total device failure, or poorly recorded signal data due to interference or noise.

With respect to Figure 12, the operation of further embodiments of the server with respect to buffering and synchronization of the recorded signal data is shown. In these embodiments, rather than synchronizing the recorded signal data using a single time alignment indicator, further time alignment indicators may be generated for further time instances or periods.

In these further embodiments the operations similar to the operations shown in steps 405, and 4071 to 4079, are marked with the same references. Thus the buffer/receiver 221 may receive the recorded signal data streams in step 405. In the server 103 shown with respect to these embodiments, the buffered recorded signal may be defined as b_{n,i}, where the subindex n describes the time instant from which the recorded signal is buffered. The subindex n is an element of the set G, in other words the set of different time instants to be used to determine the base signal. The starting position for each buffering location may be described by Tloc_n; that is, the signal is buffered starting from Tloc_n − T seconds.

The variable length framer 301 may perform a variable length framing operation in step 4071 on each of the sub-periods using the previously described methods.

The indicator selector 303 may calculate the time alignment indicators in step 4073 by applying the following equations to determine the time index average, the time index variance and the time index ratio over all the sub-periods:

$$ tIndAve_{i,k} = \frac{1}{|G| \cdot M} \sum_{t \in G} \sum_{j=0}^{M-1} tInd_{t,i,j}(k), \quad 0 \le i < U,\; 0 \le k < U $$

$$ tIndVar_{i,k} = \frac{1}{|G| \cdot M} \sum_{t \in G} \sum_{j=0}^{M-1} \left( tInd_{t,i,j}(k) - tIndAve_{i,k} \right)^2, \quad 0 \le i < U,\; 0 \le k < U $$

$$ tIndRatio(i) = \sum_{k=0}^{U-1} \frac{tIndVar_{i,k}}{tIndAve_{i,k}}, \quad 0 \le i < U $$

where tInd_{t,i,j} describes the tInd_{i,j} for the t'th time instant.

The base signal determiner 305 may, in addition to the determination of the base signal and the generation of the time alignment factors, carry out an additional precursor step and make a decision on whether to include a new time instant or period in the calculations. This decision may for example be according to the following expression:

$$ \text{decision} = \begin{cases} \text{add new time location}, & \dfrac{tIndRatio(1)}{tIndRatio(0)} < tIndThr \;\;\text{or}\;\; \dfrac{accCorr(tIndRatioSortedIndices(1))}{accCorr(tIndRatioSortedIndices(0))} < tIndThr \\[6pt] 0, & \text{otherwise} \end{cases} $$

where

$$ accCorr(i) = \sum_{t \in G} \sum_{j=0}^{M-1} tCorr_{t,i,j}, \quad 0 \le i < U $$

and tCorr_{t,i,j} describes the tCorr_{i,j} for the t'th time instant.

In some embodiments of the invention the base signal determiner 305 may make the above decision with a condition which limits the number of new time instants to be added to some predefined threshold, to prevent a potential infinite loop of iterations being carried out.
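A sketch of this bounded refinement loop follows. It assumes a hypothetical helper compute_indicators that returns the sorted tIndRatio values and the accumulated correlations in the same sorted order; the helper, the threshold value 1.2 and the limit of 5 instants are all illustrative assumptions, and the decision expression follows the reconstruction above.

```python
def refine_time_instants(compute_indicators, t_ind_thr: float = 1.2,
                         max_instants: int = 5) -> list[int]:
    """Grow the set G of time instants until the base signal choice is clear."""
    instants = [0]                                  # the set G, starting with one instant
    while len(instants) < max_instants:             # bound prevents an infinite loop
        ratio_sorted, acc_corr = compute_indicators(instants)
        # Ambiguous when the two best candidates are too close in ratio
        # or in accumulated correlation (per the decision expression above).
        ambiguous = (ratio_sorted[1] / ratio_sorted[0] < t_ind_thr or
                     acc_corr[1] / acc_corr[0] < t_ind_thr)
        if not ambiguous:
            break                                   # confident: stop adding
        instants.append(len(instants))              # step 703: add a new instant to G
    return instants
```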

The decision of whether a new time location is to be added is shown in Figure 12 by step 701. Where a new time location is to be added, the base signal determiner 305 may add a new time period to G; in other words the process performs another check at a different time than before, and the loop passes back to step 407.

This addition of a new time instant to G can be seen in Figure 12 as step 703.

When no further time locations/instants are to be added, the base signal determiner 305 may then perform the operation of determining the base signal based on the indicators as described previously. The determination of the base signal is shown in Figure 12 by step 4075.

Furthermore the base signal determiner 305 may also determine the time alignment factors for the remaining signals as described previously and shown in Figure 12 in step 4077.

The signal synchronizer 307 may then use this base signal determination and the time alignment factors for the remaining recorded signals to synchronize the recorded signals as described previously, as shown in Figure 12 in step 4079.

In some embodiments of the invention the loop is disabled or not present, and time alignment indicators are determined for at least two of the sub-sets of the total time periods using the equations described above, in order to improve the synchronization between recorded signals as the indicators are determined for different time periods.

Although the above has been described with regards to audio signals or audio-visual signals, it would be appreciated that embodiments may also be applied to audio-video signals where the audio signal components of the recorded data are processed in terms of the determining of the base signal and the determination of the time alignment factors for the remaining signals, and the video signal components may be synchronised using the above embodiments of the invention. In other words the video parts may be synchronised using the audio synchronisation information.

It shall be appreciated that the term user equipment is intended to cover any suitable type of wireless user equipment, such as mobile telephones, portable data processing devices or portable web browsers. Furthermore elements of a public land mobile network (PLMN) may also comprise apparatus as described above.

In general, the various embodiments of the invention may be implemented in hardware or special purpose circuits, software, logic or any combination thereof. For example, some aspects may be implemented in hardware, while other aspects may be implemented in firmware or software which may be executed by a controller, microprocessor or other computing device, although the invention is not limited thereto. While various aspects of the invention may be illustrated and described as block diagrams, flow charts, or using some other pictorial representation, it is well understood that these blocks, apparatus, systems, techniques or methods described herein may be implemented in, as non-limiting examples, hardware, software, firmware, special purpose circuits or logic, general purpose hardware or controller or other computing devices, or some combination thereof.

The embodiments of this invention may be implemented by computer software executable by a data processor of the mobile device, such as in the processor entity, or by hardware, or by a combination of software and hardware. Further in this regard it should be noted that any blocks of the logic flow as in the Figures may represent program steps, or interconnected logic circuits, blocks and functions, or a combination of program steps and logic circuits, blocks and functions. The software may be stored on such physical media as memory chips or memory blocks implemented within the processor, magnetic media such as hard disks or floppy disks, and optical media such as, for example, DVD and the data variants thereof, and CD.

The memory may be of any type suitable to the local technical environment and may be implemented using any suitable data storage technology, such as semiconductor-based memory devices, magnetic memory devices and systems, optical memory devices and systems, fixed memory and removable memory. The data processors may be of any type suitable to the local technical environment, and may include one or more of general purpose computers, special purpose computers, microprocessors, digital signal processors (DSPs), application specific integrated circuits (ASIC), gate level circuits and processors based on multi-core processor architecture, as non-limiting examples.

Embodiments of the inventions may be practiced in various components such as integrated circuit modules. The design of integrated circuits is by and large a highly automated process. Complex and powerful software tools are available for converting a logic level design into a semiconductor circuit design ready to be etched and formed on a semiconductor substrate.

Programs, such as those provided by Synopsys, Inc. of Mountain View, California and Cadence Design, of San Jose, California automatically route conductors and locate components on a semiconductor chip using well established rules of design as well as libraries of pre-stored design modules. Once the design for a semiconductor circuit has been completed, the resultant design, in a standardized electronic format (e.g., Opus, GDSII, or the like) may be transmitted to a semiconductor fabrication facility or "fab" for fabrication.

The foregoing description has provided by way of exemplary and non-limiting examples a full and informative description of the exemplary embodiment of this invention. Various modifications and adaptations may become apparent to those skilled in the relevant arts in view of the foregoing description, when read in conjunction with the accompanying drawings and the appended claims. However, all such and similar modifications of the teachings of this invention will still fall within the scope of this invention as defined in the appended claims.