


Title:
A METHOD, SYSTEM AND DEVICE FOR GENERATING ASSOCIATED AUDIO AND VISUAL SIGNALS IN A WIDE ANGLE IMAGE SYSTEM
Document Type and Number:
WIPO Patent Application WO/2017/149124
Kind Code:
A1
Abstract:
Provided is a system for generating associated audio and visual signals. The system is configured to process wide angle image data such that a sector of the wide angle image can be selected. The system is also configured to process surround sound signal data such that a signal comprising sound from a determined direction can be selected. The system is arranged to enable said sound from a determined direction to be associated with a selected image sector such that said image sector can be viewed with sound related to a viewing direction of the image sector. Alternatively, the system is arranged to enable an image sector to be selected for association with sound from a determined direction such that a sound signal from a determined sound direction can be listened to whilst viewing images associated with said sound direction.

Inventors:
WILLIAMS ADAM PETER (GB)
MCARDLE STEPHEN (GB)
Application Number:
PCT/EP2017/055013
Publication Date:
September 08, 2017
Filing Date:
March 03, 2017
Assignee:
CENTRICAM TECH LTD (GB)
International Classes:
H04N5/232
Foreign References:
US20120162362A12012-06-28
US20160050366A12016-02-18
US20130321568A12013-12-05
CN2012080885W2012-08-31
GB201521034A2015-11-30
Attorney, Agent or Firm:
INCOMPASS IP EUROPE LIMITED (GB)
Claims:
Claims

1. A method of generating associated audio and visual signals in a wide angle image display system, comprising the steps of:

processing wide angle image data to select a sector of the wide angle image comprising a portion of said wide angle image data for display on a monitor or screen;

processing a sound signal of the wide angle image display system in order to associate with said selected image sector a portion of the sound signal related to a viewing direction of the selected image sector.

2. The method of claim 1, further comprising the steps of:

tracking a location of said selected image sector within a wide angle image field; and using tracking information of said selected image sector within said wide angle image field to further process the sound signal of the wide angle image display system to associate a further processed portion of the sound signal related to a viewing direction of the tracked image sector.

3. The method of claim 2, wherein the tracking information is used continuously to further process the sound signal of the wide angle image display system.

4. The method of claim 2, wherein the tracking information is used periodically to further process the sound signal of the wide angle image display system and/or used in response to a tracking amount change exceeding a predetermined threshold to further process the sound signal of the wide angle image display system.

5. The method of any one of the preceding claims, wherein the wide angle image and/or the sound signal of the wide angle image display system are transmitted to an electronic processing device and said wide angle image and/or said sound signal of the wide angle image display system are processed at the electronic processing device to associate a selected image sector with a portion of the sound signal related to a viewing direction of the selected image sector.

6. The method of any one of claims 1 to 4, wherein the selected image sector is transmitted to an electronic processing device for display on a monitor or screen thereof, but the sound signal of the wide angle image display system is transmitted to said electronic processing device and said sound signal is processed at the electronic processing device to associate with said selected image sector a portion of the sound signal related to a viewing direction of the selected image sector.

7. The method of any one of the preceding claims, wherein multiple selected image sectors are generated and respective ones of the multiple selected image sectors are transmitted to respective electronic processing devices.

8. The method of claim 7, wherein the sound signal of the wide angle image display system is transmitted to each of said electronic processing devices such that said sound signal is processed at each said electronic processing device to associate a respective portion of the sound signal based on a viewing direction of an associated selected image sector transmitted to said electronic processing device.

9. The method of any one of the preceding claims, wherein the viewing direction of the selected image sector comprises a viewing datum of said selected image sector.

10. The method of claim 9, wherein said datum comprises a centre axis of the selected image sector.

11. The method of claim 9 or claim 10, wherein the sound signal comprises a surround sound field.

12. The method of any one of claims 9 to 11, wherein the sound signal of the wide angle image display system comprises an ambisonic sound signal.

13. The method of claim 12, further comprising the steps of:

rotating the ambisonic sound signal to align one of its coordinate system axes with the viewing datum of the selected image sector;

processing the rotated ambisonic sound signal to determine at least one virtual microphone which generates a sound signal from a direction of the viewing datum; and

emitting said generated sound signal in association with display of the selected image sector.

14. The method of claim 13, wherein the ambisonic sound signal is an "A" format signal, a "B" format signal, or a "C" format signal.

15. The method of any one of the preceding claims, wherein the wide angle image is a substantially surround or panoramic image.

16. A system for generating associated audio and visual signals in a wide angle image display system, comprising:

a processor for:

processing wide angle image data to select a sector of the wide angle image comprising a portion of said wide angle image data for display on a monitor or screen; and

processing a sound signal of the wide angle image display system in order to associate with said selected image sector a portion of the sound signal related to a viewing direction of the selected image sector.

17. A handheld electronic device comprising:

a display for displaying an image sector of a wide angle image; and

a processor for processing a sound signal associated with the wide angle image in order to associate with said displayed image sector a portion of the sound signal related to a viewing direction of the displayed image sector.

18. A computer readable medium comprising machine readable instructions which, when executed by a processor of an electronic processing device, implement the steps of the method of any one of claims 1 to 15.

19. A method of generating associated audio and visual signals in a wide angle image display system, comprising the steps of:

processing a sound signal of the wide angle image display system to select a portion of the sound signal related to a determined sound direction; and

processing wide angle image data to select a sector comprising a portion of the wide angle image data for display on a monitor or screen, said selected image sector being selected as one which relates to the determined sound direction.

20. The method of claim 19, further comprising the steps of:

tracking said determined sound direction within a sound field of said wide angle image system; and

using tracking information of said determined sound direction to further process the wide angle image data to associate a further processed sector of the wide angle image data with the tracked sound direction.

21. The method of claim 19 or claim 20, wherein the sound signal comprises a surround sound field.

22. The method of any one of claims 19 to 21, wherein the sound signal of the wide angle image display system comprises an ambisonic sound signal.

23. The method of claim 22, further comprising the steps of:

aligning one of a number of coordinate system axes of the ambisonic sound signal with the determined sound direction;

processing the ambisonic sound signal to determine at least one virtual microphone which generates a sound signal from the determined sound direction; and

emitting said generated sound signal in association with display of a selected image sector having a viewing datum related to the determined sound direction.

24. The method of claim 23, wherein the ambisonic sound signal is an "A" format signal, a "B" format signal, or a "C" format signal.

25. The method of any one of claims 19 to 24, wherein the wide angle image is a substantially surround or panoramic image.

26. A system for generating audio and visual signals in a wide angle image display system, comprising:

a processor for:

processing a sound signal of the wide angle image display system to select a portion of the sound signal related to a determined sound direction; and

processing wide angle image data to select a sector comprising a portion of the wide angle image data for display on a monitor or screen, said selected image sector being selected as one which relates to the determined sound direction.

27. A handheld electronic device comprising:

a processor for processing a sound signal of a wide angle image display system to select a portion of the sound signal related to a determined sound direction; and

a display for displaying a sector of a wide angle image, said sector comprising a portion of the wide angle image being selected as one which relates to the determined sound direction.

28. A computer readable medium comprising machine readable instructions which, when executed by a processor of an electronic processing device, implement the steps of the method of any one of claims 19 to 25.

Description:
A METHOD, SYSTEM AND DEVICE FOR GENERATING ASSOCIATED AUDIO AND VISUAL SIGNALS IN A WIDE ANGLE IMAGE SYSTEM

Field of the Invention.

The invention relates to a method, system and device for generating associated audio and visual signals in a wide angle image display or projection system, and in particular, to a method, system and device for use in a panoramic moving image capture, display and/or projection system.

Background of the Invention.

The field of view or image view captured by a camera is determined by the lens of the camera at the location of interest. In the case of surveillance systems, for example, some camera units have a 360 degree panoramic image capturing device that has been developed using solid state image recording means. One reason why this type of camera unit is not currently favoured is that the recorded panoramic image is recorded in a severely distorted format as a consequence of the manner by which the 360 degree panoramic image scene is captured and then recorded by the solid state image recording means. In camera units of this type, powerful information processing means are normally required to correct image distortion prior to viewing images, resulting in loss of picture quality, system latency and greatly increased power consumption of such systems. Such 360 degree panoramic image scenes captured by the lens of a camera unit are typically circular or elliptical images. As most, although not all, consumer applications require rectangular images, such circular or elliptical images are typically not an acceptable format.

It is, however, possible to provide an image sensor system capable of converting circular or elliptical 360 degree panoramic distorted or warped moving images into an acceptable format compatible with modern consumer applications as taught in either of: applicant's Patent Cooperation Treaty (PCT) application number PCT/CN2012/080885 filed 31st August 2012, the entire contents of which are incorporated herein by reference; and applicant's United Kingdom patent application number 1521034.7 filed 30th November 2015, the entire contents of which are incorporated herein by reference. In particular, it is possible with such an image sensor system to enable a user to select an image sector of the circular or elliptical 360 degree panoramic moving image to view or project onto a suitable rectangular image display, monitor or screen. What is therefore needed to enhance a user's experience is a means of delivering an associated sound signal for a selected image sector of a surround image or a means of associating an image sector of a surround image with a detected sound direction.

Summary of the Invention.

The present invention concerns aligning a portion of a sound field with a direction of an image view or vice-versa such that a portion of a sound field played to a user substantially aligns with their image viewport.

A first main aspect of the invention provides a method of generating associated audio and visual signals in a wide angle image display system. The method may comprise processing wide angle image data to select a sector of the wide angle image comprising a portion of said wide angle image data for display on a monitor or screen. It may also comprise processing a sound signal of the wide angle image display system in order to associate with said selected image sector a portion of the sound signal related to a viewing direction of the selected image sector.
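By way of a non-limiting illustration of the first main aspect, the viewing direction of a selected image sector may be derived from the sector's position within the wide angle image field. The following Python sketch assumes a 360 degree panoramic strip with a linear pixel-to-angle mapping; the function and parameter names are illustrative assumptions only, not details taken from the application:

```python
def sector_viewing_azimuth(sector_left_px, sector_width_px, pano_width_px):
    """Azimuth (degrees) of the centre axis of a selected sector within
    a 360-degree panoramic strip.

    Illustrative assumption: the panorama spans 360 degrees and pixel
    position maps linearly to azimuth.
    """
    centre_px = sector_left_px + sector_width_px / 2.0
    return (centre_px / pano_width_px) * 360.0 % 360.0
```

The resulting azimuth could then serve as the viewing datum with which a portion of the sound signal is associated.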

Preferably, a location of said selected image sector within a wide angle image field is tracked and resultant tracking information is used to further process the sound signal of the wide angle image display system to associate a further processed portion of the sound signal related to a viewing direction of the tracked image sector.

A second main aspect of the invention provides a system for generating associated audio and visual signals in a wide angle image display system. The system may comprise a processor for processing wide angle image data to select a sector of the wide angle image comprising a portion of said wide angle image data for display on a monitor or screen. A sound signal of the wide angle image display system may be processed in order to associate with said selected image sector a portion of the sound signal related to a viewing direction of the selected image sector. The processing of the sound signal and the wide angle image data may be performed by the same processor or by different processors.

A third main aspect of the invention provides a handheld electronic device. The device may comprise a display for displaying an image sector of a wide angle image and a processor for processing a sound signal associated with the wide angle image in order to associate with said displayed image sector a portion of the sound signal related to a viewing direction of the displayed image sector.

A fourth main aspect of the invention provides a computer readable medium comprising machine readable instructions which, when executed by a processor of an electronic processing device, implement the steps of the method of the first main aspect.

A fifth main aspect of the invention provides another method of generating associated audio and visual signals in a wide angle image display system. The method may comprise processing a sound signal of the wide angle image display system to select a portion of the sound signal related to a determined sound direction. It may also comprise processing wide angle image data to select a sector comprising a portion of the wide angle image data for display on a monitor or screen, said selected image sector being selected as one which relates to the determined sound direction.

A sixth main aspect of the invention provides a system for generating audio and visual signals in a wide angle image display system. The system may comprise a processor for processing a sound signal of the wide angle image display system to select a portion of the sound signal related to a determined sound direction. Wide angle image data may be processed to select a sector comprising a portion of the wide angle image data for display on a monitor or screen, said selected image sector being selected as one which relates to the determined sound direction. The processing of the sound signal and the wide angle image data may be performed by the same processor or by different processors.

A seventh main aspect of the invention provides a handheld electronic device. The device may comprise a processor for processing a sound signal of a wide angle image display system to select a portion of the sound signal related to a determined sound direction. The device may include a display for displaying a sector of a wide angle image, said sector comprising a portion of the wide angle image being selected as one which relates to the determined sound direction.

An eighth main aspect of the invention provides a computer readable medium comprising machine readable instructions which, when executed by a processor of an electronic processing device, implement the steps of the method of the fifth main aspect.

Other aspects of the invention are in accordance with the appended claims. The summary of the invention does not necessarily disclose all the features essential for defining the invention; the invention may reside in a sub-combination of the disclosed features.

Brief Description of the Drawings.

The foregoing and further features of the present invention will be apparent from the following description of preferred embodiments which are provided by way of example only in connection with the accompanying figures, of which:

Figure 1 is a schematic block diagram of a system in accordance with an embodiment of the invention;

Figure 2 is a schematic block diagram of a server shown in Fig. 1 in more detail in accordance with an embodiment of the invention;

Figure 3 is a schematic block diagram of a user device 12 shown in Fig. 1 in more detail in accordance with an embodiment of the invention;

Figures 4 and 5 are respectively a side view and a plan view of a camera unit according to an embodiment of the invention for capturing a panoramic image;

Figure 6 is a plan view of a display image of the panoramic image, a plan of the panoramic image, or a representative image of the panoramic image, with multiple window frames overlying said display image to enable a user to make a selection of a window portion for viewing;

Figures 7a and 7b illustrate ambisonic microphone arrays according to an embodiment of the invention;

Figure 8 comprises a polar graph of the B format polar patterns for pattern values of 0.1 through to greater than 0.8 for an ambisonic sound field;

Figure 9 shows changes in the B format polar patterns of Fig. 8 as the subtended angle is adjusted; and

Figure 10 is an embodiment of the invention comprising a distributed system.

Description of Preferred Embodiments.

The following description is of preferred embodiments by way of example only and without limitation to the combination of features necessary for carrying the invention into effect. Reference in this specification to "one embodiment" or "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment is included in at least one embodiment of the invention. The appearances of the phrase "in one embodiment" in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. Moreover, various features are described which may be exhibited by some embodiments and not by others. Similarly, various requirements are described which may be requirements for some embodiments, but not other embodiments.

In the following description, by wide angle image is meant an image field that is broader than the normal image view of a conventional camera unit; for example, a wide angle image may be an image with a field of view equal to or exceeding a 60 degree arc segment, or as much as or even more than 180 degrees around the camera unit. More particularly, this term may be taken to refer to a substantially panoramic image view which may comprise a 360 degree annular or doughnut shaped image view around the camera unit or a 'full' panoramic image view defining a generally hemispherical or spherical image field or vista about said camera unit.

The invention generally relates to a method for generating associated audio and visual signals in a panoramic moving image capture, display and/or projection system. In one embodiment, an optical device such as a camera unit may have an image sensor and a lens/mirror system or the like for capturing a wide angle lens projection image such as a panoramic image and directing it towards a planar photo-sensitive surface of the image sensor. The photo sensors may be arranged in a layout matching characteristics of the lens projection for converting the lens projection image incident on the photo-sensitive surface into an electrical or electronic signal or signals of a substantially de-warped or un-distorted image of the projection image. The camera unit may comprise part of a system for generating associated audio and visual signals.

In one embodiment, the system may comprise a processor for processing wide angle image data to select a sector of the wide angle image comprising a portion of said wide angle image data for display on a monitor or screen. The system may also process a sound signal associated with said wide angle image data in order to associate with said selected image sector a portion of the sound signal related to a viewing direction of the selected image sector. Alternatively, the system may select an image sector to be associated with sound from a determined sound direction. The processing of the wide angle image data may be performed remotely from the processing of sound signal data.

Fig. 1 is a schematic block diagram of a system 10 in accordance with an embodiment of the invention. The system 10 comprises a user device 12 and a server 14 in communication via a network such as the internet 15 or the like. The system 10 further comprises an image capturing unit such as a camera unit 16 with an associated microphone system or array 18. The camera unit 16 may have an image sensor with a plurality of photo sensors arranged to match a wide angle/fisheye lens 17 utilising the entire projected circular (or elliptical) image, with minimized wasted resolution. In some embodiments, the system 10 may include a projector 18 adapted to receive image data from the server 14 or camera unit 16 for projection onto a screen 19 or the like.

Figs. 2 and 3 are schematic block diagrams of the server 14 and the user device 12, respectively, shown in Fig. 1 in more detail in accordance with an embodiment of the invention. The server 14 comprises a processor 20, a memory 22, a user interface 24, and application modules for processing image data and/or sound data and implementing the methods of the invention. The user device 12 comprises a processor 30, a memory 32, a user device interface 34 with input means 36 such as a keyboard, touch screen and/or a microphone, and output means 38 such as a display and a speaker.

The system 10 allows a user device 12 to access the server 14 and/or the camera unit 16/microphone system 18 to receive image and/or sound data captured by the camera unit 16 and its associated microphone system 18. The data may be received in processed or unprocessed formats. However, it will be understood that, whilst Fig. 1 depicts a server-based system 10, the invention is not limited to server-based systems. The camera unit 16 with its associated microphone system 18 may be embodied in a single unit which is directly connectable via a local wireless or a wired connection to a user device 12, or may form part of said user device 12 to provide a stand-alone integrated user device 12 embodying all or some of the aspects of the present invention. The user device 12 in any of its embodiments may comprise a personal computer (PC), a tablet computer, a phablet, a smart phone or any suitable handheld electronic device.

Figs. 4 and 5 are respectively a side view and a plan view of a preferred camera unit 16 according to one embodiment of the invention for capturing a wide angle or panoramic image. The camera unit 16 comprises an image sensor 42 and means 44 for capturing a very wide angle image and directing it towards a photo-sensitive surface 46 of the image sensor 42. The image sensor 42 comprises a plurality of photo sensors preferably arranged to convert the lens projection image incident on the photo-sensitive surface into an electrical or electronic signal or signals of a de-warped or un-distorted image of the projection image.

The means 44 for capturing a wide angle image and directing it towards a photosensitive surface 46 of the image sensor 42 may comprise any suitable system or arrangement known to the skilled artisan for capturing a wide angle image and focusing it onto an optical chip such as sensor 42. For example, the image capturing means 44 may comprise any one or any combination of: a lens, a set of lenses, a mirror, a set of mirrors, a prism, or a set of prisms. In this embodiment, the image capturing means comprises a hemispherical lens or a fish-eye lens 48. The fish-eye lens 48 captures a substantially panoramic image surrounding the camera unit 16. The substantially panoramic image may not be a full panoramic image. It may comprise a 360 degree doughnut shaped image view in a selected plane about the image capturing means 44. In many applications, a doughnut shaped image scene surrounding the camera unit 16 having a depth of about one third of the height of the hemispherical image scene is sufficient. However, in other applications, the substantially panoramic image is preferably a full or almost full panoramic image comprising the whole or most of the generally hemispherical image view surrounding the camera unit 16 and may even comprise a spherical view surrounding the camera unit 16 by using one or more camera units 16.
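By way of example only, the doughnut shaped image view described above can be unwrapped into a rectangular strip by a polar-to-cartesian remap. In the Python sketch below, nearest-neighbour sampling, a linear radius mapping and all function and parameter names are simplifying assumptions for illustration, not details taken from the application:

```python
import math

def unwrap_annulus(img, cx, cy, r_inner, r_outer, out_w, out_h):
    """Unwrap a doughnut-shaped 360-degree image (a list of rows of
    pixel values) into a rectangular panorama by sampling along the
    concentric rings between the inner and outer radii."""
    out = []
    for row in range(out_h):
        # Map output row to a radius between the outer and inner rings.
        r = r_outer - (r_outer - r_inner) * row / max(out_h - 1, 1)
        line = []
        for col in range(out_w):
            theta = 2.0 * math.pi * col / out_w  # azimuth around the ring
            x = int(round(cx + r * math.cos(theta)))
            y = int(round(cy + r * math.sin(theta)))
            line.append(img[y][x])  # nearest-neighbour sample
        out.append(line)
    return out
```

A practical implementation would interpolate between neighbouring pixels rather than taking the nearest sample, but the geometry is the same.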

The image sensor 42 converts image light incident on its planar photo-sensitive surface 46 into one or more electrical signals. The image sensor 42 comprises a solid-state device such as an image sensor chip 42. The planar photo-sensitive surface 46 of the chip 42 comprises a plurality of photo sensors or pixels (not shown in Fig. 4) which each convert light incident thereon into an electrical or electronic signal or signals. The photo-sensitive surface 46 may not cover the whole of the upper surface of the chip 42, but may cover only that part of the chip's surface that underlies the image capturing means 44. Input means 50, memory 52, processor 54, output means 56, and means for buffering or storing digital media data 58 may also be provided in the camera unit 16. A number of optical chip types that could be employed as the image sensor 42 are already known, such as a charge-coupled device (CCD) chip and a complementary metal oxide semiconductor (CMOS) chip. However, any chip suitable for converting a captured image incident on a photo-sensitive surface thereof into an electrical signal or signals can be employed in the system and camera unit of the invention.

In one embodiment, the plurality of photo sensors or pixels of the image sensor chip 42 may be arranged on the planar photo-sensitive surface 46 in a layout matching characteristics of the lens projection for converting the lens projection image incident thereon into an electrical or electronic signal or signals of a de-warped or undistorted version of the warped or distorted lens projected panoramic image. This arrangement is chosen primarily to provide a pattern of photo sensors or pixels that more efficiently captures a received image incident thereon, projected from a lens that may introduce distortion and warping of the image, with less distortion than in conventional arrangements and/or requiring less processing to remove, reduce or correct any distortion, if present, of the image or a portion of the image for display on a conventional monitor display.

In one embodiment, the individual photo sensor elements may be arranged in concentric (or concentric paired) circles, placed at radii matched to the actual lens distortion. In doing this, the first order warping of the image from these lenses is immediately compensated for by the sensor design and the de-warping processing load is substantially reduced or eliminated. The image projected from the lens onto the image sensor so designed thus falls on a plurality of photo sensors arranged to convert the lens projection image incident on the photo-sensitive surface into an electrical or electronic signal or signals of a de-warped or un-distorted image of the projection image.
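As a non-limiting illustration of placing rings "at radii matched to the actual lens distortion", the sketch below computes ring radii from textbook fisheye projection models. The application does not specify a particular lens model, so the mapping functions and all names here are assumptions for illustration:

```python
import math

def ring_radii(num_rings, max_field_angle_deg, focal_len_px,
               mapping="equidistant"):
    """Radii (pixels from the optical centre) at which to place
    concentric rings of photo sensors so that equal steps of field
    angle land on successive rings, for a chosen fisheye model."""
    radii = []
    for i in range(1, num_rings + 1):
        theta = math.radians(max_field_angle_deg) * i / num_rings
        if mapping == "equidistant":    # r = f * theta
            r = focal_len_px * theta
        elif mapping == "equisolid":    # r = 2f * sin(theta / 2)
            r = 2.0 * focal_len_px * math.sin(theta / 2.0)
        else:                           # stereographic: r = 2f * tan(theta / 2)
            r = 2.0 * focal_len_px * math.tan(theta / 2.0)
        radii.append(r)
    return radii
```

Under the equidistant model, equal increments of field angle map to equally spaced rings, which is what makes a ring-matched sensor layout self-de-warping to first order.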

Where the camera unit 16 is capturing a moving image and the electrical signals representative of the captured moving image are converted to a digital format, the digital format may conveniently comprise a digital video format. This may be an interlaced or progressive digital video format. In the case of an interlaced digital video format, alternate ones of the series of concentric circles of photo sensors or pixels are respectively assigned to odd and even lines of the digital video signal. In the case of a progressive digital video format, all of the concentric circles of sensors or pixels are assigned to the progressive lines of the digital video signal. The camera unit 16 may include means for converting electrical signals generated by the photo sensors or pixels of the image sensor chip 42 into digital image data. The means 54 for converting may comprise suitable circuitry provided in the camera unit 16 as is known for CCD chips, or additional circuitry on the image sensor chip itself as is known for CMOS chips. In any event, it is useful that the camera unit 16 in this embodiment includes circuitry for generating digital image data from an output of the image sensor chip 42, although it will be understood that, for some camera units, depending on their application or use, the camera unit may have an output for transmitting output signals from the image sensor chip 42 to a remote device such as the server 14 or the user device 12 for conversion to digital image data at the remote device.

The integrated camera unit 16 thus far described may form a camera unit 16 in a system whereby the camera unit 16 outputs digital image data or signals representative of a captured panoramic image to one or more remote devices 12, 14 where further processing of the signals or data may be implemented. The camera unit 16 may be arranged to capture still images in the manner of a still image camera, but is preferably arranged to capture moving images.

Where the camera unit 16 does include means for converting electrical signals generated by the photo sensors or pixels into digital image data, it may also include means 58 for buffering and/or storing the digital image data. The means 58 for buffering or storing the digital image data may comprise a flash memory device or chip.

In one embodiment, the camera unit 16 may include an input means 50 such as a button, touch screen or the like for receiving a selection of a window portion, i.e. image sector, of the captured panoramic image. The camera unit 16 also has a memory means 52 storing computer readable instructions which, when executed by a processor 54, controls the operations of the camera unit 16 including enabling a user to enter a selection of an image sector of the stored or buffered panoramic image via the input means 50. It will be understood that buffering digital image data enables real-time selection of an image sector of a viewed image scene whereas storing digital image data allows selection of an image sector of a previously recorded image scene.

In one embodiment, the selection of an image sector may be received at the server 14 or camera unit 16 and may be received from a user device 12. In any of the foregoing embodiments and as illustrated by Fig. 6, the selection of an image sector may comprise displaying an image 60 or a plan 62 of the panoramic image or a representative image 64 of the panoramic image captured by the camera unit 16 with a window frame 66 overlying the display image, plan or representative image. The image or a plan of the panoramic image or a representative image of the panoramic image may be displayed on any of a display screen of the camera unit 16, a display screen of a peripheral device connected to the camera unit 16 or with which the camera unit 16 forms an integral part, or a display screen of a user device 12. The position of the window frame 66 relative to the display image, plan or representative image can be manipulated by a user as illustrated by arrowed lines 68, 70, 72 to move it over the display image, plan or representative image to thereby select a desired image sector of the panoramic image for retrieval, processing and/or display/playing/projecting. The aspect ratio of the window frame 66 may be defined or may be adjustable by a user. Zooming may be achieved by decreasing a size of the window frame 66 whilst retaining its pixel width and depth. Of course, other methods of selecting a desired window portion may be provided depending on the nature and configuration of the input means. Such a method is disclosed in PCT/CN2012/080885.
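The zooming behaviour described above, namely shrinking the window frame 66 whilst retaining its pixel width and depth, can be sketched as follows. The dictionary layout and function name are assumptions for illustration only:

```python
def zoom_window(frame, zoom_factor):
    """Shrink a selection window about its centre to zoom in; the
    output pixel width and depth stay unchanged because the smaller
    window is later resampled up to the same display size."""
    cx = frame["x"] + frame["w"] / 2.0
    cy = frame["y"] + frame["h"] / 2.0
    new_w = frame["w"] / zoom_factor
    new_h = frame["h"] / zoom_factor
    return {"x": cx - new_w / 2.0, "y": cy - new_h / 2.0,
            "w": new_w, "h": new_h}
```

Panning (arrowed lines 68, 70, 72 in Fig. 6) would simply offset the `x` and `y` fields without changing `w` and `h`.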

In order to identify the data for a selected image sector for retrieval, the photo sensors or pixels 46 of the image sensor chip 42 are preferably addressable or otherwise identifiable whereby, when a user inputs a selection of a desired image sector using the window frame 66, this is translated by a processor into addresses or identifiers for appropriate individual photo sensors or pixels, or blocks thereof, and/or the lines they occupy in order to determine which digital image data is to be retrieved in response to a selected image sector. Identification of a selected image sector may be assisted by associating a coordinate system 74 with the image 60 or the plan 62 of the panoramic image or the representative image 64 of the panoramic image.
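By way of illustration only, the translation of a window frame selection into sensor pixel addresses might be sketched as follows, where the function name, arguments and simple linear mapping are illustrative assumptions rather than part of the disclosure:

```python
def sector_to_pixel_block(frame_x, frame_y, frame_w, frame_h,
                          plan_w, plan_h, sensor_w, sensor_h):
    """Map a window frame 66 positioned on a displayed plan of the
    panoramic image to the block of sensor pixel addresses it covers.
    Hypothetical helper: the disclosure does not prescribe a specific
    mapping, only that pixels are addressable by line and position."""
    sx = sensor_w / plan_w      # horizontal scale: plan units -> pixels
    sy = sensor_h / plan_h      # vertical scale: plan units -> pixels
    x0 = int(frame_x * sx)
    y0 = int(frame_y * sy)
    x1 = int((frame_x + frame_w) * sx)
    y1 = int((frame_y + frame_h) * sy)
    # Lines (rows) and columns of photo sensors occupied by the sector.
    return range(y0, y1), range(x0, x1)
```

The returned ranges identify which lines of the image sensor chip 42 need to be read out or retrieved from the buffer for the selected sector.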

The processor also enables the retrieval from the buffer or storage means 58 of the digital image data comprising the selected image sector for further processing and/or display on the screen of the camera unit 16, or for the retrieved digital image data to be outputted on an output port of the camera unit 16 to a peripheral device or a separate display device such as a user device 12.

Also shown in Figs. 4 and 5 is a surround microphone system 18 for the camera unit 16. The surround microphone system 18 may comprise a microphone array 18. It is adapted to capture a sound field generally overlapping or matching the extent of the image field captured by the camera unit 16. The surround microphone system 18 is selected as one which allows sound data generated thereby to be processed to select or isolate a portion of the sound field related to a predetermined direction which may comprise a selected viewing direction of an image sector of the image field or a determined sound direction. Typically, an array 18 is made up of omnidirectional microphones, directional microphones, or a mix of omnidirectional and directional microphones distributed adjacent or around the camera unit 16, linked to a processing device such as its own data processing and storage modules 52-58, the server 14 or a user device 12 which records and interprets the results into a coherent form. Arrays 18 may also be formed using a number of closely spaced microphones, for example, directional microphones, sufficient to capture the surround sound field.

Given a fixed physical relationship in space between the transducer array elements of the different individual microphones, digital signal processing (DSP) of the signals from each of the individual microphones can create one or more "virtual" microphones isolating sound from a determined direction within the surround sound field. Different algorithms permit the creation of virtual microphones with extremely complex virtual polar patterns and even the possibility to steer the individual lobes of the virtual microphones' patterns so as to home in on, or to reject, particular sources, i.e. directions, of sound. In the case where the array consists of omnidirectional microphones which accept sound from all directions, the electrical signals of the microphones contain the information about the sounds coming from all directions. Joint processing of these sounds allows the selection of a sound signal coming from a given direction. Thus, a microphone array 18 can comprise many known arrangements which enable selection of sound coming from a given direction by using known algorithms to process one or more channel signals of a captured surround sound field.
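By way of illustration only, one well-known technique for creating such a virtual microphone from an omnidirectional array is delay-and-sum beamforming, sketched below. The function and its parameters are illustrative assumptions; practical implementations typically use fractional delays and per-channel weighting:

```python
import numpy as np

def delay_and_sum(signals, mic_positions, direction, fs, c=343.0):
    """Steer a 'virtual' microphone toward `direction` by time-aligning
    and summing the channel signals of an omnidirectional array.

    signals:       (n_mics, n_samples) array of channel data
    mic_positions: (n_mics, 3) element positions in metres
    direction:     vector pointing toward the desired sound source
    fs:            sample rate in Hz
    c:             speed of sound in m/s
    """
    direction = np.asarray(direction, dtype=float)
    direction /= np.linalg.norm(direction)
    # An element further along `direction` receives the wavefront
    # earlier, so it must be delayed more to align all channels.
    delays = np.asarray(mic_positions, dtype=float) @ direction / c
    delays -= delays.min()
    n = signals.shape[1]
    out = np.zeros(n)
    for sig, d in zip(signals, delays):
        shift = int(round(d * fs))          # integer-sample delay only
        out[shift:] += sig[:n - shift]
    return out / len(signals)
```

Sounds arriving from the steered direction add coherently while sounds from other directions are attenuated, giving one simple realisation of a steerable virtual microphone.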

In one embodiment as illustrated in Figs. 4 and 5, the surround microphone system 18 may comprise a microphone array 18 arranged near the camera unit 16. Each of said microphones 18a may comprise a unidirectional, bi-directional, cardioid or shotgun type of microphone or any combination thereof. Whilst the microphones 18a are shown arranged around a perimeter of the camera unit 16, they may be arranged above or below the camera unit and may comprise a single microphone module. In one embodiment, the microphones 18a are arranged in a tetrahedral array and preferably a B format tetrahedral array. Preferably, pairs of micro-electro-mechanical systems (MEMS) microphones are provided at or near each of the four corners of the tetrahedral microphone array to provide eight channels. The use of small MEMS microphones enables the size of the array to be miniaturised. Furthermore, MEMS microphones are intrinsically omnidirectional. The pairs of MEMS microphones may be slightly spatially offset within a pair and/or between the pairs. More preferably, each pair of MEMS microphones is arranged with one spaced a small distance behind the other. Each pair of MEMS microphones may be vertically displaced to provide a single cardioid pattern beam formed from the two omnidirectional MEMS elements. As such, the signal from the first MEMS element may be delayed in time and then combined with the signal from the second MEMS element to cancel out signals from behind the pair of MEMS elements to provide a controlled cardioid or other pattern. The eight channel arrangement so formed provides for better manipulation of the cardioid pickup pattern at each of the four corners of the microphone array, enabling much more accurate and tight beam forming.
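By way of illustration only, the delay-and-combine behaviour of such a pair of omnidirectional elements can be modelled as a first-order differential beamformer. The parameter values below are illustrative assumptions, not values from the disclosure:

```python
import cmath
import math

def cardioid_response(angle_rad, spacing=0.01, freq=1000.0, c=343.0):
    """Magnitude response of a delay-and-subtract pair of omnidirectional
    MEMS elements spaced `spacing` metres apart, where the rear element's
    signal is delayed by the acoustic travel time across the pair and
    subtracted from the front element's signal (first-order differential
    beamforming sketch; spacing and frequency are illustrative)."""
    omega = 2 * math.pi * freq
    # Acoustic delay of the rear element relative to the front for a
    # plane wave arriving from `angle_rad` (0 rad = on-axis, front).
    acoustic = spacing * math.cos(angle_rad) / c
    tau = spacing / c   # electrical delay applied to the rear signal
    # Output = front - delayed(rear); for sound arriving from behind
    # (angle = pi) the two terms cancel exactly, giving the rear null.
    return abs(1 - cmath.exp(-1j * omega * (acoustic + tau)))
```

For small spacings relative to wavelength the response is approximately proportional to 1 + cos(angle), i.e. the cardioid pattern referred to above.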

Where a user of a device 12 selects to receive data comprising an image sector of the panoramic image field captured by the camera unit 16, in one embodiment, the method of the invention allows the sound data of the microphone array 18 to be processed to obtain a sound signal embodying sound from a direction related to a viewing direction of the selected image sector. The viewing direction of the selected image sector is preferably a centre axis normal to a plane of the selected image sector when it is displayed or projected onto a plane, i.e. when it is viewed in a generally rectangular format on a monitor or screen. In any event, it will be understood that the term "viewing direction" when related to a selected image sector is intended to be representative of a direction of view of a user viewing such an image sector. The method of the invention therefore enables a user to play sound at their user device 12 which comprises a portion of the surround sound field which is relevant to or associated with the selected image sector of the image field captured by the camera unit 16. The width and/or depth of the portion of the sound field associated with the selected image sector may automatically be controlled based on the width and/or depth of the viewing angles of the selected image sector or may be adjustable by a user to hone the sound direction more closely to the viewing direction of the image sector. The aspect ratio and pixel sizes of the image sector may be used to determine an initial size of the portion of the sound field associated with the selected image sector. In one embodiment, the location of a selected image sector is tracked within the image field. This may be achieved by tracking a position of the window frame 66 within the image 60 or the plan 62 of the panoramic image or the representative image 64 of the panoramic image. 
Tracking information for said selected image sector may be used to further process the surround sound signal data to associate with the tracked image sector a further processed portion of the sound signal related to its viewing direction, i.e. to ensure that the sound signal being associated with the selected image sector being viewed remains current to the direction of viewing. The tracking information may be used continuously or periodically to further process the sound signal data, but may also be used when there is a tracking amount change exceeding a predetermined threshold, such as may happen when a user shifts the window frame 66 to a new section of the image 60 or the plan 62 of the panoramic image or the representative image 64 of the panoramic image.
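By way of illustration only, a threshold test of the kind described might be sketched as follows, with the function name and default threshold being illustrative assumptions:

```python
def maybe_resteer(prev_angles, new_angles, threshold_deg=5.0):
    """Decide whether the sound signal should be re-processed after the
    window frame 66 moves. `prev_angles` and `new_angles` are the
    (theta, phi) viewing directions in degrees; the threshold value is
    a hypothetical choice, as the disclosure leaves it open.

    Returns the angles to use for steering the sound portion."""
    d_theta = abs(new_angles[0] - prev_angles[0])
    d_phi = abs(new_angles[1] - prev_angles[1])
    if max(d_theta, d_phi) > threshold_deg:
        return new_angles   # change exceeds threshold: re-steer
    return prev_angles      # below threshold: keep current steering
```

Small jitter in the window position therefore leaves the associated sound portion unchanged, while a deliberate shift of the frame triggers re-processing of the surround sound signal data.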

In one embodiment, data comprising the selected image sector and data comprising the associated portion of the sound field are transmitted by the camera unit 16 or the server 14 to a user device 12. In another embodiment, data comprising the selected image sector and data comprising the surround sound field are transmitted by the camera unit 16 or the server 14 to a user device 12, wherein the user device 12 processes the surround sound field data to select a portion of the sound field to associate with the selected image sector.

In one embodiment, multiple selected image sectors are generated and respective ones of the multiple selected image sectors are transmitted by the camera unit 16 or the server 14 to respective electronic processing devices. In this embodiment, the surround sound signal data may be transmitted to each of said user devices 12 such that each said user device processes said sound signal data to associate a respective portion of the sound signal based on a viewing direction of the selected image sector transmitted to said user device 12.

It will be understood that the microphone array 18 is arranged with the camera unit 16 such that directions of the sound field are related to viewing directions of the image field. This can be achieved by identifying one or more reference directions for each of the sound field and the image field and matching or registering said directions when connecting the microphone array 18 to the camera unit 16. For example, the camera unit three-dimensional coordinate system may be aligned to the microphone array three-dimensional coordinate system or vice-versa. In one embodiment, the microphone array 18 comprises a plurality of unidirectional spaced apart microphones in a circular, semi-spherical or spherical array format.

In one embodiment, the microphone array is an ambisonic microphone array. In principle, an ambisonic microphone module comprises a combination of orthogonal bi-polar transducer elements with an omnidirectional, pressure sensitive capsule. The output of the omnidirectional, pressure sensitive capsule is referred to as the 'W' signal, and provides information about the overall amplitude of sound impinging on the microphone array. The bi-polar or figure-of-eight transducer elements forming the array provide the directional information, that is, their outputs can be used to determine the direction from which each element of sound arrives. Preferably, one of these elements points front-back providing the 'X' signal, another points left-right ('Y'), and a third up-down ('Z'). These four signals, W, X, Y, Z, convey all the information needed about the amplitude and direction of the acoustic signals arriving at the microphone array 18. The four signals together are known as B-format signals, and, if recorded on four discrete tracks, can provide a record of the original sound, captured with total three-dimensional accuracy. A decoder embodied in a processor of the camera unit 16, server 14 or user device 12 can be configured to convert the microphone's output signals into a form suitable to drive one or more speakers.

By combining the W, X, Y and Z signals in various ways, it is possible to recreate the effect of any conventional microphone polar pattern from omnidirectional, through cardioid, hyper-cardioid and figure-of-eight, pointed in any direction. This works in exactly the same way as a conventional stereo middle-and-side microphone, only in three dimensions instead of just one (left-right). With the right combinations of W, X, Y and Z signals, it is therefore possible to replicate the signals that would have been obtained from, say, a stereo pair of crossed cardioids.
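By way of illustration only, the combination of the W, X, Y and Z signals into a virtual microphone of selectable pattern and direction may be sketched as follows. The pattern convention follows the description below (0 for omnidirectional, 0.5 for cardioid, 1 for figure of eight); gain scaling conventions for W vary between B-format definitions (some include a 1/sqrt(2) factor), so this is a sketch of the combination rather than a normative decoder:

```python
import math

def virtual_mic(w, x, y, z, theta, phi, pattern):
    """Sample-wise output of a virtual microphone pointed at azimuth
    `theta` and elevation `phi` (radians), synthesised from B-format
    signals W, X, Y, Z. `pattern` runs from 0 (omnidirectional)
    through 0.5 (cardioid) to 1 (figure of eight)."""
    # Project the directional components onto the aiming direction.
    dir_component = (x * math.cos(theta) * math.cos(phi)
                     + y * math.sin(theta) * math.cos(phi)
                     + z * math.sin(phi))
    # Blend the omnidirectional and directional parts.
    return (1.0 - pattern) * w + pattern * dir_component
```

With pattern = 0.5, sound arriving from directly behind the aiming direction (where the directional projection equals -W) cancels, reproducing the cardioid rear null in three dimensions.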

The ambisonic microphone array may be an A format, a B format or a C format ambisonic signal array. The ambisonic microphone array may comprise a Nimbus-Halliday microphone, a Soundfield microphone, or three figure of eight microphones in an orthonormal arrangement respectively along the X, Y and Z directions as illustrated in Figs. 7a and 7b. Fig. 7a shows an array 218 having a support 220 with three figure of eight microphones 218x, 218y, 218z, with the X direction microphone 218x aligned in a horizontal direction as viewed. Fig. 7b shows an ambisonic microphone array 228 having a support 240 with three figure of eight microphones 228x, 228y, 228z, with the X direction microphone 228x aligned in a direction inclined to the horizontal as viewed.

With an ambisonic microphone array 218, 228 in one embodiment, the method of the invention involves rotating the ambisonic sound signal to align one of its coordinate system axes with a viewing datum such as a centre axis of a selected image sector and processing the rotated ambisonic sound signal to determine at least one virtual microphone which generates a sound signal from a direction of the viewing datum.
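By way of illustration only, the rotation of a B-format signal set about the vertical axis may be sketched as follows. W is unchanged because it is omnidirectional, and Z is unchanged by a purely horizontal rotation; the function name and signature are illustrative assumptions:

```python
import math

def rotate_b_format(w, x, y, z, alpha):
    """Rotate B-format samples by `alpha` radians about the vertical
    (Z) axis so that the X axis aligns with a chosen viewing datum,
    e.g. the centre axis of a selected image sector. Only the X and Y
    directional components mix under a horizontal rotation."""
    xr = x * math.cos(alpha) + y * math.sin(alpha)
    yr = -x * math.sin(alpha) + y * math.cos(alpha)
    return w, xr, yr, z
```

A full implementation would apply the corresponding tilt rotation about the Y axis as well, so that both the rotation and tilt angles of the viewing datum are matched.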

Assuming that the image coordinate system axis aligned with the centre axis of the selected image sector is the X axis, a single virtual microphone pointed along the X+ direction will have a polar pattern that favours audio signals from that direction provided it is anything less than a figure of eight response. Thus, processing the W+X ambisonic signals from the rotated reference frame provides one or more such virtual microphones.

An amount of zoom applied to the selected image sector may be communicated to the audio data decoder embodied in the processor of any of the camera unit 16, the server 14 and/or the user device 12 and can be used to determine splay angles and patterns for the virtual microphones such that, when the image sector is a wide view, e.g. corresponding to close subjects, the virtual microphones are splayed widely and the polar patterns are adjusted to be wide, even almost omnidirectional. However, when the image is zoomed-in, the virtual microphones progressively point more strongly toward the centre of the image sector and the patterns narrow towards hyper-cardioid. In the limit, at maximum zoom, the microphones provide almost a mono-directional sound signal.

In one embodiment, data describing the field of view of a selected image sector is communicated to the audio data decoder to enable the decoder to convert the ambisonic signal to generate one or a plurality of outputs based upon the field of view data from the image processing system. Zoom or focus information from the image processing system sets the splay angles of the microphones and their polar patterns. The splay angles and the polar patterns are adjusted as the user zooms the selected image sector. The image centre information from the image processing system sets the centre line of the virtual microphone array.

Taking, for example, a standard B format ambisonic decoder D(w,x,y,z, Theta, Phi, pattern) where w, x, y & z are the B format audio signals, Theta is the rotation angle of the selected image sector in a plane of one of the image system coordinate axes, e.g. X, Phi is the tilt angle of the image sector to a reference such as the horizontal X plane, and pattern varies from 0 for omnidirectional through 0.5 for cardioid to 1 for figure of eight as illustrated in Fig. 8, one implementation of the method of the invention is as follows by way of example, but without limitation to other implementations.

With the B format signal already rotated to align with the camera coordinate system as described above, and information about the selected image sector consisting of the rotation and tilt angles to the centre of the image sector viewport, together with an angle rho representing a part of the sphere presented as the viewport, a stereo pair of virtual microphones for the sound field portion relating to the selected image sector can be calculated as follows:

i) determine the sound signal pattern - this should narrow towards a super-cardioid as the subtended angle falls - this may comprise a wide pattern or a narrow pattern (typical values for the pattern constants are somewhere in the region of narrow pattern = -0.1, wide pattern = -0.8);

ii) Let K = rho/Pi (K = 1.0 at 180 degrees, becoming smaller as the angle decreases), then;

iii) let pattern = narrow pattern + (wide pattern - narrow pattern) * K; and

iv) The angles of the two virtual stereo microphones can be computed. Typically, one may consider an image sector spanning, say, 90 degrees, so one would want the microphones to point about 60 degrees apart, such that the required positioning of the virtual microphones would be (Theta + rho/3, Phi) and (Theta - rho/3, Phi). This, together with the pattern data, suffices to allow any B format decoder to generate the required virtual stereo microphone array for a selected image sector.
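By way of illustration only, steps i) to iv) may be transcribed directly as follows, using the pattern constants given above; the function name is an illustrative assumption:

```python
import math

NARROW_PATTERN = -0.1   # typical constants from the example above
WIDE_PATTERN = -0.8

def stereo_virtual_mics(theta, phi, rho):
    """Given the rotation angle `theta`, tilt angle `phi` and subtended
    viewport angle `rho` (all in radians) of a selected image sector,
    return the pattern constant and the (theta, phi) aiming angles of a
    stereo pair of virtual microphones for a B format decoder. This is
    a direct transcription of steps i)-iv), not a complete decoder."""
    k = rho / math.pi                                        # ii) K = 1.0 at 180 degrees
    pattern = NARROW_PATTERN + (WIDE_PATTERN - NARROW_PATTERN) * k   # i), iii)
    left = (theta + rho / 3.0, phi)                          # iv) mics splayed
    right = (theta - rho / 3.0, phi)                         #     2*rho/3 apart
    return pattern, left, right
```

For a 90 degree sector (rho = pi/2) the pair points 60 degrees apart, and the pattern interpolates halfway between the narrow and wide constants, narrowing automatically as the user zooms in.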

The invention also relates to a converse situation where an image sector of a wide angle image such as a panoramic image is aligned with a determined sound direction such as where the system 10 determines a direction of a detected sound. Thus, in one embodiment, the invention concerns a method of generating associated audio and visual signals by processing the surround sound signal data to select a portion of the sound signal related to a determined sound direction and processing the surround image field data to select an image sector relating to or aligned with the determined sound direction. The detection of a direction of a sound may be determined from the surround sound field data or from a separate sound detection system associated with the camera unit 16. Tracking information of the determined sound direction may be used to further process the image field data to associate a further processed sector of the surround image data with the tracked sound direction.

Referring to Fig. 10, an embodiment of the invention may comprise a distributed system including at least one camera unit 100 and a microphone array 101 outputting digital image data and audio data to one or more separate devices including a memory bank or database 102, a server or controller 104, and one or more user devices 106. The memory bank or database 102 is provided for storing and/or buffering digital image data and audio data. The separate memory bank or database 102 may also be adapted to convert electrical signals received from the camera unit 100 representative of the panoramic image into digital image data in the case where the camera unit 100 does not have this capability. The system includes the server or controller 104 for processing a selection or selections of requested image sectors of the panoramic image from one or more user devices 106 and retrieving digital image data in response to the selections. The one or more user devices 106 are configured for sending requested selections of image sectors of the panoramic image to the server or controller 104 and receiving, for display or otherwise, digital image data retrieved in response to issued requests. The one or more user devices 106 may also receive from the server 104 audio data for a portion of the sound field associated with a viewing direction of a selected image sector. Alternatively, the one or more user devices 106 may receive the audio data generated by the microphone array 101 and locally process the audio data to obtain a portion of the sound field associated with a viewing direction of a selected image sector. The user devices 106 may comprise any suitable electronic device for displaying image data and playing audio data such as a PC, a personal digital assistant, a virtual reality (VR) headset, a smart phone, a games player or a smart television, for example and without limitation.
Transmission of digital image data between devices and, more particularly, from the camera unit 100 and microphone array 101 to the memory bank or database 102 may be in real time or performed in batches either on demand, by polling or by any other suitable transmission scheme. The camera unit 100 and microphone array 101 may be connected to the memory bank or database 102 by a cable, a cable network, or a communication network. The server/controller 104 and the user devices 106 may also be connected to the memory bank or database 102, the camera unit 100 and microphone array 101 by cables, the cable network, or the communication network. The communication network may be a private communication network, a public network, or a combination of the two. The network may include or comprise the internet as illustrated by the cloud in Fig. 10. It may also comprise or include a local area network (LAN) and/or a wide area network (WAN). The system may comprise a surveillance system employing one or more camera units 100 and microphone arrays 101 according to the invention, a public entertainment events distribution system also employing one or more camera units 100 and microphone arrays 101, or a teleconferencing system by way of examples only.

In general, embodiments of the invention provide a system for generating associated audio and visual signals. The system may be configured to process wide angle image data such that a sector of the wide angle image can be selected. The system may also be configured to process surround sound signal data such that a signal comprising sound from a determined direction can be selected. The system may further be arranged to enable said sound from a determined direction to be associated with a selected image sector such that said image sector can be viewed with sound related to a viewing direction of the image sector. Or, said system may be further arranged to enable an image sector to be selected for association with sound from a determined direction such that a sound signal from a determined sound direction can be listened to whilst viewing images associated with said sound direction. The system may comprise a wide angle image display or projection system. The processing of the wide angle image data may be performed remotely from the processing of the associated sound signal.

It should be understood that the elements shown in the figures may be implemented in various forms of hardware, software or combinations thereof. Preferably, these elements are implemented in a combination of hardware and software on one or more appropriately programmed general-purpose devices, which may include a processor, memory and input/output interfaces.

The present description illustrates the principles of the present invention. It will thus be appreciated that those skilled in the art will be able to devise various arrangements that, although not explicitly described or shown herein, embody the principles of the invention and are included within its spirit and scope. Moreover, all statements herein reciting principles, aspects, and embodiments of the invention, as well as specific examples thereof, are intended to encompass both structural and functional equivalents thereof. Additionally, it is intended that such equivalents include both currently known equivalents as well as equivalents developed in the future, i.e., any elements developed that perform the same function, regardless of structure.

In the claims hereof, any element expressed as a means for performing a specified function is intended to encompass any way of performing that function including, for example, a) a combination of circuit elements that performs that function or b) software in any form, including, therefore, firmware, microcode or the like, combined with appropriate circuitry for executing that software to perform the function. The invention as defined by such claims resides in the fact that the functionalities provided by the various recited means are combined and brought together in the manner which the claims call for. It is thus regarded that any means that can provide those functionalities are equivalent to those shown herein.

While the invention has been illustrated and described in detail in the drawings and foregoing description, the same is to be considered as illustrative and not restrictive in character, it being understood that only exemplary embodiments have been shown and described and that they do not limit the scope of the invention in any manner. It can be appreciated that any of the features described herein may be used with any embodiment. The illustrative embodiments are not exclusive of each other or of other embodiments not recited herein. Accordingly, the invention also provides embodiments that comprise combinations of one or more of the illustrative embodiments described above. Modifications and variations of the invention as herein set forth can be made without departing from the spirit and scope thereof, and, therefore, only such limitations should be imposed as are indicated by the appended claims.

In the claims which follow and in the preceding description of the invention, except where the context requires otherwise due to express language or necessary implication, the word "comprise" or variations such as "comprises" or "comprising" is used in an inclusive sense, i.e. to specify the presence of the stated features but not to preclude the presence or addition of further features in various embodiments of the invention. It is to be understood that, if any prior art publication is referred to herein, such reference does not constitute an admission that the publication forms a part of the common general knowledge in the art.