Title:
VIDEO COMPRESSION APPARATUS AND METHOD
Document Type and Number:
WIPO Patent Application WO/2006/005129
Kind Code:
A1
Abstract:
The present invention relates to a video compression apparatus and method. The apparatus (10) includes attachment means (18) for attaching the apparatus to a video camera (12); sensor means for detecting motion of the camera and generating motion data therefrom; means for receiving video from the camera; means for associating the motion data with the video; and means (19) for wirelessly transmitting the video and associated motion data to a receiver, wherein the motion data may be utilised at the receiver in compressing the video.

Inventors:
FRANKLIN GUY (AU)
WHISH-WILSON ADELE (AU)
Application Number:
PCT/AU2005/001021
Publication Date:
January 19, 2006
Filing Date:
July 12, 2005
Assignee:
MOMENTUM TECHNOLOGIES GROUP PT (AU)
FRANKLIN GUY (AU)
WHISH-WILSON ADELE (AU)
International Classes:
G03B17/18; H04N7/18; H04N7/52; (IPC1-7): H04N7/52; H04N7/18; G03B17/18
Domestic Patent References:
WO1998051083A1 1998-11-12
Foreign References:
JP2001318586A 2001-11-16
US20030058347A1 2003-03-27
US4689673A 1987-08-25
Attorney, Agent or Firm:
ALLENS ARTHUR ROBINSON PATENT & TRADE MARKS ATTORNEYS (530 Collins Street Melbourne, VIC 3000, AU)
Claims:
1. A video transmission apparatus including: attachment means for attaching the apparatus to a video camera; sensor means for detecting motion of the camera and generating motion data therefrom; means for receiving video from the camera; means for associating the motion data with the video; and means for wirelessly transmitting the video and associated motion data to a receiver, wherein the motion data may be utilised at the receiver in compressing the video.
2. A video transmission apparatus according to claim 1, wherein the sensor means includes means for detecting rotation of the camera about at least one axis.
3. A video transmission apparatus according to claim 2, wherein the sensor means includes means for detecting rotation of the camera about a yaw axis (thereby detecting camera pan) and a pitch axis (thereby detecting camera tilt).
4. A video transmission apparatus according to claim 3 wherein the sensor means includes a first gyroscope configured to detect camera pan and a second gyroscope configured to detect camera tilt.
5. A video transmission apparatus according to any one of claims 1 to 4, wherein the sensor means includes means for detecting linear acceleration of the camera in at least one direction.
6. A video transmission apparatus according to claim 5 wherein the means for detecting linear acceleration in at least one direction is an accelerometer.
7. A video transmission apparatus according to claim 4, wherein the sensor means includes means for generating motion data in the form of a sequence of values, each representing an instantaneous angular velocity of the camera about an axis, such as a pitch axis and/or a yaw axis.
8. A video transmission apparatus according to any one of claims 1 to 7, wherein the means for associating the motion data with the video includes means for inserting indicia representative of the motion data into each frame of the video.
9. A video transmission apparatus according to claim 8 wherein the means for inserting indicia into each frame of the video is an onscreen display device.
10. A video transmission apparatus according to claim 8 or claim 9 wherein the indicia is alphanumeric characters overlayed in a small area of the frame.
11. A video transmission apparatus according to claim 10 wherein the alphanumeric characters are indicative of the instantaneous angular velocity of the camera about at least one axis at the time of the frame.
12. A video transmission apparatus according to claim 11, further including a video synch separator for extracting timing information from each frame.
13. A video transmission apparatus according to any one of claims 1 to 7 wherein the means for associating the motion data with the video includes means for digitally encoding and linking the motion data to the video.
14. A video-compression assistance device, attachable to or incorporable within a video capture device, the video-compression assistance device including: sensor means for detecting motion of the video capture device as video is captured and generating motion data therefrom; and means for associating the motion data with the captured video, such that the motion data may be utilised in compressing the captured video.
15. A video-compression assistance device according to claim 14, wherein the sensor means include means for detecting rotation of the video-capture device about at least one axis.
16. A video-compression assistance device according to claim 17, wherein the sensor means includes means for detecting rotation of the video-capture device about a yaw axis (thereby detecting camera pan) and a pitch axis (thereby detecting camera tilt).
17. A video-compression assistance device according to claim 15 or claim 16 wherein the sensor means includes a first gyroscope configured to detect video-capture device pan and a second gyroscope configured to detect video-capture device tilt.
18. A video-compression assistance device according to any one of claims 14 to 17, wherein the sensor means includes means for detecting translational acceleration of the video-capture device in at least one direction.
19. A video-compression assistance device according to claim 18 wherein the means for detecting translational acceleration of the video-capture device in at least one means is an accelerometer.
20. A video-compression assistance device according to any one of claims 14 to 19, wherein the video capture device is a video-camera-enabled mobile telephone handset.
21. A video compression method, said video comprising a plurality of frames, the method including the steps of: detecting motion of the camera as the video is captured and generating motion data therefrom; associating the motion data with the captured video; utilising the motion data to determine an offset between adjacent frames to thereby minimise interframe differences between said frames; and utilising the offset in compressing the video.
22. A video compression method according to claim 21 wherein the step of detecting motion of the camera involves attaching or incorporating an apparatus having motion sensing means to or into the camera.
23. A video compression method according to claim 22, wherein the motion sensing means include a first gyroscope configured to detect motion of the camera about a yaw axis, and a second gyroscope configured to detect rotation of the camera about a pitch axis.
24. A video compression method according to any one of claims 21 to 23, wherein the motion data includes a sequence of values, each representing an instantaneous angular velocity of the camera about an axis, such as a pitch axis and/or a yaw axis.
25. A video compression method according to any one of claims 21 to 24, wherein the step of associating motion data with the captured video includes the step of inserting indicia representative of the motion data into one or more of the frames.
26. A video compression method according to claim 25, wherein the indicia is alphanumeric characters, overlayed in a small area of the frame.
27. A video compression method according to any one of claims 21 to 24 wherein the step of associating the motion data with the captured video includes the step of digitally encoding and linking the motion data to the captured video.
28. A video compression method according to any one of claims 21 to 27 wherein the step of utilising the motion data to determine an offset between adjacent frames includes the step of forming a global motion vector from the motion data, said vector useable as a starting point for determining said offset between adjacent frames.
28. A video camera having a video transmission apparatus according to any one of claims 1 to 11 attached thereto.
29. A video-camera-enabled mobile telephone handset having a video-compression assistance device according to any one of claims 14 to 20 incorporated therein.
Description:
VIDEO COMPRESSION APPARATUS AND METHOD

Field of the invention

The present invention relates to the field of video capture and transmission. More particularly the present invention relates to apparatus and methods for capturing, transmitting and compressing video signals for the purpose of transmission over communication networks.

Background of the invention

In this specification, where a document, act or item of knowledge is referred to or discussed, this reference or discussion is not an admission that the document, act or item of knowledge or any combination thereof was at the priority date:

(i) part of common general knowledge; or

(ii) known to be relevant to an attempt to solve any problem with which this specification is concerned.

The transmission of live video over communications networks such as the Internet and mobile phone networks is known. For example, low cost digital video cameras known as "webcams" allow low quality video to be captured and transmitted over the Internet. Webcams must be attached to a computer and are therefore only useful for capturing video within the immediate vicinity of the computer. Although the quality of video captured by a webcam is also generally below that captured by a conventional video camera such as a camcorder, they will continue to be relevant in allowing "video calls" to be made through the Internet using voice over IP (VOIP) protocols.

Video conferencing is another example of video transmission over a communications network. However, the large bandwidth required for video conferencing means that a dedicated network must be used. Therefore, the attendant cost of the specialist video conferencing equipment, including the dedicated network, makes video conferencing economically feasible for only large enterprises. Further, video conferencing, like webcams, also requires the conference to occur in a video conferencing room to take advantage of the specialist hardware.

The problems of lack of mobility and expense are being addressed by systems allowing a conventional video camera, such as a camcorder, to be used to transmit video over the Internet. Essentially, a user captures video with the camera, and the video is wirelessly transmitted to a computer for transmission over the Internet. However, it has been found that motion of a hand held camera can introduce large inter-frame differences, which can equate to a decrease in the compression rate/quality of the video and/or an increase in the computational complexity of the video compression algorithm. Compression of video is usually necessary before it can be transmitted over the Internet, and this decrease in compression rate may be reflected as either lower quality images or a lower frame rate when the video is received by a viewer.

This tendency of camera motion to impart large inter-frame differences also occurs with video captured by a video-camera-enabled mobile telephone handset. Achieving an optimum compression rate is particularly important in this context, where compression occurs on the handset before transmission of the video over the (usually lower bandwidth) mobile telephone network, either to another mobile phone handset or to a receiver for transmission over the Internet.

Accordingly, it would be advantageous to provide an apparatus and method that can improve the compression rate for captured video intended to be transmitted over a communications network.

Summary of the invention

According to a first aspect of the present invention there is provided a video transmission apparatus including:

attachment means for attaching the apparatus to a video camera;

sensor means for detecting motion of the camera and generating motion data therefrom;

means for receiving video from the camera;

means for associating the motion data with the video; and

means for wirelessly transmitting the video and associated motion data to a receiver.

According to a second aspect of the present invention there is provided a video compression assistance device, attachable to or incorporable within a video capture device, the video-compression assistance device including:

sensor means for detecting motion of the video capture device as video is captured and generating motion data therefrom; and

means for associating the motion data with the captured video, such that the motion data may be utilised in compressing the captured video.

The present invention takes the approach of detecting camera motion whilst capturing video and utilising the detected motion to assist in the compression of that captured video. Essentially, the invention utilises hardware to reduce the computational burden on and increase the efficiency of video compression algorithms.

It will be realised that preferred embodiments of the present invention offer several advantages over the prior art. Firstly, the apparatus may be attached to a conventional video camera, to allow video captured by the camera to be transmitted over a communications network. Secondly, the motion data associated with the video can be used to efficiently determine the optimum offset of adjacent video frames to minimise inter-frame differences introduced by camera motion, thereby improving the compression rate of the video and reducing the computational burden on a compression algorithm.

Preferably, the attachment means attaches the apparatus to the video camera through the accessory shoe of the camera.

Typically, the sensor means detects rotation of the camera about at least one axis. Preferably, the sensor means detects rotation of the camera about a yaw axis (thereby detecting camera pan) and a pitch axis (thereby detecting camera tilt).

Camera pan and tilt may be detected by any suitable sensor means. For example, the sensor means may include:

■ A first gyroscope lying in a plane substantially perpendicular to the direction of the camera when the apparatus is attached thereto, said first gyroscope detecting camera pan; and

■ A second gyroscope lying coplanar with and substantially perpendicular to the direction of the camera when the apparatus is attached thereto, said second gyroscope detecting camera tilt.

The sensor means may generate motion data by any suitable means. Typically, the sensor means includes transducer means for transducing camera motion to variations in an electrical signal, whereupon the motion data can be extracted from the electrical signal.

Preferably, the motion data includes a sequence of values, each representing an instantaneous angular velocity of the camera about an axis, such as a pitch axis and/or a yaw axis.

Typically, the means for associating the motion data with the video includes means (such as an on-screen display device (OSD)) for inserting indicia representative of the motion data into each frame of the video. The indicia may, for example, be alphanumeric characters overlayed in a small area of the frame. Other methods for associating motion data with a video frame that allow the data to be easily extracted from the frame could also be used.

Preferably, each frame has associated motion data that is indicative of the instantaneous angular velocity of the camera about at least one axis at the time of the frame. Timing information of the frame may be extracted from the frame itself, for example by the use of a video sync separator.

According to a third aspect of the present invention there is provided a video compression method, said video comprising a plurality of frames, the method including the steps of:

detecting motion of the camera as the video is captured and generating motion data therefrom;

associating the motion data with the captured video;

utilising the motion data to determine an offset between adjacent frames to thereby minimise inter-frame differences between said frames; and

utilising the offset in compressing the video.

The step of detecting motion of the camera may include the step of detecting rotation of the camera about at least one axis.

The camera motion may be detected by attaching an apparatus having motion sensing means to the camera. For example, the motion sensing means may include a first gyroscope configured to detect motion of the camera about a yaw axis, and a second gyroscope configured to detect rotation of the camera about a pitch axis.

Typically, the sensor means includes transducer means for translating camera motion to variations in an electrical signal, whereupon the motion data can be extracted from the electrical signal.

Preferably, the motion data includes a sequence of values, each representing an instantaneous angular velocity of the camera about an axis, such as a pitch axis and/or a yaw axis.

The step of associating motion data with a frame typically includes the step of inserting indicia representative of the motion data into the frame. For example, the indicia may be alphanumeric characters, overlayed in a small area of the frame. It is to be understood however that any indicia that can be extracted from the frame could be used. In other embodiments of the invention where the video is captured in digital, rather than analogue form, the motion data may be digitally encoded and linked to the frames of digitally stored video.

The step of extracting the motion data from the frame may include performing optical character recognition on the frame.

Preferably the step of utilising the motion data to determine an offset of an adjacent frame includes forming a global motion vector from the motion data, said vector being used as a starting point for determining the offset of the adjacent frame.

The method may be performed using suitable computer software and/or hardware.

The present invention also provides a video camera having attached thereto a video transmission apparatus as described herein.

Brief description of the drawings

An embodiment of the present invention will now be described by reference to the accompanying figures wherein:

Figure 1 is a perspective view of a video camera with an apparatus according to an embodiment of the invention attached.

Figure 2 is a schematic illustration of the components of an apparatus according to an embodiment of the invention.

Figure 3 is a schematic illustration of the video and motion encoder of the invention.

Figure 4 is a schematic illustration of the video transport components of an embodiment of the invention.

Figure 5 is a schematic illustration of a receiver suitable for receiving motion encoded video.

Figure 6 is a schematic illustration of the components of the invention for associating motion data with a video frame.

Figure 7 is a schematic illustration of a video-camera-enabled mobile phone handset having a video compression assistance device according to the second aspect of the present invention incorporated therein.

Figure 8 is an illustration of the use of motion encoded video for video compression.

Figure 9 is an illustration of the use of a motion vector derived from motion data in global motion compensation between adjacent frames.

Figure 10 is an illustration of the frame difference of adjacent frames with and without global motion compensation.

Detailed description of the drawings

Turning first to Figure 1 there is illustrated a video transmission apparatus 10 that is attached to a video camera 12 via the video camera's accessory shoe 18. A transmitter 19 is mounted on the top of the apparatus 10 for transmitting motion encoded video to a receiver (not shown).

The camera lens 14 points in a forward direction 16, with the camera 12 and the attached apparatus 10 being rotatable about a pitch axis (camera tilt) and/or a yaw axis (camera pan). The video camera can be used in a conventional manner to capture video images in a digital or analogue format, which are transmitted by the apparatus 10 to a receiver for compression and transmission over a communications network.

As is further described below, the apparatus includes means for detecting motion of the camera and generating motion data therefrom. In this embodiment the motion detection means includes a first gyroscope (not shown) positioned along a yaw axis to capture camera pan motion, and a second gyroscope (not shown) positioned along a pitch axis to capture camera tilt motion. The operation and structure of gyroscopes are known to those skilled in the art and will not be described in further detail here.

The operation and components of the apparatus 10 are described in more detail with reference to Figure 2. As noted above, motion sensors 20 including the abovementioned gyroscopes are positioned within the apparatus 10 to sense the occurrence of camera tilt and pan. Motion data 26 representing the sensed camera motion is generated from the output of the motion sensors 20. In a conventional manner, video data 23 is also output from the video camera 12 and is forwarded with the motion data 26 to the motion encoder 22. At the motion encoder 22 the motion data 26 is associated with the video data to form motion encoded video 25. A transmitter 19 receives the motion encoded video 25 and transmits it to a receiver (not shown) for compression and transmission over a communications network.

Further detail of the sensor means is given by reference to Figure 3. The camera 12 outputs video in a conventional manner with the motion sensors 20 simultaneously outputting motion data 26 to the encoder 22. As noted above, the motion sensors 20 include first and second gyroscopes orientated within the apparatus so that their axes are in horizontal and vertical planes both substantially normal (perpendicular) to the direction of the camera 12. The gyroscopes operate as transducers and convert sensed physical rotation of the camera about each axis to variations in an electrical signal, as would be understood by those skilled in the art. To assist in the transduction process the gyroscopes are calibrated with a reference voltage to represent zero angular velocity about an axis. Additionally, a sensor scale factor is used to represent the proportional change in the output voltage, relative to the reference voltage, for a corresponding change in angular velocity about an axis. To represent rotation in both directions an output of less than or greater than the reference voltage can be used. The scale factor is dependent on the characteristics of the sensor but will typically have units of millivolts per degree per second (mV/°/s). Consequently the maximum angular velocity is the highest rate of change of degrees per second.

The motion data 26 may be extracted from the electrical signal produced by the transducer by a number of means. Firstly, the angular velocity in degrees per second about an axis can be determined by sampling the voltage of the electrical signal output by the gyroscopes. For example, a sensor output of 1.45 volts represents a 0.1 volt (100 millivolt) change from a reference voltage of 1.35 volts. With a scale factor of 0.67 millivolts per degree per second, this change produces a calculated angular velocity of 100/0.67, or approximately 150 degrees per second. Thus it is determined that the camera was rotating about an axis at approximately 150 degrees per second at the time the frame was recorded. This extracted motion data 26 is then delivered to the encoder 22 for association with the video frame.
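The arithmetic of this worked example can be sketched in a few lines of Python. This is an illustration only: the function name and parameter names are hypothetical, and the default values are simply the figures quoted in the example above.

```python
def angular_velocity_dps(sensor_volts, reference_volts=1.35, scale_mv_per_dps=0.67):
    """Convert a gyroscope output voltage to angular velocity in degrees per second.

    reference_volts is the output at zero rotation; scale_mv_per_dps is the
    sensor scale factor in millivolts per degree per second (both taken from
    the worked example, not from any particular sensor datasheet).
    """
    delta_mv = (sensor_volts - reference_volts) * 1000.0  # change from reference, in mV
    return delta_mv / scale_mv_per_dps

rate = angular_velocity_dps(1.45)
print(round(rate, 1))  # roughly 149.3, i.e. approximately 150 degrees per second
```

An output below the reference voltage yields a negative result, which corresponds to rotation in the opposite direction.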

Alternatively, rather than sampling and outputting digital values directly, the gyroscope output may be connected to an analogue to digital converter (not shown) which converts the analogue output to a sequence of digital values in a manner known to those skilled in the art. The analogue signal may be conditioned for input to the analogue to digital converter in a conventional manner. A 10 bit analogue to digital converter will convert the analogue signal from the gyroscope to a series of digital values 25, each taking one of 1,024 possible values.
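As a rough sketch of what 10 bit quantisation implies, the Python below maps a voltage to one of 1,024 codes and back to an angular velocity. The 2.7 volt full-scale range is an assumption chosen so that the 1.35 volt reference sits at mid-scale; the specification does not state the converter range, and the function names are hypothetical.

```python
ADC_BITS = 10
ADC_LEVELS = 2 ** ADC_BITS  # 1,024 possible codes

def adc_code(volts, full_scale_volts=2.7):
    """Quantise a voltage to a 10 bit code, clamped to the converter range.

    full_scale_volts is an assumed value, not taken from the specification.
    """
    code = int(volts / full_scale_volts * ADC_LEVELS)
    return max(0, min(ADC_LEVELS - 1, code))

def code_to_dps(code, full_scale_volts=2.7, reference_volts=1.35, scale_mv_per_dps=0.67):
    """Convert an ADC code back to angular velocity in degrees per second."""
    volts = code * full_scale_volts / ADC_LEVELS
    return (volts - reference_volts) * 1000.0 / scale_mv_per_dps

print(adc_code(1.35))                      # mid-scale code at zero rotation
print(code_to_dps(513) - code_to_dps(512)) # angular velocity resolution of one code step
```

Under these assumed values one code step corresponds to a little under 4 degrees per second, which gives a feel for the granularity of the motion data.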

By reference to Figure 4 the motion encoded video 28 is forwarded to the transmitter 19 which in turn forwards the motion encoded video to a receiver 30 for compression and transmission over a communications network, such as the Internet.

Turning to Figure 5, which illustrates operation of the receiver, the motion encoded video 28 is digitised by a video digitiser 32 and forwarded to a decoder 34 where the digitised video 36 and motion data 26 are separated from the motion encoded video 28. The motion data 26 extracted from the motion encoded video 28 is utilised to compensate for global motion 38 in the compression of the video 40. The compressed video is better suited for transmission over a communications network (not shown).

The process of associating motion data 26 with the video 24 is now described by reference to Figure 6. As noted above, analogue motion data 26 is passed to an analogue to digital converter 46 for processing by a microprocessor 50 also resident in the apparatus 10. Simultaneously, the composite video signal 42 is delivered from the camera 12 and passed to a video sync separator 44 which extracts timing information, such as the composite and vertical sync, from the composite video 42. When the video signal is at the rising edge of the first serration in the video sync, the video sync output of the sync separator goes high, as understood by those skilled in the art.

Extraction of timing information in this way allows the most current digitised motion data from the analogue to digital converter to be associated with the current video frame of the composite video. This synchronisation is achieved by connecting the vertical sync of the video sync separator 44 to an input pin on the microprocessor 50. As noted above, another input pin of the microprocessor 50 is connected to the output of the analogue to digital converter 46. The microprocessor 50 continuously polls the state of the vertical sync input pin and triggers the sampling period for the conversion of the analogue motion data 26 to a digital format. Typically, the microprocessor initiates the conversion during the vertical blanking interval of the video signal 42.

To increase the signal to noise ratio of the sampled motion data, a number of samples are taken, with the sample average being output by the microprocessor 50 to a video overlay device 48, which also receives the composite video 42 and the output of the video sync separator 44.
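The polling and averaging behaviour described here might be modelled as follows. This is a simplified simulation in Python rather than microprocessor firmware: the pin states are supplied as a list, `read_adc` is a hypothetical callable standing in for the converter, and the choice of eight samples per frame is an assumption.

```python
from itertools import cycle

def averaged_reading(read_adc, n_samples=8):
    """Average several converter readings to reduce uncorrelated noise."""
    total = sum(read_adc() for _ in range(n_samples))
    return total / n_samples

def readings_per_frame(vsync_states, read_adc, n_samples=8):
    """Poll successive vertical sync pin states and sample once per rising edge."""
    readings = []
    previous = 0
    for state in vsync_states:
        if state and not previous:  # rising edge marks the start of a new frame
            readings.append(averaged_reading(read_adc, n_samples))
        previous = state
    return readings

# Simulated noisy converter alternating around code 512 (illustrative data only).
noisy_codes = cycle([510, 514])
frames = readings_per_frame([0, 1, 1, 0, 0, 1, 0], lambda: next(noisy_codes))
print(frames)  # one averaged value per detected frame
```

Sampling only on the rising edge means each frame receives exactly one averaged value, however long the sync pin stays high.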

At the video overlay device 48 the output of the microprocessor 50 is converted to indicia such as alphanumeric text. An onscreen display device, forming part of the video overlay device 48, displays the text at a particular location on the correct frame of composite video 42 by utilising the timing data from the video sync separator 44. Thus, motion encoded video 28, comprising the video image with overlayed motion data, is produced.
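The specification does not define the layout of the overlay text, so the following Python sketch uses a hypothetical fixed-width format (`P` for pan, `T` for tilt, signed and zero-padded) purely to illustrate how angular velocities could be rendered as alphanumeric indicia. A fixed width is assumed here because it keeps the text in the same small region of every frame.

```python
def format_overlay(pan_dps, tilt_dps):
    """Render pan and tilt rates as fixed-width text such as 'P+150 T-012'.

    The 'P'/'T' prefixes and the field width are illustrative assumptions.
    """
    pan = int(round(pan_dps))
    tilt = int(round(tilt_dps))
    return "P{:+04d} T{:+04d}".format(pan, tilt)

print(format_overlay(150.0, -12.0))  # P+150 T-012
print(format_overlay(5.0, 0.0))      # P+005 T+000
```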

An example of a video-camera-enabled mobile telephone handset 9, having a video compression assistance device 11 incorporated therein, is now described by reference to Figure 7. The device 11 in this embodiment takes the form of a microprocessor that is incorporated into the circuit board (not shown) of the handset 9, along with other hardware components, such as a video compression chip 27.

The video camera 13 incorporated into the handset captures video in digital form, rather than in analogue form as does the camcorder described above. A digital file 15 stores the aggregate of the video frames captured by the video camera 13 in a suitable format. Motion sensors 20, incorporating gyroscopes (not shown) as described above as well as an accelerometer (not shown) for detecting translational motion of the camera 13, detect camera motion as the video is captured in the manner described above. As with the video 15, the motion data is encoded in a digital form and is linked to the digital video before being stored in a file 21. A motion associating video component 23 receives video and motion data from the video 15 and motion data 21 files and associates the motion data with the video frames in the manner described above. The motion compensated video is passed to the video compressor 27 for compression in the manner described below, and then to a transmitter 27 for transmission over the mobile telephone network either to another handset, or to a receiver for transmission over the Internet.

Returning to Figure 5, the receiver that receives motion encoded video 28 from the camcorder attachment described above is now described. The video compression process occurring at the receiver is essentially the same as the on-board compression occurring on the mobile telephone handset embodiment just described.

To allow for mobile video capture it is preferred that the transmission be by wireless means. Where an analogue motion encoded video is transmitted, it is received and demodulated at the receiver. A video digitiser 32 converts the analogue video 28 into a plurality of digital video frames in a manner understood by those skilled in the art. A computer program running on the receiver performs optical character recognition (OCR) 52 on the received frames to extract the motion data overlayed on the frame. The frame is then cropped 54 to remove the section where the motion data appeared. Simultaneously the extracted motion data is passed with the cropped image to a global motion compensation process 38 which utilises the motion data for the purpose of compression.
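The steps that follow character recognition can be sketched as below. The recognition itself is not shown; the Python assumes the OCR stage has already returned a string in a hypothetical `P+150 T-012` layout, and that the overlay occupies the top rows of the frame. Both are assumptions made for illustration, since the specification fixes neither the text format nor the overlay position.

```python
def parse_overlay(text):
    """Recover (pan, tilt) in degrees per second from recognised overlay text.

    Assumes a hypothetical layout such as 'P+150 T-012'.
    """
    pan_field, tilt_field = text.split()
    return int(pan_field[1:]), int(tilt_field[1:])

def crop_overlay(frame, overlay_rows):
    """Remove the top rows of the frame, where the overlay is assumed to sit."""
    return frame[overlay_rows:]

frame = [[0] * 8 for _ in range(6)]   # placeholder 6-row frame of pixel values
motion = parse_overlay("P+150 T-012")
cropped = crop_overlay(frame, overlay_rows=2)
print(motion, len(cropped))
```

The parsed motion values and the cropped frame would then be handed together to the global motion compensation stage.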

Broadly, a video may be compressed where the difference between a series of frames is due mainly to global motion of the camera between the frames, rather than local motion of objects within the frame. This can be exploited by storing, instead of each individual frame, a reference image and information about the global motion between the frames. The original frames can then be reconstructed after transmission by applying the information to the reference frame. However, where the compression algorithm has no prior knowledge of global motion due to camera movement, additional processing is required to determine the direction and magnitude of the optimum offset between frames to take advantage of global motion compensation. Essentially, the algorithm must iterate over a large number of combinations of displacement and differencing of the current and previous frames in a two dimensional area to locate this optimum offset.
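To make the cost of that blind search concrete, here is a minimal Python sketch of an exhaustive two dimensional search. It is not the compression algorithm of the specification: the mean absolute difference measure, the synthetic test pattern, and all function names are assumptions chosen for illustration.

```python
def make_frame(width, height):
    """Build a synthetic greyscale frame with a non-repeating pattern."""
    return [[(7 * x * x + 13 * y * y + 3 * x * y) % 251 for x in range(width)]
            for y in range(height)]

def shift_frame(frame, dx, dy):
    """Simulate camera motion: new[y][x] = frame[y + dy][x + dx], zero where unknown."""
    h, w = len(frame), len(frame[0])
    return [[frame[y + dy][x + dx] if 0 <= x + dx < w and 0 <= y + dy < h else 0
             for x in range(w)] for y in range(h)]

def mean_abs_diff(prev, curr, dx, dy):
    """Mean absolute pixel difference over the region where the frames overlap."""
    h, w = len(prev), len(prev[0])
    total = count = 0
    for y in range(h):
        for x in range(w):
            px, py = x + dx, y + dy
            if 0 <= px < w and 0 <= py < h:
                total += abs(curr[y][x] - prev[py][px])
                count += 1
    return total / count if count else float("inf")

def exhaustive_offset(prev, curr, radius):
    """Try every displacement in a square window and keep the lowest-cost one."""
    best = None
    evaluations = 0
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            cost = mean_abs_diff(prev, curr, dx, dy)
            evaluations += 1
            if best is None or cost < best[0]:
                best = (cost, dx, dy)
    return (best[1], best[2]), evaluations

prev = make_frame(16, 16)
curr = shift_frame(prev, 3, -2)
offset, evaluations = exhaustive_offset(prev, curr, radius=4)
print(offset, evaluations)
```

The number of frame comparisons grows with the square of the search radius, which is the computational burden the motion data is intended to relieve.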

However, as the motion data supplied to the compression algorithm along with the video describes the combined pan and tilt motion of the camera when originally capturing the frame, it can be formed into a vector (direction and magnitude) of the camera movement between the frames and can be used as a starting point in minimising the search for the optimum offset. It will thus be realised that supplying the motion data transforms the search area from a two dimensional to a linear space. It will also be realised that the computational burden of compression can be reduced by the supply of motion data on each video frame. This may allow for real time compression of video signals received from the handheld camera 16 thus allowing for live broadcast of video over a communications network such as the Internet.
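One way to picture the motion vector acting as a starting point is sketched below in Python. The conversion from angular velocity to pixels relies on an assumed `pixels_per_degree` factor (which would depend on the lens field of view and frame size) and an assumed sign convention; neither is given in the specification. The small refinement window around the prediction is likewise an illustrative choice, not the search strategy of the invention.

```python
def predicted_offset(pan_dps, tilt_dps, frame_interval_s, pixels_per_degree):
    """Convert angular rates into an expected pixel shift between adjacent frames.

    pixels_per_degree and the sign convention are assumptions for illustration.
    """
    dx = round(pan_dps * frame_interval_s * pixels_per_degree)
    dy = round(tilt_dps * frame_interval_s * pixels_per_degree)
    return dx, dy

def mean_abs_diff(prev, curr, dx, dy):
    """Mean absolute pixel difference over the region where the frames overlap."""
    h, w = len(prev), len(prev[0])
    total = count = 0
    for y in range(h):
        for x in range(w):
            px, py = x + dx, y + dy
            if 0 <= px < w and 0 <= py < h:
                total += abs(curr[y][x] - prev[py][px])
                count += 1
    return total / count if count else float("inf")

def refine_offset(prev, curr, start, radius=1):
    """Search only a small neighbourhood around the sensor-derived prediction."""
    best = None
    evaluations = 0
    for dy in range(start[1] - radius, start[1] + radius + 1):
        for dx in range(start[0] - radius, start[0] + radius + 1):
            cost = mean_abs_diff(prev, curr, dx, dy)
            evaluations += 1
            if best is None or cost < best[0]:
                best = (cost, dx, dy)
    return (best[1], best[2]), evaluations

# 150 deg/s pan and -100 deg/s tilt at 25 frames per second, 0.5 pixels per degree.
print(predicted_offset(150.0, -100.0, 0.04, 0.5))  # (3, -2)
```

With a one pixel refinement radius only nine candidate offsets are evaluated, however large the camera movement, because the sensor data has already located the neighbourhood of the optimum.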

Referring to Figure 8 the previous frame 58 is illustrated comprising a matrix of pixels making up an image. A motion vector is used to find the optimum offset between the previous frame 58 and the current frame 60. The large area 62 of common image data between the previous and current frame can be used in compressing the video for transmission over the communications network. This process is also illustrated by reference to Figure 9 with the common area 62 between the frames being shown.

It will of course be realised that various modifications and improvements to the invention as described are possible, and such alterations and improvements appearing obvious to those skilled in the art are deemed to be within the scope of the present invention.

The word 'comprising' and forms of the word 'comprising' as used in this description do not limit the invention claimed to exclude any variants or additions.