

Title:
TURBO MOTION JPEG
Document Type and Number:
WIPO Patent Application WO/2006/122063
Kind Code:
A3
Abstract:
A method, apparatus and system for turbo motion JPEG is provided. In one embodiment, a method is provided. The method includes receiving a first video frame. The method also includes receiving a second video frame. The method further includes determining a global motion vector of the second video frame relative to the first video frame. Additionally, the method includes encoding the first video frame in JPEG format. Moreover, the method includes encoding the second video frame as a difference between the first video frame and the second video frame after accounting for the global motion vector.

Inventors:
LU NING (US)
LIANG JEMM (US)
Application Number:
PCT/US2006/017795
Publication Date:
March 01, 2007
Filing Date:
May 08, 2006
Assignee:
ULTRACHIP INC (US)
LU NING (US)
LIANG JEMM (US)
International Classes:
G06K9/36
Foreign References:
US20040161038A12004-08-19
US5430480A1995-07-04
Attorney, Agent or Firm:
VON TERSCH, Glenn, E. et al. (101 Jefferson Drive Menlo Park, CA, US)
Claims:
CLAIMS

What is claimed is:

1. A method, comprising: receiving a first video frame; receiving a second video frame; determining a global motion vector of the second video frame relative to the first video frame; encoding the first video frame in JPEG format; and encoding the second video frame as a difference between the first video frame and the second video frame after accounting for the global motion vector.

2. The method of Claim 1, further comprising: determining the difference between the first video frame and the second video frame after moving contents of the second video frame back along the global motion vector.

3. The method of Claim 1, further comprising: determining the difference between the first video frame and the second video frame after moving contents of the first video frame forward along the global motion vector.

4. The method of Claim 1, further comprising: receiving a third video frame; and encoding the third video frame in JPEG format.

5. The method of Claim 1, further comprising: receiving a third video frame; determining a global motion vector of the third video frame relative to the second video frame; and encoding the third video frame as a difference between the second video frame and the third video frame after accounting for the global motion vector.

6. The method of Claim 5, further comprising:

receiving a fourth video frame; and encoding the fourth video frame without reference to previous video frames.

7. The method of Claim 1, wherein: the first video frame is stored in a reference frame buffer; and the second video frame is stored in a frame buffer.

8. The method of Claim 7, further comprising: transferring the second video frame to the reference frame buffer.

9. The method of Claim 8, further comprising: receiving a third video frame; determining a global motion vector of the third video frame relative to the second video frame; and encoding the third video frame as a difference between the second video frame and the third video frame after accounting for the global motion vector.

10. The method of Claim 9, wherein: the third video frame is stored in the frame buffer.

11. The method of Claim 10, further comprising: transferring the third video frame to the reference frame buffer.

12. The method of Claim 11, further comprising: receiving a fourth video frame; and encoding the fourth video frame without reference to previous video frames.

13. The method of Claim 12, further comprising: transferring the fourth video frame to the reference frame buffer.

14. The method of claim 13, wherein: the reference frame buffer maintains an encoded video frame.

15. The method of claim 13, wherein: the reference frame buffer maintains an unencoded video frame.

16. A medium embodying instructions, which, when executed by a processor, cause the processor to perform a method, the method comprising: receiving a first video frame; receiving a second video frame; determining a global motion vector of the second video frame relative to the first video frame; encoding the first video frame in JPEG format; and encoding the second video frame as a difference between the first video frame and the second video frame after accounting for the global motion vector.

17. The medium of claim 16, wherein the method further comprises: determining the difference between the first video frame and the second video frame after moving contents of the second video frame back along the global motion vector.

18. The medium of claim 16, wherein the method further comprises: determining the difference between the first video frame and the second video frame after moving contents of the first video frame forward along the global motion vector.

19. A system, comprising: a processor; a bus coupled to the processor; a memory coupled to the bus; and wherein the processor is to: receive a first video frame and receive a second video frame, determine a global motion vector of the second video frame relative to the first video frame, encode the first video frame in JPEG format, and encode the second video frame as a difference between the first video frame and the second video frame after accounting for the global motion vector.

20. The system of claim 19, wherein: the processor is further to: determine the difference between the first video frame and the second video frame after moving contents of the second video frame back along the global motion vector.

21. The system of claim 19, wherein: the processor is further to: determine the difference between the first video frame and the second video frame after moving contents of the first video frame forward along the global motion vector.

22. A system, comprising: a processor; a bus coupled to the processor; a frame buffer coupled to the bus, the frame buffer to store a video frame; and a reference frame buffer coupled to the bus, the reference frame buffer to store a reference video frame.

23. The system of claim 22, wherein: the frame buffer and the reference frame buffer are embodied in a volatile memory coupled to the bus.

24. The system of claim 22, further comprising: means for receiving video images.

Description:

TURBO MOTION JPEG

TECHNICAL FIELD

This invention relates generally to electronically encoding images and more specifically to encoding video images using JPEG as a base format.

BACKGROUND ART

When video data is encoded for use by machines such as computers, it can be encoded in a variety of formats depending on the intended use of the video data. For example, a high resolution format may be useful for high bandwidth connections but problematic for low bandwidth connections. Similarly, a low resolution format may be better suited for low bandwidth connections but its resulting image quality may be deemed unsuitable for high bandwidth connections. Considerations such as target devices (e.g. cell phones on one end, high resolution monitors on the other end), source quality (high resolution or low resolution cameras for example), and available processing power can all affect a choice of video format.

In particular, providing video data which may be used with low bandwidth connections and limited processing power in target devices may be useful. Accordingly, it may be useful to provide a format for video data which allows for low bandwidth connections and which may be processed with limited processor resources. Moreover, it may be useful to provide a format for video data which allows for levels of video resolution depending on available technology for processing. Additionally, it may be useful to provide a method of processing video data which allows for processing with limited memory resources.

BRIEF DESCRIPTION OF THE DRAWINGS

The present invention is illustrated by way of example, and not by way of limitation, in the figures of the accompanying drawings, in which like reference numerals refer to similar elements and in which:

FIG. 1 illustrates an embodiment of a process of encoding still images using JPEG compression.

FIG. 2 illustrates an embodiment of a process of encoding video data based on Motion JPEG encoding.

FIG. 3 illustrates an embodiment of a process of encoding video data based on MPEG standards.

FIG. 4 illustrates an embodiment of a process of encoding video data.

FIG. 5 illustrates an embodiment of a process of encoding video data using delta motion JPEG.

FIG. 6 illustrates another embodiment of a process of encoding video data using delta motion JPEG.

FIG. 7 illustrates frames used in constructing a reference frame in one embodiment.

FIG. 8 illustrates an embodiment of a process of decoding a frame.

FIG. 9 illustrates transformations of a set of blocks in embodiments of various encoding systems.

FIG. 10 illustrates relationships between buffers in an embodiment of a turbo motion JPEG system.

FIG. 11 illustrates media which may serve as repositories for the buffers of FIG. 10 in an embodiment.

FIG. 12 illustrates an embodiment of a personal device.

FIG. 13 illustrates an embodiment of memory used in an encoding process.

FIG. 14 illustrates an embodiment of a process of encoding.

FIG. 15 illustrates an embodiment of a network.

FIG. 16 illustrates an embodiment of a conventional computer system.

SUMMARY

A method, apparatus and system for turbo motion JPEG is provided. In one embodiment, a method is provided. The method includes receiving a first video frame. The method also includes receiving a second video frame. The method further includes determining a global motion vector of the second video frame relative to the first video frame. Additionally, the method includes encoding the first video frame in JPEG format. Moreover, the method includes encoding the second video frame as a difference between the first video frame and the second video frame after accounting for the global motion vector.

In another embodiment, the invention is a medium embodying instructions. The instructions, when executed by a processor, cause the processor to perform a method. The method includes receiving a first video frame. The method also includes receiving a second video frame. The method further includes determining a global motion vector of the second video frame relative to the first video frame. The method further includes encoding the first video frame in JPEG format. The method also includes encoding the second video frame as a difference between the first video frame and the second video frame after accounting for the global motion vector.

In yet another embodiment, the invention is a system. The system includes a processor and a bus coupled to the processor. A memory coupled to the bus is also included. The processor is to receive a first video frame and receive a second video frame. The processor is also to determine a global motion vector of the second video frame relative to the first video frame. The processor is further to encode the first video frame in JPEG format and encode the second video frame as a difference between the first video frame and the second video frame after accounting for the global motion vector.

In still another embodiment, the invention is a system. The system includes a processor. The system also includes a bus coupled to the processor. The system further includes a frame buffer coupled to the bus, the frame buffer to store a video frame. The system also includes a reference frame buffer coupled to the bus, the reference frame buffer to store a reference video frame.

DETAILED DESCRIPTION

A method of encoding video data is provided in one embodiment of the invention. The method includes encoding video data in inter-frames as JPEG format data. The method further includes encoding video data between inter-frames as differences between inter-frame JPEG data. The video data may then be played using motion JPEG players for backward compatibility, or using players capable of decoding all of the data.

The following description sets forth numerous specific details to provide a thorough understanding of the present invention. It will be apparent to one skilled in the art that the present invention can be practiced without one or more of the specific details, or with other methods, components, materials, etc. In other instances, well-known structures and operations are not shown or described in detail to avoid unnecessarily obscuring aspects of various embodiments of the present invention.

Previously, JPEG has been used for still image encoding, with motion JPEG evolving for video encoding and various MPEG formats also being produced for video encoding. Each of these formats has different applications. MPEG tends to be more resource intensive at the decode site, whereas motion JPEG can be more bandwidth intensive.

JPEG is the most popular ISO standard for still digital images. Figure 1 illustrates an embodiment of the process of JPEG compression. An image 110 is converted to YUV color coordinates (module 120). The YUV coordinates are then encoded as three independent color screens.

Each screen is divided into 8x8 blocks. For each 8x8 block (module 130), a 2-dimensional DCT (module 140) is applied, a quantization process is applied (module 145), and the coefficients are packed using Huffman encoding (module 150). All Huffman encoded data form a resulting data file (160).
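As an illustrative sketch (not part of the patent text), the per-block DCT and quantization steps might look as follows in Python, using numpy and scipy; the quantization table here is a uniform placeholder rather than the standard JPEG tables, and the Huffman packing stage is omitted.

    import numpy as np
    from scipy.fftpack import dct

    # Placeholder uniform quantization table; a real encoder would use the
    # standard JPEG luminance and chrominance tables.
    Q = np.full((8, 8), 16, dtype=np.int32)

    def dct2(block):
        # 2-D type-II DCT with orthonormal scaling, applied along rows and columns
        return dct(dct(block, axis=0, norm='ortho'), axis=1, norm='ortho')

    def encode_block(block):
        # Level-shift as in JPEG, transform, then quantize
        shifted = block.astype(np.float64) - 128.0
        return np.round(dct2(shifted) / Q).astype(np.int32)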

Motion JPEG, i.e. MJPEG, is a video format using the AVI format (for example) as a carrier, in which each video frame is compressed as an individual JPEG file. As shown in Figure 2, it is similar to the embodiment of Figure 1, with an additional loop over video frames.

MPEG is a collection of designated formats for videos. MPEG-1 and MPEG-2, developed at different times for different data rates, share a similar architecture, as shown in Figure 3. In the baseline profile, every video frame (310) is first classified into one of two types: I-frames (self-decodable, or standalone, frames) and P-frames (correlated with previous frames). Frames are further divided into 16x16 macroblocks. For each 16x16 macroblock (335) of a P-frame, a motion vector is optionally assigned for all three YUV screens, referencing the previously decoded frame (345). In compression (355, 360 & 365), one may choose to pack the macroblock directly without using the motion vector (creating an intra-block). One may also choose to subtract the 16x16 block from the decoded previous screens at the place pointed to by the motion vector, and pack the difference of the macroblock (creating an inter-block). There are only intra-blocks in an I-frame, as these frames must not depend on the previous frame.

The macroblock packing is similar to JPEG: DCT (355), quantization (360), and Huffman coding (365).

However, a 16x16 macroblock can be decomposed into twelve 8x8 blocks (4 for each of the YUV screens), eight 8x8 blocks (4 for Y, 2 for each of U and V), or six (4 for Y, 1 for each of U and V) by down-sampling the UV screens prior to the DCT. The motion vector may also be Huffman encoded in its x and y components.

The added complexity of MPEG relative to Motion JPEG lies first in the computational load of finding motion vectors (340), and second in memory usage: the previous frame must be decoded for reference purposes (345).

Turbo Motion JPEG (TMJPEG) is projected to reduce or eliminate this computational load by selecting only one global motion vector. In practice, particularly for mobile devices, most of the compression gain of MPEG over JPEG comes from large-scale motion. In TMJPEG, two types of frames are provided. The I-frames are coded in the same manner as in Motion JPEG, and the P-frames are processed as illustrated in Figure 4.

First, the global motion vector is determined using a motion estimation algorithm (Step 435), discussed further below. Based on empirical evidence, the motion vector tends to be within the range (-8,8)x(-8,8). The inverse map is applied to the reference frame as shown in Figure 7, which moves part of the image region (721) out of the picture and moves part of an undefined region (722) into the scene. It can be useful to pad the undefined region from edge pixels.
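The patent does not commit to a particular motion estimation algorithm. As one hedged illustration, a brute-force search over the observed range, minimizing the sum of absolute differences on the overlapping region, might look like this sketch (the function name and the inclusive search window are assumptions):

    import numpy as np

    def global_motion_vector(ref, cur, search=8):
        # Exhaustive search over candidate vectors (dx, dy) in the window
        # [-search, search] x [-search, search], scoring each candidate by
        # the sum of absolute differences over the overlapping region.
        best_sad, best_mv = None, (0, 0)
        h, w = cur.shape
        for dy in range(-search, search + 1):
            for dx in range(-search, search + 1):
                ys, xs = max(0, dy), max(0, dx)
                ye, xe = min(h, h + dy), min(w, w + dx)
                sad = np.abs(cur[ys:ye, xs:xe].astype(np.int32)
                             - ref[ys - dy:ye - dy, xs - dx:xe - dx]).sum()
                if best_sad is None or sad < best_sad:
                    best_sad, best_mv = sad, (dx, dy)
        return best_mv

In practice one would subsample the frames or the candidate set to keep this search cheap on a mobile device.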

To reduce the reference memory requirement, if the global motion estimation requires no full reference frame, one may decode the reference on the fly by keeping an image stripe of height 16, i.e. two rows of 8x8 blocks, as illustrated in the dark region of Figure 8. This image stripe provides the image data needed for encoding and decoding under most circumstances.

There are 4 cases for a motion vector: (a) (-8,0)x(-8,0), (b) [0,8)x(-8,0), (c) (-8,0)x[0,8), and (d) [0,8)x[0,8), as shown in Figure 8. In case (a) we may decode 8x8 blocks of the reference frame at the same pace as we encode 8x8 blocks of the current frame, although some padding may be required for the leftmost column and top row of 8x8 blocks. In cases (b) and (d), for each row, one must decode one more 8x8 block from the left in advance, and then pad to the right for the last one. Furthermore, in cases (c) and (d), the process involves decoding one more row of 8x8 blocks in advance and padding to the bottom for the last row.

The dark region in Figure 8, 16 lines spanning the frame width, shows the active reference frame in the memory buffer at the time the highlighted 8x8 block is encoded.
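The four cases reduce to a rule on the sign of each component of the motion vector. A minimal sketch of that rule (a simplification of cases (a) through (d) above):

    def lookahead(mvx, mvy):
        # Cases (b) and (d): a non-negative x component means one extra 8x8
        # block must be decoded from the left per row, padding the last one.
        extra_cols = 1 if mvx >= 0 else 0
        # Cases (c) and (d): a non-negative y component means one extra row
        # of 8x8 blocks must be decoded in advance, padding the bottom row.
        extra_rows = 1 if mvy >= 0 else 0
        return extra_cols, extra_rows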

For each 8x8 block, the choice is to pack it directly, as with Motion JPEG, or to subtract the reference before performing the DCT (module 455). A bit may be included to indicate which method was used. However, instead of sending such a bit, one may use an automatic criterion:

When unpacking the code stream for an 8x8 block, if at least three of the first five AC coefficients of the Y screen are non-zero, then the block is assumed to be an intra-block (i.e. not a difference); otherwise it is an inter-block (i.e. a difference from the reference frame).
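In code, the criterion might read as in the sketch below; the coefficient positions assume the standard JPEG zigzag order, and y_coeffs is assumed to be an 8x8 array of unpacked quantized Y coefficients.

    # First five AC positions of an 8x8 block in standard zigzag order
    FIRST_FIVE_AC = [(0, 1), (1, 0), (2, 0), (1, 1), (0, 2)]

    def is_intra_block(y_coeffs):
        # Intra if at least three of the first five AC coefficients of the
        # Y screen are non-zero; otherwise inter (difference from reference).
        nonzero = sum(1 for pos in FIRST_FIVE_AC if y_coeffs[pos] != 0)
        return nonzero >= 3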

In rare cases the criterion is not met and the choice must be forced: the whole frame can be made an intra-frame, or, potentially better, some inter-block coefficients may be forced to zero. This automatic block type classification method potentially allows for the performance superiority of TMJPEG.

In some applications, it is more convenient to set the global motion vector to zero all the time. The advantage of doing so is that one may skip all parts of the process involving motion vectors. Such a format may be called Delta Motion JPEG.

The difference DCTs for Delta Motion JPEG can be much simpler, for all 8x8 block boundaries are aligned. By linearity, the difference DCTs are therefore the same as the DCT differences. As shown in Figure 5, the compression process is simpler than the standard one in Figure 4: there is no need for motion detection, and there is no need to fully decode the video frame. Also, as shown in Figure 5, one may update the reference DCTs instead of block pixel values.
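The linearity claim is straightforward to check numerically, and it is what allows the encoder to keep the reference in DCT form rather than as pixels; a minimal sketch:

    import numpy as np
    from scipy.fftpack import dct

    def dct2(block):
        return dct(dct(block, axis=0, norm='ortho'), axis=1, norm='ortho')

    a = np.random.rand(8, 8)   # current block
    b = np.random.rand(8, 8)   # aligned reference block
    # DCT(a - b) == DCT(a) - DCT(b), so differencing can be done on coefficients
    assert np.allclose(dct2(a - b), dct2(a) - dct2(b))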

Furthermore, if the quantization factor of the current frame is a multiple of that of the previous frame, one needs only to update the quantized values. Table 1 is the standard quantization table in JPEG, and Table 2 is the sample quantization table used in one embodiment, where A = 2^a, B = 2^b, C = 2^c, D = 2^d, and E = 2^e, with a ≤ b ≤ c ≤ d ≤ e. In particular, when one chooses the exponent sequence 4, 5, 6, 6, and 7, i.e. A = 2^4 = 16, B = 2^5 = 32, C = 2^6 = 64, D = 2^6 = 64, and E = 2^7 = 128, the resulting quantization is similar to that offered by the standard.

Table 1. Canonical Quantization for JPEG (Y screen and UV screens; table not reproduced)

Table 2. Quantization for TMJPEG (Y screen and UV screens; table not reproduced)

This collection of hierarchical quantization tables not only provides a good balance across various bit rates, but also simplifies the quantization process in hardware design.
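One reason power-of-two step sizes simplify hardware is that division by the step becomes a right shift. The sketch below assumes an 8x8 array of exponents named shifts (the actual TMJPEG table layout is not reproduced above):

    import numpy as np

    def quantize_pow2(coeffs, shifts):
        # Step size is 2**shifts per coefficient; adding half the step before
        # shifting gives round-to-nearest. Magnitudes are shifted to avoid
        # right-shifting negative values.
        c = coeffs.astype(np.int64)
        half = (np.int64(1) << shifts) >> 1
        mag = (np.abs(c) + half) >> shifts
        return np.where(c >= 0, mag, -mag).astype(np.int32)

With the example exponent sequence 4, 5, 6, 6, and 7, the step sizes are exactly the 16, 32, 64, 64, and 128 given above.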

The following illustrates an example of an implementation with lower memory requirements. In one example, the total memory is M bits (e.g. 20K bytes = 160,000 bits), and N is the number of macroblocks in a video frame (e.g. 396 for CIF). When coding the n-th macroblock of the current frame, one may follow the process below.

R00 represents the pointer to the reference memory where the n-th macroblock of the old reference frame begins, and R10 represents the pointer into the same reference memory where the (n-1)-th macroblock of the new reference frame ends. Thus R10 < R00. The encoding process for the n-th macroblock of the current frame can be described as follows:

(a) Decoding the n-th macroblock of the old reference frame from R00 to R01.

(b) Encoding the n-th macroblock of the current frame and passing the coded data to memory for output.

(c) Also encoding the n-th macroblock of the current frame in intra mode for future referencing:

1) If the coded size S is less than R01 - R10, store it in the reference memory, preceded by a flag bit 0, as a reference block. Then increment n and reassign the memory pointers: R00 = R01 and R10 = R10 + S + 1.

2) If the coded size S is too large, skip the block by sending a skip signal: a single bit 1. Then increment n, with R00 = R01 and R10 = R10 + 1.

Instead of sending a bit to indicate whether to skip or not, one may mix the skip code with other intra code bits for slightly better performance. In this case, the condition of (c)(1) may be generalized to S < R01 - R10 - bit_number_of_skip_signal.
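A sketch of this bookkeeping follows. The helpers decode_mb, encode_mb, encode_mb_intra, and emit are assumed rather than defined by the text, and ref_mem is treated as a writable bit array.

    def process_macroblock(n, R00, R10, ref_mem, cur_frame):
        # (a) decode the n-th macroblock of the old reference; R01 marks its end
        ref_mb, R01 = decode_mb(ref_mem, R00)
        # (b) encode the current macroblock against it and emit the coded data
        emit(encode_mb(cur_frame[n], ref_mb))
        # (c) re-encode the current macroblock in intra mode for future reference
        bits = encode_mb_intra(cur_frame[n])
        S = len(bits)
        if S < R01 - R10:                        # case (c)(1): it fits
            ref_mem[R10] = 0                     # flag bit 0: stored block
            ref_mem[R10 + 1:R10 + 1 + S] = bits
            return R01, R10 + S + 1              # pointers for macroblock n+1
        else:                                    # case (c)(2): too large
            ref_mem[R10] = 1                     # flag bit 1: skip signal
            return R01, R10 + 1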

Further illustration and description of various embodiments in the figures may provide additional insights into potential implementations. Figure 1 illustrates an embodiment of a process of converting a digital image to a JPEG file. Process 100 includes receiving a digital image, converting it to YUV, transforming each sub-block of the image using a discrete cosine transform, quantization, and Huffman encoding, and thereby producing JPEG data.

Process 100 begins at module 110, in which a digital image is received. At module 120 YUV conversion occurs, such as converting RGB digital data to YUV data, for example. At module 130 processing begins for an 8x8 block or other sub-block of the digital data. At module 140 a discrete cosine transform is performed on the current block. At module 145 quantization of the transformed block occurs, and Huffman encoding then occurs at module 150. If another block remains to be processed, the process returns to module 140 for discrete cosine transformation of that block. If all sub-blocks have been processed, then at module 160 JPEG data is provided as an output.

While JPEG is well known for still images, motion video may also be encoded using something similar to standard JPEG. Figure 2 illustrates an embodiment of a process of converting digital video data into data represented in a motion JPEG format. Process 200 includes receiving video data and processing each video frame through YUV conversion, a discrete cosine transform, quantization, and Huffman encoding to produce motion JPEG data.

Process 200 begins at module 210 with the receipt of digital video data. At module 220, an individual video frame is selected. At module 230 YUV conversion occurs, converting the frame to YUV color coordinates. A discrete cosine transform is performed for a sub-block at module 240. That sub-block is then quantized at module 245 and Huffman encoded at module 250.

If another sub-block remains to be transformed, the process returns to module 240 for discrete cosine transformation of the next block. If the last sub-block of the video frame has been transformed, then the process is completed for that video frame; this determination occurs at module 260. If the video frame is completed, at module 270 a determination is made as to whether another video frame is present in the video image data sequence. If so, the process moves to module 230 for YUV conversion of the next video frame, which is then processed as was the previous video frame. If the last video frame has been converted, then at module 280 motion JPEG data is produced. This is the data that results from this process and may be assembled into a file, for example. Note that as depicted, modules 260 and 270 may be arranged at the beginning or end of the associated loops.

While motion JPEG is a useful format for storing and transferring video data, it effectively stores every video frame in full, which may or may not be desirable in some instances. Figure 3 illustrates an embodiment of a process of MPEG encoding. Process 300 includes receiving digital video, separating the digital video into frames, and, for each video frame, performing YUV conversion and then, for each sub-block in the frame: detecting a motion vector using a reference frame, determining whether a difference sub-block or a new sub-block is necessary, performing a discrete cosine transform (either of a difference or of the entire sub-block), performing quantization, and performing Huffman encoding; a resulting reference frame is decoded before MPEG data is produced.

At module 310, digital video data is received. At module 320 the digital video data is separated out into frames. At module 330 YUV conversion is performed on the present or first frame of those remaining to be converted. At module 335 the frame is divided into 16x16 blocks or similar sub-blocks. At module 340 a motion vector is detected between a current sub-block and the corresponding sub-block in the previous video frame. This detection occurs through use of reference frame 345, which is the previous video frame.

At module 350 a determination is made as to whether the sub block should be encoded as a difference sub block or a new sub block. At module 355 the sub block is transformed through discrete cosine transformation. Depending on whether the encoding type is a difference sub block or a new sub block, this may be either a discrete cosine transform of the differences between the current sub block and the reference sub block from the reference frame, or a discrete cosine transform of the actual sub block. The transformed sub block is quantized at module 360 and undergoes Huffman encoding at module 365.

At module 370 a determination is made as to whether this is the last sub block to be processed. If it is not, the process moves back to motion vector detection at module 340. If the entire video frame has been processed, meaning that the last sub block has been processed, then at module 380 the frame is decoded to provide a reference for the next frame to be encoded, and this frame is then provided as reference frame 345. At module 390 an MPEG P-frame is produced. This P-frame is an intermediate frame in the MPEG process and may be used as part of a string of frames, provided that each may refer to the previous frame as a reference frame.

One may expect this process to be repeated for each video frame separated out of the digital video data. Moreover, note that when an I-frame is provided, such an I-frame would be produced not through the MPEG P-frame process of process 300 but through the motion JPEG process of process 200, or some similar process producing a fully encoded frame rather than a frame expressed as differences from a reference frame.

Figure 4 illustrates an embodiment of a process of encoding video data using turbo motion JPEG. Digital video is received at module 410. The digital video is then separated into video frames at module 420, and these video frames are converted to YUV format at module 430, from an RGB format for example. A global motion vector for the frame is then detected at module 435. Moreover, at module 440, an inverse motion vector corresponding to the global motion vector is applied to the frame to produce reference frame 445. At module 450 the current frame is divided into sub blocks, typically 8x8 pixels.

Depending on the level of motion within the sub block relative to motion in the overall frame, either a discrete cosine transform or a differential discrete cosine transform is performed at module 455 for a given sub block. At module 460 the sub block is then quantized, and at module 465 Huffman encoding of the sub block occurs. At module 470 a determination is made as to whether additional sub blocks remain to be processed for the given video frame. At module 480 a reference sub block for the frame is decoded and provided back to reference frame 445.

If further sub blocks need to be processed, the process returns to module 450 for the next sub block. If no sub blocks remain to be processed, then at module 490 the P-frame of the turbo motion JPEG process is produced, and if additional video frames are present the process returns to module 430 for YUV conversion of the next video frame. If no additional video frames are present, the process terminates.

Another option for converting video to transmissible video files or image representations is delta motion JPEG. Figure 5 illustrates another embodiment of the process of converting digital video to encoded video data. Process 500 includes receiving digital video, breaking the digital video data into frames, converting the video frames, processing sub blocks of the video frames, updating a reference frame related to processing of the video frames, and providing the resulting video frames as an output.

At module 510 digital video is received as data. At module 520 the digital video is broken into a series of video frames. At module 530 the first or current video frame is converted to a YUV representation, for example from another conventional representation. At module 540 processing of the frame occurs in a block-by-block fashion. At module 545 a discrete cosine transform of the first sub block of the video frame occurs. At module 550 a differential discrete cosine transform of the sub block occurs using reference block 535.

Reference block 535 is an already-transformed reference from the previous video frame, for example a video frame which has already undergone discrete cosine transformation. At module 560 quantization of the differential discrete cosine transform occurs. At module 565 Huffman encoding of the quantized output occurs. At module 570 the reference block is updated with the just-processed current block, which is placed in the space occupied by the previous block used for comparison purposes. At module 580 a determination is made as to whether additional blocks need to be processed. If so, the process returns to module 540. If no additional blocks need to be processed, the module provides the P-frame, or difference frame, for the corresponding video frame, and the process may then return to module 520 or module 530 for the next video frame and its conversion.

Using the already-transformed reference frame allows a much smaller area to be consumed by the reference frame. As a result, the memory resources required by the process are reduced. While the reference frame in process 500 has already undergone discrete cosine transformation, other variations may also be available.

In some processes the reference frame may be a quantized reference frame rather than simply a transformed reference frame for example. Figure 6 illustrates another alternate embodiment of a process of transforming digital video data into a transmissible digital video file or stream. Process 600 includes receiving digital video data, segregating the digital video data into frames, converting the data, transforming blocks of the data, quantizing those blocks, performing a differential quantization, and Huffman encoding the resulting blocks to provide an output frame and also updating a reference for the next frame in the series.

Digital video data is received at module 610. The digital video data is then broken into video frames at module 620. At module 630 a current or first video frame is converted from whatever format it is provided in into a conventional format for purposes of transmission. At module 635 blocks of the converted video frame are provided for processing. At module 640 a discrete cosine transform of a current block occurs. At module 645 the transformed block is quantized. At module 650 the differential between the quantized block and a reference quantized transformed block is taken, thus providing a relatively small amount of data indicating only the differences between the quantized transformed block and the previous frame at that point. At module 660 the differential quantized transformed block is Huffman encoded. At module 670 the reference quantized transformed block is updated with the most recently processed sub block, thus replacing in the reference block the data of the most recently completed frame with the data of the frame being processed for that block. At module 680 a determination is made as to whether additional blocks remain to be processed within the video frame. If so, the process returns to module 640 for discrete cosine transformation of the next block. If no additional blocks need to be processed, the P-frame or output frame is provided at module 690.
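A minimal sketch of the per-block loop of process 600, assuming an 8x8 quantization table Q and a stored quantized reference block ref_q (both hypothetical names):

    import numpy as np
    from scipy.fftpack import dct

    def dct2(block):
        return dct(dct(block, axis=0, norm='ortho'), axis=1, norm='ortho')

    def encode_block_600(cur_block, ref_q, Q):
        # modules 640-645: transform and quantize the current block
        cur_q = np.round(dct2(cur_block.astype(np.float64)) / Q).astype(np.int32)
        delta = cur_q - ref_q   # module 650: difference of quantized blocks
        ref_q[:] = cur_q        # module 670: overwrite the reference in place
        return delta            # module 660 would Huffman-encode this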

A graphical illustration of the transformation of frames may be useful, as may a conceptual illustration of the changes between frames. Figure 9 illustrates changes in frames, or in sub blocks of frames, under a variety of different processes. The JPEG process illustrates a sequence of still frames in which frame A is encoded, frame B is encoded, frame C is encoded, frame D is encoded, and frame E is finally encoded. With motion JPEG, JPEG is used to encode a series of frames as still images which may be used as a motion video or motion picture display. As can be seen, motion JPEG encodes each still image in its entirety: the content of frame A is encoded in full, as are the contents of frames B, C, D, and E, respectively.

Turbo motion JPEG takes a different approach. Frame A is a still frame as encoded and is provided as a reference frame. Frame B is encoded as a difference with frame A after taking into account a global motion vector. Similarly, frame C is encoded as a differential from reference frame B at that point, including another global motion vector for motion of the overall frame from frame B to frame C. As one may expect, frame D is then encoded as a differential from frame C after taking into account the global motion vector for frames D and C. Frame E, for this illustration, is encoded as an I-frame; that is, a frame which is completely encoded and which may be taken to be a reset of the encoder, in that no reference frame is needed.

Also illustrated is delta JPEG, which takes an approach similar, but not identical, to turbo motion JPEG. Frame A, as the initial frame, is encoded entirely. Frame B is then encoded as a differential from frame A. Similarly, frame C is encoded as a differential from frame B, and frame D as a differential from frame C. Presumably frame E is different enough that differential encoding is not useful, and it is therefore encoded as an I-frame. Note that for delta JPEG encoding, what serves as the reference frame in the process may vary. It may be a transformed frame, i.e. one that has undergone discrete cosine transformation, or it may be a transformed and quantized frame, thus having also undergone quantization, for example.

Another illustration of how memory is utilized may further illuminate the various encoding processes. Figure 10 illustrates utilization of memory in one embodiment. JPEG frame 1010 represents a first frame to be encoded. Reference frame 1030 is then JPEG frame 1010, either in an encoded form or in an unencoded form. Transmit frame 1020 is similarly a representation of JPEG frame 1010; however, it is the actually transmitted frame. JPEG frame 1040 is a succeeding frame which is to be encoded as a differential relative to JPEG frame 1010. JPEG frame 1040 is then compared to reference frame 1030, and the resulting comparison produces transmit frame 1050. Transmit frame 1050 may be expected to have the same differential with transmit frame 1020 that JPEG frame 1040 has with JPEG frame 1010. JPEG frame 1040 also produces reference frame 1060. Reference frame 1060, again, may have exactly or essentially the same differential with reference frame 1030 that JPEG frame 1040 has with JPEG frame 1010.

In some embodiments, reference frame 1060 is not actually produced as an entire frame. Rather the sub blocks of reference frame 1060 overwrite the sub blocks of reference frame 1030, thus requiring only storage for a single reference frame 1030. When a next JPEG frame arrives, that JPEG frame can be compared with the new reference frame 1030 which has the contents of reference frame 1060 in the same space.

This may be further illustrated with reference to Figure 11, which illustrates an embodiment of memory which may be used to store video data in various forms. Memory 1100 includes various buffers. Frame buffer 1110 is the buffer dedicated to the next frame to be processed. Reference DCT buffer 1120 is the reference buffer for the frame against which frame buffer 1110 will be compared; reference DCT buffer 1120 thus includes an encoding of the previous frame for comparison purposes. Transmit buffer 1130 includes the frame to be transmitted next, and thus holds either the frame just encoded awaiting transmission or an encoding of the frame of frame buffer 1110 as that frame is encoded. As the contents of frame buffer 1110 are encoded, reference DCT buffer 1120 is overwritten with a reference DCT representation of the contents of frame buffer 1110, thus allowing encoding of the next frame, which may be expected to be written into frame buffer 1110, using the same reference DCT buffer 1120.

Figure 12 illustrates an embodiment of a device which may be used in conjunction with the various methods and apparatuses described previously, such as a personal digital assistant or similar device which one may expect to use in a handheld manner. The embodiment of Figure 12 may also represent a digital camera or similar personal device. Device 1200 includes a processor, memory and storage, a bus, a communications interface, input/output control, and image control with associated image input and image display.

Processor 1210 may be a microprocessor or digital signal processor, for example. Processor 1210 is coupled or connected to bus 1270. Bus 1270 is coupled or connected to memory 1240, nonvolatile storage 1250, input/output control 1260, and image control 1230. Processor 1210 may also be coupled or connected to a communications interface 1220, illustrated as separate from bus 1270 but which may also be connected or coupled to bus 1270. Communications interface 1220 may be an interface useful for communication with other machines, for example. Image control 1230 may be used to control image input and output. Digital image input 1265 may be an image input such as a lens for a camera, for example, and may, through image control 1230, transmit data to display 1235 for display to a user, such as in a camera or camera phone. Memory 1240 may be on-board memory, for example volatile memory such as RAM, which retains data only while the computer or device is powered on. Nonvolatile storage 1250 may, for example, be FLASH memory, which is nonvolatile in that absence of power does not affect its stored contents. Moreover, nonvolatile storage 1250 may include some form of memory, such as a PROM or ROM, which is not writeable by the device 1200. I/O control 1260 may be used to control user input/output interface 1255. User input/output interface 1255 may, for example, have keys or a speaker associated with it, along with a display such as display 1235. Alternatively, user communications interface 1295 may include a speaker and microphone, while user I/O interface 1255 may include a keypad and screen, for example.

Within various devices, memory management of video frames may be implemented in an attempt to minimize the amount of memory required. Fig. 13 illustrates an embodiment of memory as may be used in a device. Input frame 1300 represents an input video frame which is to be processed. Encoder 1320 is an encoding module. Encoder 1320 produces compressed frame data 1330, using reference frame 1350. Thus, compressed frame data 1330 may represent the current frame as it is encoded. Similarly, reference frame 1350 may represent the most recently encoded frame. Decoder 1340 may decode frame data, and may be used to decode compressed frame data 1330 to produce reference frame 1350. As may be expected, encoding and decoding may occur on a frame-by-frame basis or on a sub-block-by-sub-block basis, for example.

Various processes may be used to manage memory. Fig. 14 illustrates an embodiment of a process of encoding data and managing memory. Process 1400 includes modules related to decoding a reference macroblock, encoding a current macroblock, and transferring the encoded macroblock into reference memory. At module 1410, the macroblock of the reference frame which is currently being processed is decoded. At module 1420, the corresponding macroblock of the current frame is encoded, using the decoded reference macroblock. At module 1430, the newly encoded macroblock is copied into the reference frame, providing it for use with the next frame to be encoded. As a result, the amount of memory needed is limited to the memory for one macroblock at a time - the various decoded macroblocks need not be maintained on an ongoing basis.
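A compact sketch of process 1400 follows; decode_mb and encode_mb are assumed helpers, and passing None as the reference stands for intra encoding.

    def encode_frame(frame_mbs, ref_stream):
        out = []
        for n, mb in enumerate(frame_mbs):
            ref_mb = decode_mb(ref_stream[n])    # module 1410: decode reference
            out.append(encode_mb(mb, ref_mb))    # module 1420: encode current
            ref_stream[n] = encode_mb(mb, None)  # module 1430: refresh reference
        return out  # only one decoded macroblock is live at any time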

The following description of FIGS. 15-16 is intended to provide an overview of computer hardware and other operating components suitable for performing the methods of the invention described above and hereafter, but is not intended to limit the applicable environments. Similarly, the computer hardware and other operating components may be suitable as part of the apparatuses of the invention described above. The invention can be practiced with other computer system configurations, including hand-held devices, multiprocessor systems, microprocessor-based or programmable consumer electronics, network PCs, minicomputers, mainframe computers, and the like. The invention can also be practiced in distributed computing environments where tasks are performed by remote processing devices that are linked through a communications network.

Fig. 15 shows several computer systems that are coupled together through a network 1505, such as the Internet. The term "Internet" as used herein refers to a network of networks

which uses certain protocols, such as the TCP/IP protocol, and possibly other protocols such as the hypertext transfer protocol (HTTP) for hypertext markup language (HTML) documents that make up the World Wide Web (web). The physical connections of the Internet and the protocols and communication procedures of the Internet are well known to those of skill in the art.

Access to the Internet 1505 is typically provided by Internet service providers (ISP), such as the ISPs 1510 and 1515. Users on client systems, such as client computer systems 1530, 1540, 1550, and 1560 obtain access to the Internet through the Internet service providers, such as ISPs 1510 and 1515. Access to the Internet allows users of the client computer systems to exchange information, receive and send e-mails, and view documents, such as documents which have been prepared in the HTML format. These documents are often provided by web servers, such as web server 1520 which is considered to be "on" the Internet. Often these web servers are provided by the ISPs, such as ISP 1510, although a computer system can be set up and connected to the Internet without that system also being an ISP.

The web server 1520 is typically at least one computer system which operates as a server computer system and is configured to operate with the protocols of the World Wide Web and is coupled to the Internet. Optionally, the web server 1520 can be part of an ISP which provides access to the Internet for client systems. The web server 1520 is shown coupled to the server computer system 1525 which itself is coupled to web content 1595, which can be considered a form of a media database. While two computer systems 1520 and 1525 are shown in Fig. 15, the web server system 1520 and the server computer system 1525 can be one computer system having different software components providing the web server functionality and the server functionality provided by the server computer system 1525 which will be described further below.

Client computer systems 1530, 1540, 1550, and 1560 can each, with the appropriate web browsing software, view HTML pages provided by the web server 1520. The ISP 1510 provides Internet connectivity to the client computer system 1530 through the modem interface 1535 which can be considered part of the client computer system 1530. The client computer system can be a personal computer system, a network computer, a Web TV system, or other such computer system.

Similarly, the ISP 1515 provides Internet connectivity for client systems 1540, 1550, and 1560, although as shown in Fig. 15, the connections are not the same for these three computer systems. Client computer system 1540 is coupled through a modem interface 1545, while client computer systems 1550 and 1560 are part of a LAN. While Fig. 15 shows the interfaces 1535 and 1545 generically as a "modem," each of these interfaces can be an analog modem, ISDN modem, cable modem, satellite transmission interface (e.g. "Direct PC"), or another interface for coupling a computer system to other computer systems.

Client computer systems 1550 and 1560 are coupled to a LAN 1570 through network interfaces 1555 and 1565, which can be Ethernet network or other network interfaces. The LAN 1570 is also coupled to a gateway computer system 1575 which can provide firewall and other Internet related services for the local area network. This gateway computer system 1575 is coupled to the ISP 1515 to provide Internet connectivity to the client computer systems 1550 and 1560. The gateway computer system 1575 can be a conventional server computer system. Also, the web server system 1520 can be a conventional server computer system.

Alternatively, a server computer system 1580 can be directly coupled to the LAN 1570 through a network interface 1585 to provide files 1590 and other services to the clients 1550, 1560, without the need to connect to the Internet through the gateway system 1575.

Fig. 16 shows one example of a conventional computer system that can be used as a client computer system or a server computer system or as a web server system. Such a computer system can be used to perform many of the functions of an Internet service provider, such as ISP 1510. The computer system 1600 interfaces to external systems through the modem or network interface 1620. It will be appreciated that the modem or network interface 1620 can be considered to be part of the computer system 1600. This interface 1620 can be an analog modem, ISDN modem, cable modem, token ring interface, satellite transmission interface (e.g. "Direct PC"), or other interfaces for coupling a computer system to other computer systems.

The computer system 1600 includes a processor 1610, which can be a conventional microprocessor such as an Intel Pentium microprocessor or Motorola Power PC microprocessor. Memory 1640 is coupled to the processor 1610 by a bus 1670. Memory 1640 can be dynamic random access memory (DRAM) and can also include static RAM (SRAM). The bus 1670 couples the processor 1610 to the memory 1640, also to non-volatile storage 1650, to display controller 1630, and to the input/output (I/O) controller 1660.

The display controller 1630 controls in the conventional manner a display on a display device 1635 which can be a cathode ray tube (CRT) or liquid crystal display (LCD). The input/output devices 1655 can include a keyboard, disk drives, printers, a scanner, and other input and output devices, including a mouse or other pointing device. The display controller 1630 and the I/O controller 1660 can be implemented with conventional well known technology. A digital image input device 1665 can be a digital camera which is coupled to an I/O controller 1660 in order to allow images from the digital camera to be input into the computer system 1600.

The non-volatile storage 1650 is often a magnetic hard disk, an optical disk, or another form of storage for large amounts of data. Some of this data is often written, by a direct memory access process, into memory 1640 during execution of software in the computer system 1600. One of skill in the art will immediately recognize that the terms "machine-readable medium" or "computer-readable medium" include any type of storage device that is accessible by the processor 1610 and also encompass a carrier wave that encodes a data signal.

The computer system 1600 is one example of many possible computer systems which have different architectures. For example, personal computers based on an Intel microprocessor often have multiple buses, one of which can be an input/output (I/O) bus for the peripherals and one that directly connects the processor 1610 and the memory 1640 (often referred to as a memory bus). The buses are connected together through bridge components that perform any necessary translation due to differing bus protocols.

Network computers are another type of computer system that can be used with the present invention. Network computers do not usually include a hard disk or other mass storage, and the executable programs are loaded from a network connection into the memory 1640 for execution by the processor 1610. A Web TV system, which is known in the art, is also considered to be a computer system according to the present invention, but it may lack some of the features shown in Fig. 16, such as certain input or output devices. A typical computer system will usually include at least a processor, memory, and a bus coupling the memory to the processor.

In addition, the computer system 1600 is controlled by operating system software which includes a file management system, such as a disk operating system, which is part of the operating system software. One example of operating system software with its associated file management system software is the family of operating systems known as Windows® from Microsoft Corporation of Redmond, Washington, and their associated file management systems. Another example of operating system software with its associated file management system software is the LINUX operating system and its associated file management system. The file management system is typically stored in the non-volatile storage 1650 and causes the processor 1610 to execute the various acts required by the operating system to input and output data and to store data in memory, including storing files on the non-volatile storage 1650.

Some portions of the detailed description are presented in terms of algorithms and symbolic representations of operations on data bits within a computer memory. These algorithmic descriptions and representations are the means used by those skilled in the data processing arts to most effectively convey the substance of their work to others skilled in the art.

An algorithm is here, and generally, conceived to be a self-consistent sequence of operations leading to a desired result. The operations are those requiring physical manipulations of physical quantities. Usually, though not necessarily, these quantities take the form of electrical or magnetic signals capable of being stored, transferred, combined, compared, and otherwise manipulated. It has proven convenient at times, principally for reasons of common usage, to refer to these signals as bits, values, elements, symbols, characters, terms, numbers, or the like.

It should be borne in mind, however, that all of these and similar terms are to be associated with the appropriate physical quantities and are merely convenient labels applied to these quantities. Unless specifically stated otherwise as apparent from the following discussion, it is appreciated that throughout the description, discussions utilizing terms such as "processing" or "computing" or "calculating" or "determining" or "displaying" or the like, refer to the action and processes of a computer system, or similar electronic computing device, that manipulates and transforms data represented as physical (electronic) quantities within the computer system's registers and memories into other data similarly represented as physical quantities within the computer system memories or registers or other such information storage, transmission or display devices.

The present invention, in some embodiments, also relates to apparatus for performing the operations herein. This apparatus may be specially constructed for the required purposes, or it may comprise a general purpose computer selectively activated or reconfigured by a computer program stored in the computer. Such a computer program may be stored in a computer readable storage medium, such as, but not limited to, any type of disk including floppy disks, optical disks, CD-ROMs, and magneto-optical disks, read-only memories (ROMs), random access memories (RAMs), EPROMs, EEPROMs, magnetic or optical cards, or any type of media suitable for storing electronic instructions, each coupled to a computer system bus.

The algorithms and displays presented herein are not inherently related to any particular computer or other apparatus. Various general purpose systems may be used with programs in accordance with the teachings herein, or it may prove convenient to construct more specialized apparatus to perform the required method steps. The required structure for a variety of these systems will appear from other portions of this description. In addition, the present invention is not described with reference to any particular programming language, and various embodiments may thus be implemented using a variety of programming languages.

While specific embodiments of the invention have been illustrated and described herein, it will be appreciated that various changes can be made therein without departing from the spirit and scope of the invention.