Title:
METHOD AND APPARATUS OF SUPPORTING A VIDEO PROTOCOL IN A NETWORK ENVIRONMENT
Document Type and Number:
WIPO Patent Application WO/1999/055013
Kind Code:
A2
Abstract:
A method and apparatus of supporting a video protocol in a network environment. Video processing and hardware requirements associated with a receiver are minimized by specifying a single video protocol for transmission of video data between transmitters and receivers on a network. The protocol specifies a color format that allows for high video quality and minimizes the complexity of the receiver. Transmitters convert video data into the designated protocol as needed. The designated protocol specifies a color format including a luminance value and two chrominance values. Quantized differential coding is applied to the luminance value and subsampling is performed on the chrominance values to reduce transmission bandwidth requirements. Upscaling of video data is performed at the receiver, whereas downscaling is performed at the transmitter. Various display sizes can thus be accommodated with efficient use of network bandwidth.

Inventors:
RUBERG ALAN T
HANKO JAMES G
NORTHCUTT J DUANE
WALL GERARD A
Application Number:
PCT/US1999/008673
Publication Date:
October 28, 1999
Filing Date:
April 20, 1999
Assignee:
SUN MICROSYSTEMS INC (US)
International Classes:
H04L29/06; H04N7/26; H04N7/30; H04N7/46; H04N9/64; H04N11/04; H04N19/593; H04N5/44; H04N9/67; (IPC1-7): H04B/
Foreign References:
EP0805592A21997-11-05
US5550847A1996-08-27
EP0725541A21996-08-07
EP0661881A21995-07-05
US5515296A1996-05-07
Other References:
Chan, King Ip, et al., "Block shuffling on top of error concealment for wireless image transmissions," Seventh IEEE International Symposium on Personal, Indoor and Mobile Radio Communications (PIMRC '96), Taipei, Taiwan, 15-18 October 1996, vol. 3, pp. 977-981, IEEE, New York, NY, USA, ISBN 0-7803-3692-5, XP002131882
Rasse, Y., "ASN H.261 single-chip low bit rate video-codec," Proceedings of the International Conference on Signal Processing Applications and Technology, vol. 1, 28 September 1993, pp. 22-31, XP000670777
Candy, J. C., et al., "A circuit for evaluating 64 kbit/s encoding procedures," Proceedings of the SPIE, 1988, XP000856848
Attorney, Agent or Firm:
Hecker, Gary A. (CA, US)
Claims:
CLAIMS
1. An apparatus comprising: one or more video sources providing video data in accordance with a plurality of video protocols; a receiver configured to support a single video protocol; a transmitter coupled to said receiver over a network, said transmitter configured to convert said video data from said one or more video sources into converted video data that conforms to said single video protocol.
2. The apparatus of claim 1, wherein: said transmitter is configured to perform downscaling of said converted video data; and said receiver is configured to perform upscaling of said converted video data.
3. The apparatus of claim 1, wherein said transmitter is configured to transmit said converted video data to said receiver in a plurality of packets.
4. The apparatus of claim 3, wherein said receiver is configured to process each of said plurality of packets independently.
5. The apparatus of claim 3, wherein said transmitter is configured to pseudorandomly transmit said plurality of packets.
6. The apparatus of claim 3, wherein each of said plurality of packets comprises a sequence number.
7. The apparatus of claim 6, wherein said receiver is configured to communicate skipped sequence numbers to said transmitter.
8. The apparatus of claim 7, wherein said transmitter is configured to retransmit one or more packets corresponding to said skipped sequence numbers.
9. The apparatus of claim 7, wherein said transmitter is configured to transmit one or more updated packets corresponding to said skipped sequence numbers.
10. The apparatus of claim 3, wherein each of said plurality of packets further comprises one or more coordinate fields specifying one or more coordinates of a first pixel of a block of video data.
11. The apparatus of claim 10, wherein each of said plurality of packets further comprises: a width field specifying the width of a destination rectangle at said receiver; and a height field specifying the height of said destination rectangle.
12. The apparatus of claim 10, wherein each of said plurality of packets further comprises: a width field specifying the width of said block of video data; and a height field specifying the height of said block of video data.
13. The apparatus of claim 3, wherein each of said plurality of packets comprises a luma encoding field specifying a luma encoding scheme from a plurality of luma encoding schemes.
14. The apparatus of claim 3, wherein each of said plurality of packets comprises one or more chroma subsampling fields specifying one or more subsampling values from a plurality of subsampling values.
15. In a computer network, a method for transmitting video data comprising: a transmitter receiving video data from one or more sources; said transmitter converting said video data into a plurality of packets in accordance with a protocol; said transmitter transmitting said plurality of packets to a receiver; and said receiver displaying each of said plurality of packets independently.
16. The method of claim 15, wherein converting said video data further comprises applying quantized differential encoding to a plurality of luma values of said video data.
17. The method of claim 15, wherein converting said video data further comprises subsampling one or more chroma values of said video data.
18. The method of claim 15, wherein converting said video data further comprises downscaling said video data.
19. The method of claim 15, wherein displaying comprises upscaling said video data.
20. The method of claim 15, wherein converting said video data further comprises associating a sequence number with each of said plurality of packets.
21. The method of claim 20, further comprising: said receiver communicating one or more skipped sequence numbers to said transmitter; and said transmitter transmitting one or more packets of video data corresponding to said skipped sequence numbers.
22. The method of claim 15, wherein converting said video data further comprises writing one or more coordinate fields in each of said plurality of packets, said one or more coordinate fields specifying one or more coordinates of a pixel of a video block.
23. The method of claim 15, wherein converting said video data further comprises: writing a width field in each of said plurality of packets, said width field specifying the width of a destination rectangle of said receiver; and writing a height field in each of said plurality of packets, said height field specifying the height of said destination rectangle.
24. The method of claim 15, wherein converting said video data further comprises: writing a width field in each of said plurality of packets, said width field specifying the width of a block of video data stored in a respective packet; and writing a height field in each of said plurality of packets, said height field specifying the height of said block of video data.
25. The method of claim 15, wherein converting said video data further comprises writing a luma encoding field in each of said plurality of packets.
26. The method of claim 15, wherein converting said video data further comprises writing one or more chroma subsample values in each of said plurality of packets.
27. A computer program product comprising: a computer usable medium having computer readable code embodied therein for transmitting video data in a computer network, said computer program product comprising: computer readable code configured to cause a transmitter to perform the steps of: receiving video data from one or more sources; converting said video data into a plurality of packets in accordance with a protocol; and transmitting said plurality of packets to a receiver; computer readable code configured to cause a receiver to display each of said plurality of packets independently.
28. The computer program product of claim 27, wherein converting said video data further comprises applying quantized differential encoding to a plurality of luma values of said video data.
29. The computer program product of claim 27, wherein converting said video data further comprises subsampling one or more chroma values of said video data.
30. The computer program product of claim 27, wherein converting said video data further comprises downscaling said video data.
31. The computer program product of claim 27, further comprising computer readable code configured to cause said receiver to upscale said video data.
32. The computer program product of claim 27, wherein converting said video data further comprises associating a sequence number with each of said plurality of packets.
33. The computer program product of claim 32, further comprising: computer readable code configured to cause said receiver to communicate one or more skipped sequence numbers to said transmitter; and computer readable code configured to cause said transmitter to transmit one or more packets of video data corresponding to said skipped sequence numbers.
34. The computer program product of claim 27, wherein converting said video data further comprises writing one or more coordinate fields in each of said plurality of packets, said one or more coordinate fields specifying one or more coordinates of a pixel of a video block.
35. The computer program product of claim 27, wherein converting said video data further comprises: writing a width field in each of said plurality of packets, said width field specifying the width of a destination rectangle of said receiver; and writing a height field in each of said plurality of packets, said height field specifying the height of said destination rectangle.
36. The computer program product of claim 27, wherein converting said video data further comprises: writing a width field in each of said plurality of packets, said width field specifying the width of a block of video data stored in a respective packet; and writing a height field in each of said plurality of packets, said height field specifying the height of said block of video data.
37. The computer program product of claim 27, wherein converting said video data further comprises writing a luma encoding field in each of said plurality of packets.
38. The computer program product of claim 27, wherein converting said video data further comprises writing one or more chroma subsample values in each of said plurality of packets.
39. An apparatus comprising: means for receiving video data from one or more sources; means for converting said video data into a plurality of packets in accordance with a protocol; means for transmitting said plurality of packets to a receiver; and means at said receiver for displaying each of said plurality of packets independently.
Description:
METHOD AND APPARATUS OF SUPPORTING A VIDEO PROTOCOL IN A NETWORK ENVIRONMENT

BACKGROUND OF THE INVENTION

1. FIELD OF THE INVENTION

This invention relates to the field of digital video, and, more specifically, to digital video applications in a network environment.

2. BACKGROUND ART

Computers and computer networks are used to exchange information in many fields such as media, commerce, and telecommunications, for example. One form of information that is commonly exchanged is video data (or image data), i.e., data representing a digitized image or sequence of images.

A video conferencing feed is an example of telecommunication information which includes video data. Other examples of video data include video streams or files associated with scanned images, digitized television performances, and animation sequences, or portions thereof, as well as other forms of visual information that are displayed on a display device. It is also possible to synthesize video information by artificially rendering video data from two or three-dimensional computer models.

For the purposes of this discussion, the exchange of information between computers on a network occurs between a "transmitter" and a "receiver." In video applications, the information contains video data, and the services provided by the transmitter are associated with the processing and transmission of the video data. A problem with current network systems is that multiple services provided by one or more transmitters may provide video data using different video protocols. The complexity of the receiver is necessarily increased by the need to accommodate each of the different video protocols. Also, the amount of data associated with video applications is very large. The transmission of such large amounts of data over a network can result in bandwidth utilization concerns. The following description of video technology and an example network scheme are given below to provide a better understanding of the problems involved in transmitting video data over a network.

General Video Technology

In digital video technology, a display is comprised of a two-dimensional array of picture elements, or "pixels," which form a viewing plane. Each pixel has associated visual characteristics that determine how a pixel appears to a viewer. These visual characteristics may be limited to the perceived brightness, or "luminance," for monochrome displays, or the visual characteristics may include color, or "chrominance," information.

Video data is commonly provided as a set of data values mapped to an array of pixels. The set of data values specifies the visual characteristics for those pixels that result in the display of a desired image. A variety of color models exist for representing the visual characteristics of a pixel as one or more data values.

RGB color is a commonly used color model for display systems. RGB color is based on a "color model" system. A color model allows convenient specification of colors within a color range, such as the RGB (red, green, blue) primary colors. A color model is a specification of a three-dimensional color coordinate system and a three-dimensional subspace, or "color space," in the coordinate system within which each displayable color is represented by a point in space. Typically, computer and graphic display systems are three-phosphor systems with a red, green and blue phosphor at each pixel location.

The intensities of the red, green and blue phosphors are varied so that the combination of the three primary colors results in a desired output color.

The RGB color model uses a Cartesian coordinate system. The subspace of interest in this coordinate system is known as the "RGB color cube" and is illustrated in Figure 1. Each corner of the cube represents a color that is theoretically one-hundred percent pure; that is, the color at that location contains only the color specified, and contains no amount of other colors. In the RGB color cube, the corners are defined to be black, white, red, green, blue, magenta, cyan, and yellow. Red, green and blue are the primary colors, black is the absence of color, white is the combination of all colors, and cyan, magenta and yellow are the complements of red, green and blue.

Still referring to Figure 1, the origin of the coordinate system corresponds to the black corner of the color cube. The cube is a unit cube, so the distance between the origin and adjacent corners is 1. The red corner is thus at location (1,0,0). The axis between the origin (black) and the red corner is referred to as the red axis 110.

The green corner is at location (0,1,0) and the axis 120 between the black origin and the green corner is referred to as the green axis. The blue corner is at location (0,0,1) and the blue axis 130 is the axis between the blue corner and the origin.

Cyan is at corner (0,1,1), magenta is at corner (1,0,1) and yellow is at corner (1,1,0). The corner opposite the origin on the cube's diagonal at location (1,1,1) is the white corner.

A color is defined in the color cube by a vector having red, green and blue components. For example, vector 180 is the resultant of vector 180R (along the red axis), vector 180G (along the green axis) and vector 180B (along the blue axis). The end point of vector 180 can be described mathematically by 0.25R + 0.50G + 0.75B. The end of this vector defines a point in color space represented mathematically by the sum of its red, green and blue components.

An example of a system for displaying RGB color is illustrated in Figure 2. A refresh buffer 140, also known as a video RAM, or VRAM, is used to store color information for each pixel on a video display, such as CRT display 160. A DRAM can also be used as buffer 140. The VRAM 140 contains one memory location for each pixel location on the display 160. For example, pixel 190 at screen location X0, Y0 corresponds to memory location 150 in the VRAM 140. The number of bits stored at each memory location for each display pixel varies depending on the amount of color resolution required.

For example, for word processing applications or display of text, two intensity values are acceptable, so only a single bit need be stored at each memory location (since the screen pixel is either "on" or "off"). For color images, however, a plurality of intensities must be definable. For certain high-end color graphics applications, it has been found that twenty-four bits per pixel produces acceptable images.

Consider, for example, that in the system of Figure 2, twenty-four bits are stored for each display pixel. At memory location 150, there are then eight bits each for the red, green and blue components of the display pixel. The eight most significant bits of the VRAM memory location could be used to represent the red value, the next eight bits represent the green value and the eight least significant bits represent the blue value. Thus, 256 shades each of red, green and blue can be defined in a twenty-four bit per pixel system.
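As an informal illustration of the bit layout just described (not part of the patent), the following sketch packs and unpacks a twenty-four bit pixel value with the red component in the most significant eight bits:

```python
# Minimal sketch: packing and unpacking a 24-bit RGB pixel as described above,
# with red in the most significant byte, green in the middle, blue in the least.
def pack_rgb24(r: int, g: int, b: int) -> int:
    """Pack 8-bit R, G, B components into one 24-bit VRAM word."""
    return (r << 16) | (g << 8) | b

def unpack_rgb24(word: int) -> tuple[int, int, int]:
    """Recover the 8-bit R, G, B components from a 24-bit word."""
    return (word >> 16) & 0xFF, (word >> 8) & 0xFF, word & 0xFF

assert unpack_rgb24(pack_rgb24(255, 128, 0)) == (255, 128, 0)
```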

When displaying the pixel at X0, Y0, the bit values at memory location 150 are provided to video driver 170. The bits corresponding to the R component are provided to the R driver, the bits representing the green component are provided to the G driver, and the bits representing the blue component are provided to the blue driver. These drivers activate the red, green and blue phosphors at the pixel location 190. The bit values for each color, red, green and blue, determine the intensity of that color in the display pixel. By varying the intensities of the red, green and blue components, different colors may be produced at that pixel location.

Color information may be represented by color models other than RGB. One such color space is known as the YUV (or Y'CbCr, as specified in ITU-R BT.601) color space, which is used in the commercial color TV broadcasting system. The YUV color space is a recoding of the RGB color space and can be mapped into the RGB color cube. The RGB to YUV conversion that performs the mapping is defined by the following matrix equation:
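The matrix itself did not survive in this text. Assuming the original figure showed the commonly cited BT.601 relation for RGB components in [0, 1], the conversion is approximately:

```latex
% Reconstructed from the standard BT.601 definition; the patent's original
% figure is unavailable, so these coefficients are the commonly cited values
% rather than a quotation of the document.
\begin{equation*}
\begin{bmatrix} Y \\ U \\ V \end{bmatrix} =
\begin{bmatrix}
 0.299 &  0.587 &  0.114 \\
-0.147 & -0.289 &  0.436 \\
 0.615 & -0.515 & -0.100
\end{bmatrix}
\begin{bmatrix} R \\ G \\ B \end{bmatrix}
\end{equation*}
```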

The inverse of the matrix is used for the reverse conversion. The Y axis of the YUV color model represents the luminance of the display pixel, and matches the luminosity response curve for the human eye. U and V are chrominance values. In a monochrome receiver, only the Y value is used.

In a color receiver, all three axes are used to provide display information.

In operation, an image may be recorded with a color camera, which is an RGB system, and converted to YUV for transmission. At the receiver, the YUV information is then retransformed into RGB information to drive the color display.

Many other color models are also used to represent video data. For example, CMY (cyan, magenta, yellow) is a color model based on the complements of the RGB components. There are also a variety of color models, similar to YUV, which specify a luminance value and multiple chrominance values, such as the YIQ color model. Each color model has its own color transformation for converting to a common displayable video format such as RGB. Most transformations may be defined with a transform matrix similar to that of the YIQ color space.

There are many color formats used in the prior art for transmitting image and video data over networks. Some examples of existing color formats are H.261 and H.263, which are used in digital telephony, and MPEG1, MPEG2 and MJPEG. These color formats use compression schemes to reduce the amount of data being transmitted. For example, many color formats use a variation of DCT (discrete cosine transform) compression to perform compression in the frequency domain. A form of variable-length Huffman encoding may also be implemented to reduce bandwidth requirements.

Specialized compression/decompression hardware or software is often used to perform the non-trivial conversion of data in these color formats to data for display.

Network Transmission Of Video Data

As has been described, there exist a variety of color formats for video data. These variations allow for a large number of different possible video protocols. It becomes problematic for a receiver on a network to handle all possible video protocols that might be used by different transmitters and services acting as video data sources on the network. The problems associated with multiple video protocols are described below with reference to the sample network system illustrated in Figure 3. Figure 3 illustrates a sample network system comprising multiple transmitters 300A-300C for sourcing video data and a single receiver 303. Receiver 303 is equipped with one or more display devices for providing video output associated with received video data.

In the example of Figure 3, transmitters 300A, 300B and 300C, and receiver 303 are coupled together via network 302, which may be, for example, a local area network (LAN). Transmitter 300A transmits video data along network connection 301A to network 302 using video protocol A.

Transmitter 300B transmits video data along network connection 301B to network 302 using video protocol B. Transmitter 300C transmits video data along network connection 301C to network 302 using video protocol C. Thus, receiver 303 may receive video data over network connection 305 from network 302 under any of video protocols A, B or C, as well as any other protocols used by other transmitters connected to network 302, or used by multiple services embodied within one of transmitters 300A-300C.

Receiver 303 may be equipped with different video cards (i.e., specialized hardware for video processing) or software plug-ins to support each video protocol, but this increases the complexity of the receiver, and necessitates hardware or software upgrades when new video protocols are developed. For systems wherein it is a goal to minimize processing and hardware requirements for a receiver, the added complexity of supporting multiple protocols is undesirable.

An issue in all network applications is utilization of network bandwidth. For video applications, bandwidth is an even greater concern due to the large amounts of data involved in transmitting one or more frames of video data. For example, consider a raw workstation video signal of twenty-four bit RGB data, sent in frames of 1280 x 1024 pixels at thirty frames per second. The raw workstation video represents 240 MBps (megabytes per second) of continuous data. Even for smaller frame sizes, video data can represent a significant load on a network, resulting in poor video performance if the required bandwidth cannot be provided. Further, other applications on the network may suffer as bandwidth allocations are reduced to support the video transmission.

SUMMARY OF THE INVENTION

A method and apparatus of supporting a video protocol in a network environment is described. In an embodiment of the invention, video processing and hardware requirements associated with a receiver are minimized by specifying a single video protocol for transmission of video data between transmitters and receivers on a network. The protocol specifies a color format that allows for high video quality and minimizes the complexity of the receiver. Transmitters are equipped with transformation mechanisms that provide for conversion of video data into the designated protocol as needed. Compression of the components of the color format is provided to reduce transmission bandwidth requirements.

In one embodiment of the invention, aspects of the designated protocol compensate for problems associated with transmitting video data over a network. The designated protocol specifies a color format including a luminance value and two chrominance values. Quantized differential coding is applied to the luminance value and subsampling is performed on the chrominance values to reduce transmission bandwidth requirements. In one embodiment of the invention, upscaling of video data is performed at the receiver, whereas downscaling is performed at the transmitter. Various display sizes can thus be accommodated with efficient use of network bandwidth.

BRIEF DESCRIPTION OF THE DRAWINGS

Figure 1 is a diagram of an RGB color space.

Figure 2 is a block diagram of a video display apparatus.

Figure 3 is a block diagram of a network system having a single receiver and multiple transmitters.

Figure 4A is a flow diagram illustrating a luma compression scheme in accordance with an embodiment of the invention.

Figure 4B is a flow diagram illustrating a luma decompression scheme in accordance with an embodiment of the invention.

Figure 5 is a diagram illustrating a subsampling and upsampling process in accordance with an embodiment of the invention.

Figure 6A is a flow diagram illustrating a video data transmission process in accordance with an embodiment of the invention.

Figure 6B is a flow diagram illustrating a video data reception process in accordance with an embodiment of the invention.

Figure 7 is a block diagram of a computer execution environment.

Figure 8 is a block diagram of a human interface device computer system.

Figure 9 is a block diagram of an embodiment of a human interface device.

DETAILED DESCRIPTION OF THE INVENTION

The invention is a method and apparatus of supporting a video protocol in a network environment. In the following description, numerous specific details are set forth to provide a more thorough description of embodiments of the invention. It will be apparent, however, to one skilled in the art, that the invention may be practiced without these specific details.

In other instances, well known features have not been described in detail so as not to obscure the invention.

Single Video Protocol For Networked Transmissions

In an embodiment of the invention, a single video protocol is used for transmission of video data between a transmitter and a receiver. The transmitter of the video data is responsible for supplying video data in accordance with the designated protocol. For example, a transmitter and its internal video services are configured to perform any necessary protocol transformations to bring video data into conformance with the designated protocol before transmission to a receiver. Hardware and processing requirements of the receiver are minimized as only one video protocol need be supported at the receiver.

Though discussed in this specification primarily as being applied to video data transmitted one-way from a transmitter to a receiver, video data may also be transmitted from the receiver to the transmitter using the designated protocol. The transmitter may then process the video data in the form of the designated protocol, or the transmitter may convert the video data into another video protocol for further processing.

The designated protocol is chosen to provide video quality high enough to satisfactorily encompass all other video protocols, while permitting strategic compression based on knowledge of human perception of luminance and chrominance. The high video quality of the designated protocol ensures that any necessary protocol transformations by a transmitter do not result in a significant loss of video quality from the original video data. An example of a protocol that provides high video quality with compression is a protocol specifying a color format with quantized differential coding of the luminance value and subsampling of chrominance values.

A transmitter may support video applications using the designated protocol, and the transmitter may be configured with mechanisms, such as hardware cards or software plug-ins or drivers, to convert between other video protocols and the designated protocol, for example, using color model matrix transformations.

In an embodiment of the invention, data packets are used to transmit variably sized blocks of video data between a transmitter and a receiver using a connectionless datagram scheme. A connectionless scheme means that each packet of video data, i.e., each video block, is processed as an independent unit, and the loss of a data packet does not affect the processing of other data packets. This independence provides for robust video processing even on unreliable networks where packet loss may be commonplace.

Some networks are prone to periodic packet loss, i.e., packet loss at regular intervals. This periodic behavior can result in the stagnation of portions of the video display as the same video blocks are repeatedly lost. To prevent video block stagnation, the spatial order in which video blocks are sent to the receiver for display may be pseudo-randomly determined to disrupt any periodicity in packet performance.

In one embodiment, the data packets containing video data are provided with a sequence number. By tracking the sequence numbers, the receiver can note when a sequence number is skipped, indicating that the packet was lost during transmission. The receiver can then return to the transmitter a list or range of sequence numbers identifying the lost packet or packets. When the transmitter receives the list or range of sequences, the transmitter can decide whether to ignore the missed packets, resend the missed packets (such as for still images), or send updated packets (such as for streaming video that may have changed since the packet was lost).
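As an informal illustration of the gap-detection idea just described (the patent does not supply code, so the helper below is an assumption):

```python
# Minimal sketch of a receiver-side helper that tracks incoming sequence numbers
# and reports any that were skipped, so the transmitter can decide to resend,
# update, or ignore those video blocks.
def find_skipped(expected_next: int, received_seq: int) -> tuple[list[int], int]:
    """Return the list of skipped sequence numbers and the new expected value."""
    skipped = list(range(expected_next, received_seq))  # empty if nothing was lost
    return skipped, received_seq + 1

# Example: packets 5, 6 and 9 arrive; 7 and 8 are reported back to the transmitter.
expected = 5
lost: list[int] = []
for seq in (5, 6, 9):
    gaps, expected = find_skipped(expected, seq)
    lost.extend(gaps)
assert lost == [7, 8]
```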

In one embodiment of the invention, the video data packet comprises the following information:

Sequence number - A video stream is processed as a series of blocks of video data. The sequence number provides a mechanism for the receiver to tell the transmitter what sequence numbers have been missed (e.g., due to packet loss), so that the transmitter may determine whether to resend, update or ignore the associated video block.

X field - The X field designates the x-coordinate of the receiver's display device wherein the first pixel of the video block is to be displayed.

Y field - The Y field designates the y-coordinate of the receiver's display device wherein the first pixel of the video block is to be displayed.

Width - The width field specifies the width of the destination rectangle on the receiver's display device wherein the video block is to be displayed.

Height - The height field specifies the height of the destination rectangle on the receiver's display device wherein the video block is to be displayed.

Source_w - The source width specifies the width of the video block in pixels. Note that the source width may be smaller than the width of the destination rectangle on the receiver's display device. If this is so, the receiver will upscale the video block horizontally to fill the width of the destination rectangle. The source width should not be larger than the width of the destination rectangle, as this implies downscaling, which should be performed by the transmitter for efficiency.

Source_h - The source height specifies the height of the video block in pixels. Note that, as with source_w, the source height may be smaller than the height of the destination rectangle on the receiver's display device. As above, the receiver will upscale the video block vertically to fill the height of the destination rectangle. The source height should not be larger than the height of the destination rectangle, as this implies downscaling, which should be performed by the transmitter for efficiency.

Luma encoding - The luma encoding field allows the transmitter to designate a particular luma encoding scheme from a set of specified luma encoding schemes.

Chroma_sub_x - This field allows the transmitter to designate the degree of horizontal subsampling performed on the video data chroma values.

Chroma_sub_y - This field allows the transmitter to designate the degree of vertical subsampling performed on the video data chroma values.

Video data - The video data includes (source_w * source_h) pixel luma values (Y), and ((source_w / chroma_sub_x) * (source_h / chroma_sub_y)) signed chroma values (U, V or Cb, Cr).
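Purely as an illustrative summary (not part of the patent), the fields above can be collected into a simple record. The names and types below are assumptions; the patent does not define field widths or a concrete wire layout.

```python
from dataclasses import dataclass

# Illustrative sketch of the video block packet described above.
@dataclass
class VideoBlockPacket:
    sequence: int        # sequence number used by the receiver to detect loss
    x: int               # x-coordinate of the first pixel on the receiver's display
    y: int               # y-coordinate of the first pixel on the receiver's display
    width: int           # width of the destination rectangle at the receiver
    height: int          # height of the destination rectangle at the receiver
    source_w: int        # width of the video block in pixels (<= width)
    source_h: int        # height of the video block in pixels (<= height)
    luma_encoding: int   # selects a luma encoding scheme (e.g., 0 = none, 1 = QDC)
    chroma_sub_x: int    # degree of horizontal chroma subsampling
    chroma_sub_y: int    # degree of vertical chroma subsampling
    luma: bytes          # source_w * source_h luma values (or difference codes)
    chroma: bytes        # subsampled, signed chroma (U, V) values
```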

YUV (Y'CbCr) Color

In an embodiment of the invention, the color model of the chosen protocol is specified by the International Telecommunication Union in ITU-R BT.601, an international standard for digital coding of television pictures using video data components Y'CbCr, where Y' is a luminance or "luma" value, Cb (or U') is a first chromaticity or "chroma" value represented as a blue color difference proportional to (B'-Y'), and Cr (or V') is a second chroma value represented as a red color difference proportional to (R'-Y'). (Note that primed values such as Y' indicate gamma-corrected values.) This ITU specification is independent of any scanning standard and makes no assumptions regarding the "white" point or CRT gamma. For 0 <= (R, G, B) <= 1, the range for Y' is 0 <= Y' <= 1 and the range for Cb and Cr is -0.5 <= (Cb, Cr) <= 0.5.

The R'G'B' <--> Y'CbCr color transforms are as follows:
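The transform equations are missing from this text. A sketch using the standard BT.601 relations for normalized components (an assumption about what the original equations showed) is:

```latex
% Reconstructed from the standard BT.601 relations for normalized R'G'B' in [0,1];
% the patent's original equations are not reproduced here, so these coefficients
% are the commonly cited values rather than a quotation of the document.
\begin{align*}
Y'  &=  0.299\,R' + 0.587\,G' + 0.114\,B' \\
C_b &= -0.169\,R' - 0.331\,G' + 0.500\,B' \\
C_r &=  0.500\,R' - 0.419\,G' - 0.081\,B' \\[4pt]
R'  &= Y' + 1.402\,C_r \\
G'  &= Y' - 0.344\,C_b - 0.714\,C_r \\
B'  &= Y' + 1.772\,C_b
\end{align*}
```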

Under the specified protocol, the transmitter performs any transformations required to convert the video data into the YUV format.

This may include performing the RGB to YUV matrix conversion shown above to convert RGB data. Transformations may also include decompression from other color formats (e.g., H.261, MPEG1, etc.). The receiver can drive an RGB display device by performing the above matrix operation to convert incoming YUV (Y'CbCr) data received from a transmitter into RGB data for display at the display rectangle identified in the data packet. No other color transformations are necessary at the receiver.

The receiver is also able to accept RGB data in the same video block format because RGB data is directly supported in the receiver. For transmission efficiency, however, any sizable video data transfers between a transmitter and receiver should be performed in the YUV color format to take advantage of the compression schemes described below.

Luma Compression

In each data packet containing a video block, there are (source_w * source_h) luma values, one for each pixel. If the luma encoding field indicates that no encoding is being performed, the luma values are unsigned eight-bit values. If, however, luma encoding is indicated in the luma encoding field, the luma values are encoded to achieve a compression ratio of 2:1. In an embodiment of the invention, the luma value "Y" is compressed using a quantized differential coding (QDC) scheme described below. In other embodiments, other compression schemes may be specified in the luma encoding field.

The luma compression herein described is based on the premise that luma values do not tend to vary significantly from one pixel to another. It is therefore possible to transmit the difference value between luma values for consecutive pixels rather than the luma values themselves. Further, the luma difference values can be satisfactorily quantized to one of sixteen quantization levels, each of which is identified by a four-bit code word. The quantization is non-linear, with more quantization levels near zero where luma differences between consecutive pixels are more likely to occur.

In one embodiment, the luma difference quantization is performed according to the following table:

Difference Range    Code (Binary)    Quantized Difference Level
-255 to -91         0 (0000)         -100
-90 to -71          1 (0001)         -80
-70 to -51          2 (0010)         -60
-50 to -31          3 (0011)         -40
-30 to -16          4 (0100)         -20
-15 to -8           5 (0101)         -10
-7 to -3            6 (0110)         -4
-2 to 0             7 (0111)         -1
1 to 2              8 (1000)         1
3 to 7              9 (1001)         4
8 to 15             10 (1010)        10
16 to 30            11 (1011)        20
31 to 50            12 (1100)        40
51 to 70            13 (1101)        60
71 to 90            14 (1110)        80
91 to 255           15 (1111)        100

Figure 4A is a flow diagram describing how the luma compression process is performed in accordance with an embodiment of the invention.

The scheme is based on a subtraction of a "last_value" from the current pixel luma value to generate the luma difference. "Last_value" is used to model the luma value of the preceding pixel. To prevent divergence of the compression and decompression processes, the "last_value" is modeled to account for the previous quantized luma difference rather than to match the actual luma value of the last pixel. The modeled "last_value" in the compression process therefore matches the corresponding modeled "last_value" extracted in the decompression process.

Because the compression scheme is based on differences in luma values in rows of luma data, the first luma value in each row has no luma value with which to form a difference. To provide a starting point, in step 400, an initial "last_value" is assigned from the middle of the luma range. In step 401, the first luma value in the row of pixels is set as the current luma value. In step 402, the "last_value" is subtracted from the current luma value to generate a current luma difference value. The current luma difference is applied to a quantization function, in step 403, that outputs the quantized difference code. In step 404, the difference code is placed in the video block data packet.

In step 405, the quantized difference level corresponding to the difference code is determined, and, in step 406, the "last_value" is updated by incrementing it by the quantized difference level. In step 407, "last_value" is clamped to prevent overflow. The clamping function is: clamp(x) = 0 if x < 0; 255 if x > 255; x otherwise. In step 408, if there are more pixel luma values in the row, then the process flows to step 409, wherein the next luma value is set as the current luma value. After step 409, the process returns to step 402. If there is no further pixel luma value in the row at step 408, then, in step 410, a determination is made whether there are further rows to process in the video block. If there are further rows to compress, the next row is designated in step 411 and the process returns to step 400. If there are no further rows at step 410, the luma compression is completed for the current video block.
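As an informal illustration of steps 400 through 411, the sketch below compresses one row of luma values using the quantization table above. The midrange starting value of 128 and the helper names are assumptions for illustration; the patent does not provide code.

```python
# Minimal sketch (not the patent's code) of the QDC luma compression loop.
QDC_LEVELS = [-100, -80, -60, -40, -20, -10, -4, -1, 1, 4, 10, 20, 40, 60, 80, 100]
# Upper bound of each difference range, in the same order as codes 0..15.
QDC_RANGE_TOPS = [-91, -71, -51, -31, -16, -8, -3, 0, 2, 7, 15, 30, 50, 70, 90, 255]

def quantize_difference(diff: int) -> int:
    """Return the 4-bit code for a luma difference in [-255, 255]."""
    for code, top in enumerate(QDC_RANGE_TOPS):
        if diff <= top:
            return code
    return 15

def clamp(x: int) -> int:
    return 0 if x < 0 else 255 if x > 255 else x

def compress_luma_row(row: list[int], start_value: int = 128) -> list[int]:
    """Compress one row of 8-bit luma values into 4-bit difference codes."""
    last_value = start_value             # step 400: midrange starting point
    codes = []
    for luma in row:                     # steps 401/409: walk the row
        code = quantize_difference(luma - last_value)       # steps 402-403
        codes.append(code)                                   # step 404
        last_value = clamp(last_value + QDC_LEVELS[code])    # steps 405-407
    return codes
```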

A luma decompression process is illustrated in the flow diagram of Figure 4B. In step 412, the "last_value" is set to the same midrange value as is done for the beginning of a row in the compression scheme. In step 413, the first luma difference code is set as the current luma difference code. The quantized difference value is determined from the current luma difference code in step 414. In step 415, the "last_value" is incremented by the quantized difference value. In step 416, "last_value" is clamped to prevent overflow.

In step 417, "last_value," now representing the decompressed current luma value, is written to a buffer.

If, in step 418, there are further luma difference codes in the current row of the video block, the next difference code is set as the current luma difference code in step 419, and the process returns to step 414. If, in step 418, there are no further luma difference codes in the current row, the process continues to step 420. In step 420, if there are no further rows in the video block, decompression is complete for the current video block. If, in step 420, there are further rows of luma difference codes, the next row of luma difference codes is set as the current row in step 421, and the process returns to step 412.
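The inverse process (steps 412 through 421) can be sketched the same way; this continues the previous illustrative example and reuses its table and clamp helper.

```python
# Minimal sketch (not the patent's code) of the QDC luma decompression loop;
# QDC_LEVELS and clamp() are as defined in the compression sketch above.
def decompress_luma_row(codes: list[int], start_value: int = 128) -> list[int]:
    """Recover one row of 8-bit luma values from 4-bit difference codes."""
    last_value = start_value                 # step 412: same midrange starting point
    row = []
    for code in codes:                       # steps 413/419: walk the codes
        last_value = clamp(last_value + QDC_LEVELS[code])   # steps 414-416
        row.append(last_value)               # step 417: write decompressed value
    return row

# Round trip: decompressed values track the original within the quantization error.
original = [120, 124, 130, 200, 195]
restored = decompress_luma_row(compress_luma_row(original))
```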

Chroma Compression

The human eye is less sensitive to chroma information than to luma information, particularly in a spatial sense. For example, if, in generating an image, some of the chroma information is spread beyond the actual edges of an object in the image, the human eye will typically pick up on the edge cues provided by the luma information and overlook the inaccuracies in the spatial location of the chroma information. For this reason, some latitude can be taken with the manner in which chroma information is provided. Specifically, subsampling may be performed without significantly degrading visual quality. Subsampling may consist of sampling a single chroma value from a group of neighboring pixels.

In accordance with an embodiment of the invention, the amount of chroma information, and hence the amount of chroma compression, is specified by the chroma_sub_x and chroma_sub_y fields in the video block data packet. If the values for both of those fields are zero, then there is no chroma information and the video block is monochrome, i.e., luma only.

One possible specification for chroma subsampling is:

0 - No chroma values; monochrome image
1 - Subsample by one (i.e., no subsampling)
2 - Subsample by two
3 - Subsample by four

Further subsample arrangements may be provided by extending the above specification. Chroma_sub_x and chroma_sub_y independently specify subsampling along their respective axes. Several subsampling arrangements achieved by different combinations of chroma_sub_x and chroma_sub_y, as defined above, are:

chroma_sub_x    chroma_sub_y    one chroma value per      compression
0               0               0, no chroma data         --
0               1               not permitted             --
1               0               not permitted             --
1               1               pixel                     1:1
2               1               1 x 2 pixel array         2:1
1               2               2 x 1 pixel array         2:1
3               1               1 x 4 pixel array         4:1
1               3               4 x 1 pixel array         4:1
3               2               2 x 4 pixel array         8:1
2               3               4 x 2 pixel array         8:1
3               3               4 x 4 pixel array         16:1

Subsampling may be performed when packing data into a video block data packet by taking data only at the specified intervals in the specified directions. For example, for (chroma_sub_x, chroma_sub_y) = (3,2), chroma data would be taken at every fourth pixel along each row, and every other row would be skipped. Other schemes may be used to select a single pixel from the subsampling matrix, such as pseudo-random assignments.

Further, the chroma values from each pixel in a subsampling matrix may be used to calculate a single set of average chroma values (U, V) for each subsampling matrix.

Subsampling is performed as the video block data packet is being packed and may occur before or after luma compression as luma and chroma compression are substantially independent. When the video block data packet reaches the receiver, the chroma subsamples are upsampled prior to being converted to RGB. Upsampling may be accomplished by taking the subsampled chroma information and duplicating the chroma values for each pixel in the associated subsampling matrix.
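As an informal illustration of the averaging and duplication just described, the sketch below subsamples and upsamples one chroma plane. Here sub_x and sub_y are the actual sampling factors (1, 2 or 4) rather than the encoded chroma_sub_x / chroma_sub_y field values, and the helper names are assumptions rather than the patent's implementation.

```python
# Minimal sketch of averaging-based chroma subsampling and duplication-based
# upsampling over fixed subsampling matrices. The block is assumed to divide
# evenly into matrices of size sub_x (wide) by sub_y (tall) for simplicity.
def subsample_chroma(plane: list[list[float]], sub_x: int, sub_y: int) -> list[list[float]]:
    """Average each sub_x-wide by sub_y-tall matrix of chroma values into one value."""
    h, w = len(plane), len(plane[0])
    return [
        [
            sum(plane[y + dy][x + dx] for dy in range(sub_y) for dx in range(sub_x))
            / (sub_x * sub_y)
            for x in range(0, w, sub_x)
        ]
        for y in range(0, h, sub_y)
    ]

def upsample_chroma(small: list[list[float]], sub_x: int, sub_y: int) -> list[list[float]]:
    """Duplicate each averaged chroma value back over its original matrix."""
    return [
        [value for value in row for _ in range(sub_x)]
        for row in small
        for _ in range(sub_y)
    ]
```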

Figure 5 illustrates subsampling and upsampling processes carried out on an 8 x 8 array of pixels with subsampling performed in 2 x 4 matrices (chroma_sub_x = 2, chroma_sub_y = 3). 8 x 8 pixel array 500 represents the original video data prior to subsampling. 4 x 2 pixel array 501 represents the video data after subsampling by the transmitter, and includes the data that would be transmitted to the receiver. 8 x 8 pixel array 502 represents the video data after upsampling at the receiver.

The subsampling matrices are identified in array 500 as those pixel cells having the same index number. For example, all of the pixel cells containing a "1" are in the same subsampling matrix. 4 x 2 array 501 contains the subsampled data from array 500. The chroma values associated with those pixels with index "1" are averaged into chroma average value A1 (A1 comprises an averaged U value and an averaged V value) placed into the first cell of array 501. Similarly, the chroma values for those pixels with index "2" are averaged into chroma average value A2 and placed into the second location in array 501. The other subsampling matrices, indexed "3" through "8," are averaged similarly. The compression ratio seen between array 500 and array 501 is 8:1.

Array 501 is upsampled into array 502 by placing the averaged chroma values A1-A8 into the positions corresponding to the respective original subsampling matrices. For example, averaged chroma value A1 is placed into each of the pixels in the upper left corner of array 502 shown as containing "A1." The insensitivity of the human eye to spatial errors in chroma information allows the averaged chroma values to provide satisfactory viewing results.

Upscaling And Downscaling Of Video Data

In an embodiment of the invention, the pixel array size of the source video block may differ from the size of the destination rectangle on the receiver's display. This size variation allows a receiver with a large display to "blow up," or upscale, a small video scene to make better use of the display resources. For example, a receiver may wish to upscale a 640 x 480 video stream to fill a 1024 x 1024 area on a large display device. Also, a receiver may have a smaller display than the size of a video stream. In this case, the video stream should be scaled down to be fully visible on the small display.

In accordance with an embodiment of the invention, upscaling is performed by the receiver, whereas downscaling is performed by the transmitter. One reason for this segregation of scaling duties is that scaled down video data requires lower network bandwidth to transmit. By downscaling video data on its own, the transmitter avoids sending video data that would be later discarded by the receiver. This also permits some simplification of the receiver in that resources, such as software code for downscaling video data, are not needed at the receiver.

Upscaling typically involves duplication of video data. It would be inefficient to send duplicated video data over a network. Therefore, the receiver performs all upscaling operations after receipt of the video data in its smaller form. Upscaling of video data is supported in the fields associated with the video data packet. Specifically, the video protocol provides separate fields for specifying the video source pixel array size and the destination display rectangle size. The amount of horizontal scaling is (width / source_w), and the amount of vertical scaling is (height / source_h).

Upscaling is performed after the video data has been decompressed and transformed into RGB format, though in certain embodiments upscaling may precede, or be combined with, the decompression steps. The receiver expands the video data vertically, horizontally, or both, as needed to make the video data fill the designated display rectangle. Expanding video data may be performed as simply as doubling pixel values, but more advanced image filtering techniques may be used to effect re-sampling of the image for better display quality.
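A duplication-based expansion of the kind mentioned can be sketched as follows; this is an illustrative nearest-neighbour version, not the patent's implementation.

```python
# Minimal sketch of upscaling by pixel duplication, the simplest form of the
# expansion described above.
def upscale(pixels: list[list[int]], dest_w: int, dest_h: int) -> list[list[int]]:
    """Expand a source_h x source_w block to fill a dest_h x dest_w rectangle."""
    source_h, source_w = len(pixels), len(pixels[0])
    return [
        [pixels[(y * source_h) // dest_h][(x * source_w) // dest_w] for x in range(dest_w)]
        for y in range(dest_h)
    ]

# Example: a 2 x 2 block filling a 4 x 4 destination rectangle.
scaled = upscale([[1, 2], [3, 4]], dest_w=4, dest_h=4)
assert scaled[0] == [1, 1, 2, 2] and scaled[3] == [3, 3, 4, 4]
```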

Video Data Process Implementing Protocol

Figure 6A is a flow diagram illustrating how a transmitter processes video data in accordance with an embodiment of the invention. In step 600, the transmitter acquires video data for transmission to a receiver. The video data may be acquired by any mechanism, such as capture of a video signal using a hardware capture board, generation of video data by a video service, or input of video data from a video input device such as a video camera.

In step 601, if necessary, the video data is decompressed or converted into YUV color format in accordance with the established protocol. In step 602, the transmitter downscales the video data if the transmitter determines that downscaling is needed. The luma values of the YUV video data are compressed, in step 603, using the quantized differential coding (QDC) scheme described herein, and loaded into a data packet. In step 604, the transmitter subsamples the chroma values of the YUV video data and loads the subsampled chroma values into the data packet. The completed data packet containing the video data for a video block is sent to a receiver in step 605. After transmitting the data packet, the process returns to step 600.
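The transmitter-side steps can be tied together in a short sketch. This is illustrative only and reuses the hypothetical helpers from the earlier sketches (compress_luma_row, subsample_chroma); here sub_x and sub_y are the actual sampling factors, and chroma values are kept as numbers rather than a packed byte layout.

```python
# Minimal sketch (assumptions only) of transmitter steps 601-605 described above.
def build_video_block(seq, x, y, width, height, yuv, sub_x, sub_y):
    """Assemble one video block 'packet' (as a dict) from a YUV block."""
    return {
        "sequence": seq, "x": x, "y": y,
        "width": width, "height": height,
        "source_w": len(yuv["Y"][0]), "source_h": len(yuv["Y"]),
        "luma_encoding": 1,                                   # 1 = QDC in this sketch
        "chroma_sub_x": sub_x, "chroma_sub_y": sub_y,
        "luma": [compress_luma_row(row) for row in yuv["Y"]],  # step 603
        "chroma_u": subsample_chroma(yuv["U"], sub_x, sub_y),  # step 604
        "chroma_v": subsample_chroma(yuv["V"], sub_x, sub_y),
    }
```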

Figure 6B is a flow diagram illustrating how a receiver processes video data in accordance with an embodiment of the invention. In step 606, the receiver receives a compressed/subsampled YUV video block data packet from the transmitter. The receiver decompresses the luma values of the data packet in step 607, and upsamples the chroma values in step 608. With full YUV video data, the receiver performs a color transformation to convert the YUV video data to RGB data. If the destination display rectangle noted in the data packet header is larger than the source video data, the receiver performs any necessary upscaling to fill the designated display rectangle with the source video data. In step 611, the video data is loaded into a video buffer for display on the receiver's display device. After loading the video data into the video buffer, the process returns to step 606.

Embodiment of Computer Execution Environment (Hardware)

An embodiment of the invention can be implemented as computer software in the form of computer readable code executed on a general purpose computer such as computer 700 illustrated in Figure 7, or in the form of bytecode class files executable within a Java runtime environment running on such a computer. A keyboard 710 and mouse 711 are coupled to a bi-directional system bus 718. The keyboard and mouse are for introducing user input to the computer system and communicating that user input to processor 713. Other suitable input devices may be used in addition to, or in place of, the mouse 711 and keyboard 710. I/O (input/output) unit 719 coupled to bi-directional system bus 718 represents such I/O elements as a printer, A/V (audio/video) I/O, etc.

Computer 700 includes a video memory 714, main memory 715 and mass storage 712, all coupled to bi-directional system bus 718 along with keyboard 710, mouse 711 and processor 713. The mass storage 712 may include both fixed and removable media, such as magnetic, optical or magneto-optical storage systems or any other available mass storage technology. Bus 718 may contain, for example, thirty-two address lines for addressing video memory 714 or main memory 715. The system bus 718 also includes, for example, a 32-bit data bus for transferring data between and among the components, such as processor 713, main memory 715, video memory 714 and mass storage 712. Alternatively, multiplex data/address lines may be used instead of separate data and address lines.

In one embodiment of the invention, the processor 713 is a microprocessor manufactured by Motorola, such as the 680X0 processor, or a microprocessor manufactured by Intel, such as the 80X86 or Pentium processor, or a SPARC™ microprocessor from Sun Microsystems™, Inc.

However, any other suitable microprocessor or microcomputer may be utilized. Main memory 715 is comprised of dynamic random access memory (DRAM). Video memory 714 is a dual-ported video random access memory.

One port of the video memory 714 is coupled to video amplifier 716. The video amplifier 716 is used to drive the cathode ray tube (CRT) raster monitor 717. Video amplifier 716 is well known in the art and may be implemented by any suitable apparatus. This circuitry converts pixel data stored in video memory 714 to a raster signal suitable for use by monitor 717. Monitor 717 is a type of monitor suitable for displaying graphic images. Alternatively, the video memory could be used to drive a flat panel or liquid crystal display (LCD), or any other suitable data presentation device.

Computer 700 may also include a communication interface 720 coupled to bus 718. Communication interface 720 provides a two-way data communication coupling via a network link 721 to a local network 722. For example, if communication interface 720 is an integrated services digital network (ISDN) card or a modem, communication interface 720 provides a data communication connection to the corresponding type of telephone line, which comprises part of network link 721. If communication interface 720 is a local area network (LAN) card, communication interface 720 provides a data communication connection via network link 721 to a compatible LAN.

Communication interface 720 could also be a cable modem or wireless interface. In any such implementation, communication interface 720 sends and receives electrical, electromagnetic or optical signals which carry digital data streams representing various types of information.

Network link 721 typically provides data communication through one or more networks to other data devices. For example, network link 721 may provide a connection through local network 722 to local server computer 723 or to data equipment operated by an Internet Service Provider (ISP) 724. ISP 724 in turn provides data communication services through the world wide packet data communication network now commonly referred to as the "Internet" 725. Local network 722 and Internet 725 both use electrical, electromagnetic or optical signals which carry digital data streams. The signals through the various networks and the signals on network link 721 and through communication interface 720, which carry the digital data to and from computer 700, are exemplary forms of carrier waves transporting the information.

Computer 700 can send messages and receive data, including program code, through the network(s), network link 721, and communication interface 720. In the Internet example, remote server computer 726 might transmit a requested code for an application program through Internet 725, ISP 724, local network 722 and communication interface 720.

The received code may be executed by processor 713 as it is received, and/or stored in mass storage 712, or other non-volatile storage for later execution. In this manner, computer 700 may obtain application code in the form of a carrier wave.

Application code may be embodied in any form of computer program product. A computer program product comprises a medium configured to store or transport computer readable code or data, or in which computer readable code or data may be embedded. Some examples of computer program products are CD-ROM disks, ROM cards, floppy disks, magnetic tapes, computer hard drives, servers on a network, and carrier waves.

Human Interface Device Computer System

The invention has application to computer systems where the data is provided through a network. The network can be a local area network, a wide area network, the internet, the world wide web, or any other suitable network configuration. One embodiment of the invention is used in a computer system configuration referred to herein as a human interface device computer system.

In this system the functionality of the system is partitioned between a display and input device, and data sources or services. The display and input device is a human interface device (HID). The partitioning of this system is such that state and computation functions have been removed from the HID and reside on data sources or services. In one embodiment of the invention, one or more services communicate with one or more HIDs through some interconnect fabric, such as a network. An example of such a system is illustrated in Figure 8. Referring to Figure 8, the system consists of computational service providers 800 communicating data through interconnect fabric 801 to HIDs 802.

Computational Service Providers - In the HID system, the computational power and state maintenance is found in the service providers, or services. The services are not tied to a specific computer, but may be distributed over one or more traditional desktop systems such as described in connection with Figure 7, or with traditional servers. One computer may have one or more services, or a service may be implemented by one or more computers. The service provides computation, state, and data to the HIDs, and the service is under the control of a common authority or manager. In Figure 8, the services are found on computers 810, 811, 812, 813, and 814. In an embodiment of the invention, any of computers 810-814 could be implemented as a transmitter.

Examples of services include X11/Unix services, archived video services, Windows NT services, Java™ program execution services, and others.

A service herein is a process that provides output data and responds to user requests and input.

Interconnection Fabric - The interconnection fabric is any of multiple suitable communication paths for carrying data between the services and the HIDs. In one embodiment the interconnect fabric is a local area network implemented as an Ethernet network. Any other local network may also be utilized. The invention also contemplates the use of wide area networks, the internet, the world wide web, and others. The interconnect fabric may be implemented with a physical medium such as a wire or fiber optic cable, or it may be implemented in a wireless environment.

HIDs - The HID is the means by which users access the computational services provided by the services. Figure 8 illustrates HIDs 821, 822, and 823.

A HID consists of a display 826, a keyboard 824, mouse 825, and audio speakers 827. The HID includes the electronics needed to interface these devices to the interconnection fabric and to transmit data to and receive data from the services.

In an embodiment of the invention, an HID is implemented as a receiver.

A block diagram of the HID is illustrated in Figure 9. The components of the HID are coupled internally to a PCI bus 912. A network control block 902 communicates to the interconnect fabric, such as an Ethernet, through line 914. An audio codec 903 receives audio data on interface 916 and is coupled to block 902. USB data communication is provided on lines 913 to USB controller 901.

An embedded processor 904 may be, for example, a Sparc2ep with coupled flash memory 905 and DRAM 906. The USB controller 901, network controller 902 and embedded processor 904 are all coupled to the PCI bus 912.

Also coupled to the PCI bus 912 is the video controller 909. The video controller 909 may be, for example, an ATI RagePro+ frame buffer controller that provides SVGA output on line 915. NTSC data is provided into and out of the video controller through video decoder 910 and video encoder 911, respectively. A smartcard interface 908 may also be coupled to the video controller 909.

The computer systems described above are for purposes of example only. An embodiment of the invention may be implemented in any type of computer system or programming or processing environment.

Thus, a method and apparatus of supporting a video protocol in a network environment have been described in conjunction with one or more specific embodiments. The invention is defined by the claims and their full scope of equivalents.