

Title:
TRANSMISSION OF VIDEO IMAGES MODIFIED BASED ON STEREOSCOPIC VIDEO IMAGE ACQUISITION
Document Type and Number:
WIPO Patent Application WO/2010/116171
Kind Code:
A1
Abstract:
A method of image transmission comprises acquiring a pair of stereoscopic images from a stereoscopic camera, evaluating the displacement of corresponding image features between the pair of stereoscopic images, processing at least one of the pair of stereoscopic images to render unviewable those image features whose displacement falls outside a previously determined range, and transmitting the or each processed image to a remote viewer. The determined range may be defined relative to a user, and the cameras of the stereoscopic pair may respectively have different overall qualities.

Inventors:
GODAR ANTHONY WILLIAM (GB)
Application Number:
PCT/GB2010/050569
Publication Date:
October 14, 2010
Filing Date:
March 31, 2010
Assignee:
SONY COMP ENTERTAINMENT EUROPE (GB)
GODAR ANTHONY WILLIAM (GB)
International Classes:
H04N5/225; G06T7/00; H04N5/272; H04N7/14; H04N7/15; H04N13/00
Domestic Patent References:
WO1999051023A1 (1999-10-07)
WO2007055865A1 (2007-05-18)
WO1998047294A1 (1998-10-22)
Foreign References:
US20080298571A1 (2008-12-04)
JPS63164594A (1988-07-07)
KR20070104716A (2007-10-29)
US6456737B1 (2002-09-24)
Other References:
YOKOYA N ET AL: "Stereo Vision Based Video See-Through Mixed Reality", 1 January 1999, Mixed Reality: Merging Real and Virtual Worlds, Proceedings of the International Symposium on Mixed Reality, pages 131-145, XP001118696
Attorney, Agent or Firm:
TURNER, James (120 Holborn, London EC1N 2DY, GB)
Claims:
CLAIMS

1. A transmission system, comprising: a stereoscopic camera operable to generate a pair of stereoscopic video images of an environment; an image evaluator operable to evaluate the displacement of corresponding image features between the pair of stereoscopic video images; an image processor operable to modify at least one of the pair of stereoscopic video images to make image features whose displacement falls outside a previously defined displacement range unviewable; and a transmitter operable to transmit the or each modified image to a remote viewer; and in which: the image evaluator is operable to evaluate the displacement of at least a first user depicted in the pair of stereoscopic video images; and the displacement range is defined relative to the displacement of the or each user as depicted in the pair of stereoscopic video images.

2. A transmission system according to claim 1, in which: the stereoscopic camera comprises a pair of cameras angled with respect to each other so that the central lines of sight of their respective fields of view intersect at a finite distance that is determined by the relative angle between the pair of cameras.

3. A transmission system according to claim 2, in which the relative angle between the pair of cameras is adjustable.

4. A transmission system according to claim 2 or claim 3, in which: the image evaluator is operable to evaluate the distance to a first user in the environment as depicted in the pair of stereoscopic video images; and the image evaluator is operable to adjust the relative angle between the pair of cameras so that the central lines of sight of their respective fields of view intersect substantially at the distance to the first user.

5. A transmission system according to claim 1, in which the defined displacement range is subject to predetermined limits.

6. A transmission system according to claim 2, in which the relative angle is arranged to cause the central lines of sight of the pair of cameras to intersect at a distance determined to represent the distance beyond which objects in the environment should be rendered unviewable by the image processor; and a limit of the previously determined displacement range is set to zero.

7. A transmission system according to any one of the preceding claims in which image features whose displacement falls outside the previously defined displacement range are rendered unviewable by one selected from the list consisting of: i. pixellation; ii. low-pass filtering; and iii. replacement by a different image feature.

8. A mobile computing device comprising a transmission system according to any one of claims 1 to 7.

9. A transmission system, comprising: a stereoscopic camera operable to generate a pair of stereoscopic video images of an environment; an image evaluator operable to evaluate the displacement of corresponding image features between the pair of stereoscopic video images; an image processor operable to modify at least one of the pair of stereoscopic video images to make image features whose displacement falls outside a previously defined displacement range unviewable; and a transmitter operable to transmit the or each modified image to a remote viewer; and in which the stereoscopic camera comprises a pair of cameras having a different respective overall quality.

10. A transmission system according to claim 9 in which one of the pair of cameras is of broadcast quality, and the other one of the pair of cameras is not.

11. A method of image transmission comprising the steps of: acquiring a pair of stereoscopic images from a stereoscopic camera; evaluating the displacement of corresponding image features between the pair of stereoscopic images and determining the displacement of at least a first user depicted in the pair of stereoscopic video images; processing at least one of the pair of stereoscopic images to render unviewable those image features whose displacement falls outside a previously determined range; and transmitting the or each processed image to a remote viewer; and in which the displacement range is defined relative to the displacement of the or each user as depicted in the pair of stereoscopic video images.

12. A method according to claim 11, in which the step of acquiring a pair of stereoscopic images from a stereoscopic camera that comprises a pair of cameras, comprises the step of: selecting a relative angle between the pair of cameras so that the central lines of sight of their respective fields of view intersect at a finite distance determined by the relative angle between the pair of cameras, and wherein the relative angle is one selected from the list consisting of: i. a default angle predetermined to result in the intersection of the central lines of sight of the pair of cameras at a distance deemed to represent the distance beyond which objects in the environment should be rendered unviewable; ii. a default angle predetermined to result in the intersection of the central lines of sight of the pair of cameras at a typical user operating distance from the stereoscopic camera; and iii. a user-adjustable angle.

13. A method according to claim 11, in which image features whose displacement falls outside the previously defined displacement range are rendered unviewable by one selected from the list consisting of: i. pixellating the image features; ii. low-pass filtering the image features; and iii. replacing the image features by different image features.

14. A method of image transmission comprising the steps of: generating a pair of stereoscopic video images of an environment using a stereoscopic camera; evaluating the displacement of corresponding image features between the pair of stereoscopic video images; modifying at least one of the pair of stereoscopic video images to make image features whose displacement falls outside a previously defined displacement range unviewable; and transmitting the or each modified image to a remote viewer; and in which the stereoscopic camera comprises a pair of cameras each having a different respective overall quality.

15. A computer program for implementing the steps of any one of claims 11 to 14.

Description:
TRANSMISSION OF VIDEO IMAGES MODIFIED BASED ON STEREOSCOPIC VIDEO IMAGE ACQUISITION

The present invention relates to a transmission system and a method of image transmission.

A common form of digital camera is the so-called web-cam. Traditional web-cams are typically low-resolution (e.g. 640 x 480) charge-coupled devices (CCDs) that generate a video output typically at 60 or 50 frames per second, which is then received by a linked video conferencing device, personal computer or similar device that incorporates the video images into a relevant application.

One common application is web-conferencing, encompassing many-to-many, one-to-many and one-to-one modes of communication.

In a web-conferencing application, typically video of the (or each, or the presently talking) user is transmitted to the or each other participant in the web-conferencing session. However, in addition to the user, anything else in the field of view of the web-cam is also transmitted as part of the video image, giving rise to the possibility of embarrassment if unexpected or undesired objects or people are thereby disclosed to the receiving parties.

It is therefore desirable to transmit an image of the user whilst providing privacy in relation to the user's broader environment.

Motion-based segmentation of an image to identify a user has been considered for encoding priority purposes, but it is of less value in a situation where the user is likely to be generally stationary.

3DV Systems (see http://www.3dvsystems.com/web/web.html) disclose a camera augmented with an infra-red distance measuring means. The distance measuring means enables a computer to determine the corresponding distance of pixels in the captured image, and thereby replace pixels corresponding to the environment behind the user with a backdrop of their choice.
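By way of a non-limiting illustration (not part of the cited disclosure), the pixel-replacement step described above might be sketched as follows, assuming a per-pixel depth map is available; the function name and threshold are illustrative:

```python
import numpy as np

def replace_background(image, depth, backdrop, max_depth_mm=1500):
    """Replace pixels whose measured depth exceeds a threshold with a backdrop.

    image:    H x W x 3 uint8 video frame
    depth:    H x W depth map in millimetres (e.g. from an IR range sensor)
    backdrop: H x W x 3 uint8 replacement background
    """
    mask = depth > max_depth_mm   # True where the pixel lies behind the user
    out = image.copy()
    out[mask] = backdrop[mask]    # substitute the chosen backdrop
    return out
```

The same masking pattern applies regardless of how the depth estimate is obtained, which is what the stereoscopic approach below exploits.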

However, the above system assumes a reliable reflection of infra-red images within a suitable range, and hence there is scope for alternative solutions that mitigate or avoid such issues.

In a first aspect of the present invention, a transmission system comprises a stereoscopic camera operable to generate a pair of stereoscopic video images of an environment, an image evaluator operable to evaluate the displacement of corresponding image features between the pair of stereoscopic video images, an image processor operable to modify at least one of the pair of stereoscopic video images to make image features whose displacement falls outside a previously defined displacement range unviewable, and a transmitter operable to transmit the or each modified image to a remote viewer; and in which the image evaluator is operable to evaluate the displacement of at least a first user depicted in the pair of stereoscopic video images, and the displacement range is defined relative to the displacement of the or each user as depicted in the pair of stereoscopic video images.
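The evaluation and modification steps of this aspect can be sketched, purely as a non-limiting illustration, with simple sum-of-absolute-differences block matching; all function names and parameters are illustrative assumptions rather than the claimed implementation:

```python
import numpy as np

def block_disparity(left, right, block=16, max_d=32):
    """Estimate a coarse horizontal disparity per block by SAD search.
    left, right: H x W greyscale frames from the stereoscopic pair."""
    H, W = left.shape
    disp = np.zeros((H // block, W // block))
    for by in range(H // block):
        for bx in range(W // block):
            y, x = by * block, bx * block
            ref = left[y:y+block, x:x+block].astype(float)
            best, best_d = np.inf, 0
            for d in range(min(max_d, x) + 1):
                cand = right[y:y+block, x-d:x-d+block].astype(float)
                sad = np.abs(ref - cand).sum()
                if sad < best:
                    best, best_d = sad, d
            disp[by, bx] = best_d
    return disp

def pixellate_outside_range(image, disp, lo, hi, block=16):
    """Render unviewable (here by pixellation) blocks whose disparity
    falls outside [lo, hi], i.e. features nearer or farther than the
    range defined around the user."""
    out = image.copy()
    for by in range(disp.shape[0]):
        for bx in range(disp.shape[1]):
            if not (lo <= disp[by, bx] <= hi):
                y, x = by * block, bx * block
                out[y:y+block, x:x+block] = out[y:y+block, x:x+block].mean()
    return out
```

In this sketch the range [lo, hi] would be centred on the disparity measured for the user, so that the user remains viewable while the rest of the environment is pixellated.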

In a second aspect of the present invention, a method of image transmission comprises acquiring a pair of stereoscopic images from a stereoscopic camera, evaluating the displacement of corresponding image features between the pair of stereoscopic images and determining the displacement of at least a first user depicted in the pair of stereoscopic video images, processing at least one of the pair of stereoscopic images to render unviewable those image features whose displacement falls outside a previously determined range, and transmitting the or each processed image to a remote viewer; and in which the displacement range is defined relative to the displacement of the or each user as depicted in the pair of stereoscopic video images.

In a third aspect of the present invention, a transmission system comprises a stereoscopic camera operable to generate a pair of stereoscopic video images of an environment, an image evaluator operable to evaluate the displacement of corresponding image features between the pair of stereoscopic video images, an image processor operable to modify at least one of the pair of stereoscopic video images to make image features whose displacement falls outside a previously defined displacement range unviewable, and a transmitter operable to transmit the or each modified image to a remote viewer; and in which the stereoscopic camera comprises a pair of cameras having a different respective overall quality.

In a fourth aspect of the present invention, a method of image transmission comprises the steps of generating a pair of stereoscopic video images of an environment using a stereoscopic camera, evaluating the displacement of corresponding image features between the pair of stereoscopic video images, modifying at least one of the pair of stereoscopic video images to make image features whose displacement falls outside a previously defined displacement range unviewable, and transmitting the or each modified image to a remote viewer; and in which the stereoscopic camera comprises a pair of cameras each having a different respective overall quality.

Further respective aspects and features of the invention are defined in the appended claims.

Embodiments of the present invention will now be described by way of example with reference to the accompanying drawings, in which:

Figure 1 is a schematic diagram of an entertainment device;

Figure 2 is a schematic diagram of a cell processor;

Figure 3 is a schematic diagram of a video graphics processor;

Figures 4A and 4B are schematic diagrams of two stereoscopic cameras;

Figure 5 is a schematic diagram of a stereoscopic camera of the type shown in Figure 4A in use;

Figure 6 is a schematic diagram of the images captured by the stereoscopic camera shown in Figure 5;

Figure 7 is a schematic diagram of a stereoscopic camera of the type shown in Figure 4B deployed for a typical web-conference;

Figure 8 is a schematic diagram of the images captured by the stereoscopic camera shown in Figure 7;

Figure 9 is a schematic diagram of a stereoscopic camera of the type shown in Figure 4B deployed for a typical web-conference;

Figure 10 is a schematic diagram of the images captured by the stereoscopic camera shown in Figure 9;

Figure 11 is a flow diagram of a method of privacy enabled transmission.

A transmission system and a method of image transmission are disclosed. In the following description, a number of specific details are presented in order to provide a thorough understanding of the embodiments of the present invention. It will be apparent, however, to a person skilled in the art that these specific details need not be employed to practise the present invention. Conversely, specific details known to the person skilled in the art are omitted for the purposes of clarity where appropriate.

Figure 1 schematically illustrates the overall system architecture of a Sony® Playstation 3® entertainment device that may be used with a stereoscopic webcam and may be used to provide analysis of its output, details of which are presented herein.

A system unit 10 is provided, with various peripheral devices connectable to the system unit.

The system unit 10 comprises: a Cell processor 100; a Rambus® dynamic random access memory (XDRAM) unit 500; a Reality Synthesiser graphics unit 200 with a dedicated video random access memory (VRAM) unit 250; and an I/O bridge 700.

The system unit 10 also comprises a Blu-Ray® Disk BD-ROM® optical disk reader 430 for reading from a disk 440 and a removable slot-in hard disk drive (HDD) 400, accessible through the I/O bridge 700. Optionally the system unit also comprises a memory card reader 450 for reading compact flash memory cards, Memory Stick® memory cards and the like, which is similarly accessible through the I/O bridge 700.

The I/O bridge 700 also connects to four Universal Serial Bus (USB) 2.0 ports 710; a gigabit Ethernet port 720; an IEEE 802.11b/g wireless network (Wi-Fi) port 730; and a Bluetooth® wireless link port 740 capable of supporting up to seven Bluetooth connections.

In operation the I/O bridge 700 handles all wireless, USB and Ethernet data, including data from one or more game controllers 751. For example when a user is playing a game, the I/O bridge 700 receives data from the game controller 751 via a Bluetooth link and directs it to the Cell processor 100, which updates the current state of the game accordingly.

The wireless, USB and Ethernet ports also provide connectivity for other peripheral devices in addition to game controllers 751, such as: a remote control 752; a keyboard 753; a mouse 754; a portable entertainment device 755 such as a Sony Playstation Portable® entertainment device; a video camera such as an EyeToy® video camera 756; and a microphone headset 757. Such peripheral devices may therefore in principle be connected to the system unit 10 wirelessly; for example the portable entertainment device 755 may communicate via a Wi-Fi ad-hoc connection, whilst the microphone headset 757 may communicate via a Bluetooth link.

The provision of these interfaces means that the Playstation 3 device is also potentially compatible with other peripheral devices such as digital video recorders (DVRs), set-top boxes, digital cameras, portable media players, Voice over IP telephones, mobile telephones, printers and scanners. In addition, a legacy memory card reader 410 may be connected to the system unit via a USB port 710, enabling the reading of memory cards 420 of the kind used by the Playstation® or Playstation 2® devices.

In the present embodiment, the game controller 751 is operable to communicate wirelessly with the system unit 10 via the Bluetooth link. However, the game controller 751 can instead be connected to a USB port, thereby also providing power by which to charge the battery of the game controller 751. In addition to one or more analogue joysticks and conventional control buttons, the game controller is sensitive to motion in 6 degrees of freedom, corresponding to translation and rotation in each axis. Consequently gestures and movements by the user of the game controller may be translated as inputs to a game in addition to or instead of conventional button or joystick commands. Optionally, other wirelessly enabled peripheral devices such as the Playstation Portable device may be used as a controller. In the case of the Playstation Portable device, additional game or control information (for example, control instructions or number of lives) may be provided on the screen of the device. Other alternative or supplementary control devices may also be used, such as a dance mat (not shown), a light gun (not shown), a steering wheel and pedals (not shown) or bespoke controllers, such as a single or several large buttons for a rapid-response quiz game (also not shown).

The remote control 752 is also operable to communicate wirelessly with the system unit 10 via a Bluetooth link. The remote control 752 comprises controls suitable for the operation of the Blu-Ray Disk BD-ROM reader 430 and for the navigation of disk content.

The Blu-Ray Disk BD-ROM reader 430 is operable to read CD-ROMs compatible with the Playstation and PlayStation 2 devices, in addition to conventional pre-recorded and recordable CDs, and so-called Super Audio CDs. The reader 430 is also operable to read DVD-ROMs compatible with the Playstation 2 and PlayStation 3 devices, in addition to conventional pre-recorded and recordable DVDs. The reader 430 is further operable to read BD-ROMs compatible with the Playstation 3 device, as well as conventional pre-recorded and recordable Blu-Ray Disks.

The system unit 10 is operable to supply audio and video, either generated or decoded by the Playstation 3 device via the Reality Synthesiser graphics unit 200, through audio and video connectors to a display and sound output device 300 such as a monitor or television set having a display 305 and one or more loudspeakers 310. The audio connectors 210 may include conventional analogue and digital outputs whilst the video connectors 220 may variously include component video, S-video, composite video and one or more High Definition Multimedia Interface (HDMI) outputs. Consequently, video output may be in formats such as PAL or NTSC, or in 720p, 1080i or 1080p high definition.

Audio processing (generation, decoding and so on) is performed by the Cell processor 100. The Playstation 3 device's operating system supports Dolby® 5.1 surround sound, DTS® surround sound, and the decoding of 7.1 surround sound from Blu-Ray® disks.

In the present embodiment, the video camera 756 comprises a single charge coupled device (CCD), an LED indicator, and hardware-based real-time data compression and encoding apparatus so that compressed video data may be transmitted in an appropriate format such as an intra-image based MPEG (Moving Picture Experts Group) standard for decoding by the system unit 10. The camera LED indicator is arranged to illuminate in response to appropriate control data from the system unit 10, for example to signify adverse lighting conditions. Embodiments of the video camera 756 may variously connect to the system unit 10 via a USB, Bluetooth or Wi-Fi communication port. Embodiments of the video camera may include one or more associated microphones and also be capable of transmitting audio data. In embodiments of the video camera, the CCD may have a resolution suitable for high-definition video capture. In use, images captured by the video camera may for example be incorporated within a game or interpreted as game control inputs.

In general, in order for successful data communication to occur with a peripheral device such as a video camera or remote control via one of the communication ports of the system unit 10, an appropriate piece of software such as a device driver should be provided. Device driver technology is well-known and will not be described in detail here, except to say that the skilled man will be aware that a device driver or similar software interface may be required in the present embodiment described.

Referring now to Figure 2, the Cell processor 100 has an architecture comprising four basic components: external input and output structures comprising a memory controller 160 and a dual bus interface controller 170A,B; a main processor referred to as the Power Processing Element 150; eight co-processors referred to as Synergistic Processing Elements (SPEs) 110A-H; and a circular data bus connecting the above components referred to as the Element Interconnect Bus 180. The total floating point performance of the Cell processor is 218 GFLOPS, compared with the 6.2 GFLOPs of the Playstation 2 device's Emotion Engine.

The Power Processing Element (PPE) 150 is based upon a two-way simultaneous multithreading Power 970 compliant PowerPC core (PPU) 155 running with an internal clock of 3.2 GHz. It comprises a 512 kB level 2 (L2) cache and a 32 kB level 1 (L1) cache. The PPE 150 is capable of eight single precision operations per clock cycle, translating to 25.6 GFLOPs at 3.2 GHz. The primary role of the PPE 150 is to act as a controller for the Synergistic Processing Elements 110A-H, which handle most of the computational workload. In operation the PPE 150 maintains a job queue, scheduling jobs for the Synergistic Processing Elements 110A-H and monitoring their progress. Consequently each Synergistic Processing Element 110A-H runs a kernel whose role is to fetch a job, execute it and synchronise with the PPE 150.
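The job-queue arrangement described above can be illustrated, purely by analogy, with a conventional worker-pool sketch (six workers mirroring the six SPEs typically available for tasks; all names are illustrative):

```python
import queue
import threading

# Controller (PPE role) fills a job queue; worker kernels (SPE role)
# repeatedly fetch a job, execute it, and report the result back.
jobs, results = queue.Queue(), queue.Queue()

def spe_kernel():
    while True:
        job = jobs.get()
        if job is None:          # sentinel: no more work
            break
        results.put(job())       # execute and synchronise the result

workers = [threading.Thread(target=spe_kernel) for _ in range(6)]
for w in workers:
    w.start()

for n in range(10):              # the controller schedules ten jobs
    jobs.put(lambda n=n: n * n)
for _ in workers:
    jobs.put(None)
for w in workers:
    w.join()

totals = sorted(results.get() for _ in range(10))
```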

Each Synergistic Processing Element (SPE) 110A-H comprises a respective Synergistic Processing Unit (SPU) 120A-H, and a respective Memory Flow Controller (MFC) 140A-H comprising in turn a respective Dynamic Memory Access Controller (DMAC) 142A-H, a respective Memory Management Unit (MMU) 144A-H and a bus interface (not shown). Each SPU 120A-H is a RISC processor clocked at 3.2 GHz and comprising 256 kB local RAM 130A-H, expandable in principle to 4 GB. Each SPE gives a theoretical 25.6 GFLOPS of single precision performance. An SPU can operate on 4 single precision floating point numbers, 4 32-bit integers, 8 16-bit integers, or 16 8-bit integers in a single clock cycle. In the same clock cycle it can also perform a memory operation. The SPU 120A-H does not directly access the system memory XDRAM 500; the 64-bit addresses formed by the SPU 120A-H are passed to the MFC 140A-H which instructs its DMA controller 142A-H to access memory via the Element Interconnect Bus 180 and the memory controller 160.

The Element Interconnect Bus (EIB) 180 is a logically circular communication bus internal to the Cell processor 100 which connects the above processor elements, namely the PPE 150, the memory controller 160, the dual bus interface 170A,B and the 8 SPEs 110A-H, totalling 12 participants. Participants can simultaneously read and write to the bus at a rate of 8 bytes per clock cycle. As noted previously, each SPE 110A-H comprises a DMAC 142A-H for scheduling longer read or write sequences. The EIB comprises four channels, two each in clockwise and anti-clockwise directions. Consequently for twelve participants, the longest step-wise data-flow between any two participants is six steps in the appropriate direction. The theoretical peak instantaneous EIB bandwidth for 12 slots is therefore 96 bytes per clock, in the event of full utilisation through arbitration between participants. This equates to a theoretical peak bandwidth of 307.2 GB/s (gigabytes per second) at a clock rate of 3.2 GHz.
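The quoted bandwidth figures can be checked directly from the stated parameters:

```python
# 12 participants each moving 8 bytes per clock cycle, at 3.2 GHz.
bytes_per_clock = 12 * 8              # 96 bytes per clock across all slots
peak_bytes_per_s = bytes_per_clock * 3.2e9
assert bytes_per_clock == 96
assert peak_bytes_per_s == 307.2e9    # 307.2 GB/s, as stated
```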

The memory controller 160 comprises an XDRAM interface 162, developed by Rambus Incorporated. The memory controller interfaces with the Rambus XDRAM 500 with a theoretical peak bandwidth of 25.6 GB/s. The dual bus interface 170A,B comprises a Rambus FlexIO® system interface 172A,B. The interface is organised into 12 channels each being 8 bits wide, with five paths being inbound and seven outbound. This provides a theoretical peak bandwidth of 62.4 GB/s (36.4 GB/s outbound, 26 GB/s inbound) between the Cell processor and the I/O Bridge 700 via controller 170A and the Reality Simulator graphics unit 200 via controller 170B.

Data sent by the Cell processor 100 to the Reality Simulator graphics unit 200 will typically comprise display lists, being a sequence of commands to draw vertices, apply textures to polygons, specify lighting conditions, and so on.

Referring now to Figure 3, the Reality Simulator graphics (RSX) unit 200 is a video accelerator based upon the NVidia® G70/71 architecture that processes and renders lists of commands produced by the Cell processor 100. The RSX unit 200 comprises a host interface 202 operable to communicate with the bus interface controller 170B of the Cell processor 100; a vertex pipeline 204 (VP) comprising eight vertex shaders 205; a pixel pipeline 206 (PP) comprising 24 pixel shaders 207; a render pipeline 208 (RP) comprising eight render output units (ROPs) 209; a memory interface 210; and a video converter 212 for generating a video output. The RSX 200 is complemented by 256 MB double data rate (DDR) video RAM (VRAM) 250, clocked at 600MHz and operable to interface with the RSX 200 at a theoretical peak bandwidth of 25.6 GB/s. In operation, the VRAM 250 maintains a frame buffer 214 and a texture buffer 216. The texture buffer 216 provides textures to the pixel shaders 207, whilst the frame buffer 214 stores results of the processing pipelines. The RSX can also access the main memory 500 via the EIB 180, for example to load textures into the VRAM 250.

The vertex pipeline 204 primarily processes deformations and transformations of vertices defining polygons within the image to be rendered.

The pixel pipeline 206 primarily processes the application of colour, textures and lighting to these polygons, including any pixel transparency, generating red, green, blue and alpha (transparency) values for each processed pixel. Texture mapping may simply apply a graphic image to a surface, or may include bump-mapping (in which the notional direction of a surface is perturbed in accordance with texture values to create highlights and shade in the lighting model) or displacement mapping (in which the applied texture additionally perturbs vertex positions to generate a deformed surface consistent with the texture).

The render pipeline 208 performs depth comparisons between pixels to determine which should be rendered in the final image. Optionally, if the intervening pixel process will not affect depth values (for example in the absence of transparency or displacement mapping) then the render pipeline and vertex pipeline 204 can communicate depth information between them, thereby enabling the removal of occluded elements prior to pixel processing, and so improving overall rendering efficiency. In addition, the render pipeline 208 also applies subsequent effects such as full-screen anti-aliasing over the resulting image.

Both the vertex shaders 205 and pixel shaders 207 are based on the shader model 3.0 standard. Up to 136 shader operations can be performed per clock cycle, with the combined pipeline therefore capable of 74.8 billion shader operations per second, outputting up to 840 million vertices and 10 billion pixels per second. The total floating point performance of the RSX 200 is 1.8 TFLOPS.
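The quoted throughput figures are mutually consistent: dividing the combined shader rate by the per-clock operation count gives the implied shader clock, a derived value not stated above, which matches the RSX's published 550 MHz core clock.

```python
# 74.8 billion shader operations per second at 136 operations per clock
# implies a 550 MHz shader clock (derived, not stated in the text).
implied_clock_hz = 74.8e9 / 136
assert implied_clock_hz == 550e6
```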

Typically, the RSX 200 operates in close collaboration with the Cell processor 100; for example, when displaying an explosion, or weather effects such as rain or snow, a large number of particles must be tracked, updated and rendered within the scene. In this case, the PPU 155 of the Cell processor may schedule one or more SPEs 110A-H to compute the trajectories of respective batches of particles. Meanwhile, the RSX 200 accesses any texture data (e.g. snowflakes) not currently held in the video RAM 250 from the main system memory 500 via the element interconnect bus 180, the memory controller 160 and a bus interface controller 170B. The or each SPE 110A-H outputs its computed particle properties (typically coordinates and normals, indicating position and attitude) directly to the video RAM 250; the DMA controller 142A-H of the or each SPE 110A-H addresses the video RAM 250 via the bus interface controller 170B. Thus in effect the assigned SPEs become part of the video processing pipeline for the duration of the task.

In general, the PPU 155 can assign tasks in this fashion to six of the eight SPEs available; one SPE is reserved for the operating system, whilst one SPE is effectively disabled. The disabling of one SPE provides a greater level of tolerance during fabrication of the Cell processor, as it allows for one SPE to fail the fabrication process. Alternatively if all eight SPEs are functional, then the eighth SPE provides scope for redundancy in the event of subsequent failure by one of the other SPEs during the life of the Cell processor.

The PPU 155 can assign tasks to SPEs in several ways. For example, SPEs may be chained together to handle each step in a complex operation, such as accessing a DVD, video and audio decoding, and error masking, with each step being assigned to a separate SPE. Alternatively or in addition, two or more SPEs may be assigned to operate on input data in parallel, as in the particle animation example above.

Software instructions implemented by the Cell processor 100 and/or the RSX 200 may be supplied at manufacture and stored on the HDD 400, and/or may be supplied on a data carrier or storage medium such as an optical disk or solid state memory, or via a transmission medium such as a wired or wireless network or internet connection, or via combinations of these.

The software supplied at manufacture comprises system firmware and the Playstation 3 device's operating system (OS). In operation, the OS provides a user interface enabling a user to select from a variety of functions, including playing a game, listening to music, viewing photographs, or viewing a video. The interface takes the form of a so-called cross media-bar (XMB), with categories of function arranged horizontally. The user navigates by moving through the function icons (representing the functions) horizontally using the game controller 751, remote control 752 or other suitable control device so as to highlight a desired function icon, at which point options pertaining to that function appear as a vertically scrollable list of option icons centred on that function icon, which may be navigated in analogous fashion. However, if a game, audio or movie disk 440 is inserted into the BD-ROM optical disk reader 430, the Playstation 3 device may select appropriate options automatically (for example, by commencing the game), or may provide relevant options (for example, to select between playing an audio disk or compressing its content to the HDD 400).

In addition, the OS provides an on-line capability, including a web browser, an interface with an on-line store from which additional game content, demonstration games (demos) and other media may be downloaded, and a friends management capability, providing on-line communication with other Playstation 3 device users nominated by the user of the current device; for example, by text, audio or video depending on the peripheral devices available. The on-line capability also provides for on-line communication, content download and content purchase during play of a suitably configured game, and for updating the firmware and OS of the Playstation 3 device itself. It will be appreciated that the term "online" does not imply the physical presence of wires, as the term can also apply to wireless connections of various types.

Referring now to Figure 4A, as an alternative or supplement to the Eye Toy 757, a stereoscopic camera 1010 can be connected to the Playstation 3, typically via a USB cable but potentially by a wireless data link.

The stereoscopic camera 1010 comprises a pair of camera units (e.g. CCDs with suitable optics) 1012, 1014 mounted in a housing and having a known physical relation to each other; for example being horizontally aligned and separated by a known distance δ. The outputs from these camera units (or simply 'cameras') are a pair of stereoscopic video images that may be combined by an encoding unit 1016 and then output. Alternatively the camera outputs may be output separately, in which case two separate encoding units may be provided. Alternatively, in either case, no specific encoding may be required, in which case the encoding unit may be omitted entirely.

Notably, in the stereoscopic camera of Figure 4A, the cameras are physically mounted in parallel to each other. As a result, the central lines of sight of the respective cameras are also parallel to each other and separated by distance δ.

As is well known, as a consequence of perspective two separate parallel lines will appear to get closer together with distance, eventually converging at infinity (the so-called 'vanishing point').

When the Playstation 3 and stereoscopic camera are operably coupled, in an embodiment of the present invention they act as a transmission system for the stereoscopic images (subject to any image processing as described herein), making use of one or more of the communication ports of the PS3 described previously.

Referring now to Figures 5 and 6, this effect can be seen in the views of the respective cameras (labelled A and B in Figure 5) of the stereoscopic camera. In Figure 5, a stereoscopic camera of the type seen in Figure 4A is shown acting as a webcam and capturing an image of an environment in which five objects, P, Q, R, S and T are spaced apart receding from the stereoscopic camera.

As is seen in Figure 6, the captured views from cameras A and B in Figure 5 show the objects P, Q, R, S and T converging toward the vanishing point as a function of distance from the stereoscopic camera. Referring also to objects X and Y (for clarity assumed to be located above the other objects), then more generally it is the relative difference in position (or 'displacement') between the same object in the two images that is a function of the distance of the object from the camera (specifically, it is a function of the subtended angle between the object and the centreline of the respective camera's viewpoint). Thus close objects will have a large relative displacement whilst distant objects will have a small relative displacement. Thus in the case of objects X and Y, there is a greater displacement between the two images for object Y than there is for object X.

These displacements can be evaluated using known pattern matching or autocorrelation techniques for successive increments or shifts of overlap (i.e. displacement) between the images. Other pattern matching techniques will be known to the person skilled in the art and may be used as applicable. However, it will be appreciated that with parallel camera viewpoints, the displacements for more distant objects become small and may be difficult to resolve accurately with cameras of the resolution typically found in webcams. This may be mitigated by increasing the resolution of the cameras, which is likely to increase cost, or by increasing the separation of the cameras, which may make the stereoscopic webcam too large for its desired use.
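By way of illustration, a minimal block-matching sketch of the displacement evaluation described above is given below, using a sum-of-absolute-differences score; the function name, the SAD metric and the numpy representation are assumptions for the sake of example and not part of the original disclosure:

```python
import numpy as np

def estimate_displacement(patch, strip, max_shift):
    """Estimate the horizontal displacement of an image feature.

    'patch' is a small greyscale region taken from one stereoscopic
    image; 'strip' is the corresponding rows of the other image,
    widened by max_shift pixels on each side.  The patch is slid
    across the strip in single-pixel shifts and each position is
    scored with the sum of absolute differences (SAD); the returned
    shift is signed, allowing for converging cameras where features
    may appear to move in either direction."""
    h, w = patch.shape
    scores = []
    for s in range(2 * max_shift + 1):
        window = strip[:, s:s + w]
        scores.append(np.abs(patch.astype(int) - window.astype(int)).sum())
    # Index max_shift corresponds to zero displacement.
    return int(np.argmin(scores)) - max_shift
```

As the passage above notes, in practice the displacements of distant objects may fall below one pixel and defeat such a simple search, which is one motivation for the angled-camera arrangement of Figure 4B.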

Thus a refinement of the stereoscopic webcam of Figure 4A is the stereoscopic webcam 1020 of Figure 4B. In this version of the stereoscopic webcam, the cameras are not parallel but are angled inwards with respect to each other, so that their image planes are not parallel (an angle of 180 degrees) but instead intersect at an angle θ close to 180 degrees. Again the output may be combined and/or encoded by a common encoder 1026 or separate encoders, or may not be encoded before output.

The effect of mounting the cameras so as to incline towards each other is that the central lines of sight of their respective camera views are no longer parallel and so intersect before infinity. The distance to intersection is a function of the relative angle of inclination, and gets closer as the angle moves away from 180 degrees.

In an embodiment of the present invention, the cameras are inclined so that the views intersect at an anticipated typical user distance from the stereoscopic webcam. The precise angle chosen may therefore depend both upon the separation of the cameras and the anticipated use of the cameras. For example, a domestic stereoscopic webcam is likely to be operated by a user sitting closer to it than an office-based video-conferencing camera. Possible values may therefore lie between 179.9 and 120 degrees, for example, although in principle other angles may be used if required.
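The relationship between the inclination angle θ and the convergence distance can be sketched with simple trigonometry; assuming each camera is tilted inward by half of (180° − θ), the central lines of sight meet at the distance computed below (the function and parameter names are illustrative):

```python
import math

def convergence_distance(separation, theta_degrees):
    """Distance at which the central lines of sight of two inward-
    angled cameras intersect.

    'separation' is the camera spacing (the distance delta above) and
    'theta_degrees' is the angle between the image planes: 180 degrees
    means parallel cameras, whose lines of sight converge only at
    infinity.  Each camera is assumed tilted inward by
    (180 - theta) / 2 degrees from the forward direction."""
    tilt = math.radians((180.0 - theta_degrees) / 2.0)
    if tilt == 0.0:
        return math.inf  # the parallel arrangement of Figure 4A
    return (separation / 2.0) / math.tan(tilt)
```

For example, under these assumptions cameras 10 cm apart at θ = 170 degrees would converge at roughly 0.57 m, consistent with a user sitting close to a domestic webcam.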

In an embodiment of the present invention, the angle is user adjustable. For example, the user may be able to manually adjust a mechanical linkage between the cameras, thereby adjusting the relative angle of inclination, or such a linkage (or separate angular rotation means) may be motorised and controlled via a user interface associated with the host device to which the stereoscopic web-cam is attached.

Referring now to Figures 7 and 8, the deployment of a stereoscopic webcam of the type seen in Figure 4B is now discussed. In Figure 7, the respective cameras are again labelled A and B, and the central lines of sight of their respective fields of view are shown to meet at the intersection of lines A-A and B-B. As noted above, the separation and relative inclination of cameras in the stereoscopic camera are arranged so that this intersection occurs at a typical distance of operation by a user 1100, either at manufacture or by user adjustment. For the purposes of explanation only, two further objects are shown in Figure 7: a foreground object M (for example, a microphone) and a background object L (for example, a ceiling light).

In Figure 8, and looking first at the view from camera A (left image) and then at the view from camera B (right image), the object M, lying in front of the point of intersection of the fields of view, appears to move to the left between the images, whilst the object L, lying behind the point of intersection of the fields of view, appears to move to the right. Meanwhile the user, being substantially at the point of intersection, does not have a significant displacement. To use arbitrary and non-limiting values for the sake of explanation, analysis of these images might indicate that object M has a displacement of 150, the user has a displacement of 0 and the object L has a displacement of -30, the minus sign being indicative of the opposite apparent direction of travel. The sign may also be thought of as arising from a possible pattern matching technique; for example, starting with the images exactly overlaid on one another and then moving them relative to each other in steps, it may take 150 steps in one direction to match instances of object M, but 30 steps in the other direction to match instances of object L.

Again, as with the stereoscopic webcam of Figure 4A, the distance of an object from the camera can be evaluated as a function of the relative difference in position of the object in the image from each camera (its 'displacement'), but now, because the central viewpoints of the cameras converge at a predetermined operation distance rather than at infinity, objects at a distance less than the predetermined operation distance appear to move in one direction between the stereoscopic images whilst objects at a distance larger than the predetermined operation distance appear to move in the opposite direction, providing further discriminatory information.

Notably, bringing forward the point of intersection of the respective viewpoints also serves to increase the relative displacement of more distant objects as compared with an otherwise similar stereoscopic camera of the type seen in Figure 4A. This improves the accuracy with which such distant objects can be evaluated and serves to better distinguish them from the object of interest, namely the user.

It will be understood that the amount of separation and inclination of the cameras as illustrated in the drawings is for illustrative purposes only and is not limiting.

In Figure 7, the user is located at the point of intersection of the two image viewpoints, and therefore is substantially in the same position in both images shown in Figure 8. However, it will be appreciated that the user may not operate a stereoscopic camera (or webcam) at this predetermined operation distance, or may adjust the distance incorrectly so that it does not coincide with the user.

Referring now to Figures 9 and 10, in the case shown in Figure 9 a user is operating a stereoscopic webcam from a distance shorter than the predetermined operation distance. As can be seen in Figure 10, the user therefore also appears to have a significant relative displacement between the images. However, it will be noted that this displacement is less than that of the even closer object M, and is in the opposite direction to the more distant object L.

It will be appreciated that in a converse scenario where the user sits slightly beyond the predetermined operational distance, the user will again have a significant relative displacement between the images, but this time it will be in the same direction as object L, although again it will be of smaller displacement than the more distant object L.

Thus in each case, whether the user sits at, in front of, or behind a predetermined or user-adjusted operational distance at which the central viewpoints of the cameras are arranged to intersect, it is still possible to evaluate whether other objects are in front of or behind the user by determining their respective magnitude and/or direction of displacement between the two camera images.

It will be appreciated that a similar evaluation can be made with the stereoscopic webcam of Figure 4A; for example after notionally replacing object R with a user it is still possible to evaluate that objects P and Q are in front of the user and S and T are behind the user based upon the relative magnitudes of displacement between the objects in the two camera images.

In each case a background distance value and a foreground distance value can be used to form a cordon outside which objects may be obscured or replaced to enhance privacy, and inside which objects are transmitted faithfully (optionally subject to other conventional image processes separate to the present invention).

In relation to the stereoscopic images generated by the stereoscopic camera, this cordon can therefore be simply expressed as two threshold displacement values. In the case of a stereoscopic camera of the type seen in Figure 4A, the magnitude of displacement alone is sufficient, whereas in the case of a stereoscopic camera of the type seen in Figure 4B, the direction or sign (±) of the displacement may also be required if the cordon encompasses the distance at which the central lines of sight of the cameras intersect. As such, the processing to identify image features can thus be reduced to evaluating whether the displacement of an image feature falls outside the range of displacements bounded by the current cordon.

As such in order to determine what image features belong inside or outside the cordon it is not strictly necessary to evaluate the actual distances from the stereoscopic camera as such, but simply to evaluate the displacements between corresponding image features of the stereoscopic images, these being indicative of the actual distances.
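The reduction described above, classifying features purely by whether their signed displacement falls inside the cordon's displacement range, might be sketched as follows; the function name, feature representation and limit values (echoing the arbitrary figures of the Figure 8 discussion) are illustrative assumptions:

```python
def partition_features(features, near_limit, far_limit):
    """Split image features into those transmitted faithfully and
    those to be rendered unviewable, using only stereoscopic
    displacements: no actual distances need to be computed.

    Each feature is a (name, signed_displacement) pair and the cordon
    is simply the displacement range [far_limit, near_limit], with
    negative values denoting apparent movement in the 'distant'
    direction for a converging-camera arrangement."""
    keep, obscure = [], []
    for name, displacement in features:
        if far_limit <= displacement <= near_limit:
            keep.append(name)
        else:
            obscure.append(name)
    return keep, obscure
```

With the example displacements of Figure 8 (object M at 150, the user at 0, object L at -30) and a cordon of [-10, 40], only the user would be transmitted faithfully.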

In an embodiment of the present invention, the Cell processor of the PS3 operates as an image evaluator providing such image evaluation as described herein.

Given any of the above arrangements, it then becomes possible to apply image processing strategies to give the user of the stereoscopic camera greater privacy, as described below.

In the case where stereoscopic transmission is to be used (for example for remote display on a 3D display unit), both images may be processed.

In the case where only a single image is to be used for transmission (as in conventional web-cam usage), then only the one image selected for transmission may be processed (i.e. modified). In this case an indicator (e.g. an LED or a pointer on a nearby display) may be activated nearest the camera whose feed will be transmitted, so that the user can look at the relevant camera face-on.

In either case, the or each image may be processed to render extraneous components of the image unviewable, for example by removing, replacing or obscuring them.

In an embodiment of the present invention, the Cell processor of the PS3 operates as an image processor providing such image processing as described herein to modify the or each image.

The extraneous components may include background features or optionally foreground features that are outside the cordon.

'Foreground' and 'background' features in this context refer to features that are in front of or behind the user. 'In front' and 'behind' may in turn be taken to mean in front and behind the user by a threshold margin or distance, and the threshold margin may be different in front and behind. Moreover the thresholds may be expressed in absolute terms (e.g. 1 metre behind the user) or in relative terms (e.g. 25% further than the mean distance of the user). Similarly 'in front' and 'behind' may simply be taken to mean two absolute distances from the stereoscopic camera that are expected to encompass the user - for example 0.5m and 2.5m respectively. It will be appreciated that the optional foreground threshold may be eliminated or equivalently made equal to zero metres. It will be appreciated that as noted previously herein such distance values may similarly be represented in terms of displacements within the stereoscopic images.

Similarly 'the user' can refer to one or more users. However, in this case optionally there can be a cut-off threshold beyond which people in the image that are more than a certain distance from the camera are considered to be part of the background, and not part of a user group. The cut-off threshold can be absolute (i.e. relative to the stereoscopic webcam) - for example 2 metres from the camera - or relative to a user, for example twice the distance of the closest user, or one metre behind them. It will again be appreciated that such distance values may similarly be represented in terms of displacements within the stereoscopic images. Hence, for example, a distance range corresponding to a displacement range may be defined relative to the displacement of a user within the pair of stereoscopic images.
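A user-relative displacement range of the kind just described might be derived as below; the margin parameters are expressed directly in displacement units, and all names are assumptions for illustration:

```python
def cordon_from_user(user_displacement, front_margin, behind_margin):
    """Derive the cordon's displacement range relative to the user's
    own measured displacement.

    With a converging-camera arrangement a larger signed displacement
    means a nearer object, so the near limit sits front_margin above
    the user's displacement and the far limit sits behind_margin
    below it.  Returns (far_limit, near_limit)."""
    return (user_displacement - behind_margin,
            user_displacement + front_margin)
```

The resulting pair can feed the cordon test directly; re-evaluating the user's displacement each frame would keep such a relative cordon centred on them as they move.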

In each case in an embodiment of the invention the cut-off threshold (and/or the in front / behind threshold margins) can optionally be made subject to absolute minimum or maximum limit values, or can be either overridden or adjusted via a user interface; for example either a slider or buttons on the stereoscopic camera itself, or via an input interface to the device connected to the stereoscopic camera, such as controller 751.

In this way in an embodiment of the present invention a user can define the effective cordon between the 'background' and 'foreground' in which people and objects will be assumed to be public, rather than private. In the case where no 'foreground' region is implemented, then such a cordon will thus extend to the front of the stereoscopic camera itself.

Given a cordon according to the above criteria, features of the image that are identified as being outside the cordon can then be made unviewable; for example they can be pixellated or otherwise obscured by a low-pass filter applied to give a so-called 'frosted glass' effect. Other similar effects will be apparent to the skilled person. Similarly the image features can be replaced with substitute image features, for example being pre-stored on the PS3 or similar host device.
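A minimal sketch of the pixellation effect mentioned above is given below, replacing each image tile that overlaps the out-of-cordon mask with its mean value; the block size and function name are illustrative assumptions:

```python
import numpy as np

def pixellate(image, mask, block=8):
    """Obscure the masked region of a 2D greyscale image by replacing
    each block x block tile that overlaps the mask with its mean
    value, giving a coarse 'pixellated' appearance.  'mask' is a
    boolean array marking image features that fall outside the
    cordon."""
    out = image.copy()
    h, w = image.shape
    for y in range(0, h, block):
        for x in range(0, w, block):
            if mask[y:y + block, x:x + block].any():
                tile = out[y:y + block, x:x + block]
                tile[...] = int(tile.mean())
    return out
```

A 'frosted glass' effect could instead be obtained by substituting a low-pass (blurring) filter for the tile-averaging step, and substitution with pre-stored imagery follows the same masking logic.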

By extension the whole background (as defined by distance / displacement beyond the cordon) can be obscured or replaced with an alternative background static image or video clip of the user's choice. In the case of stereoscopic transmission, respective stereoscopic background images or video clips can be used for each captured image to maintain an immersive 3D effect for a remote viewer using a 3D display device. As noted above, the user may adjust the cordon representing the distance to the background, and optionally to the foreground. Where this is defined in absolute terms (e.g. the background is 2.5 metres or more from a stereoscopic webcam, and the foreground is 50 centimetres or less) then provided the user sits within this cordon their image will be clearly transmitted.

However, if the cordon is defined in terms relative to the position of the user, then it becomes necessary to determine the distance to the user.

The user or users themselves may be identified by a number of methods. The simplest is to assume that the first large object roughly central to the captured images is the user. A complementary assumption would be that two similar large objects at a similar distance from the stereoscopic webcam are a pair of users.

More sophisticated methods may include known face/body recognition techniques to locate the user in the images. In the case where multiple faces are found, that which is closest to the webcam can be assumed to be the (or the primary) user, and the cut-off threshold described herein above may then be applied accordingly to categorise other faces as users or background.

Once a user has been identified and their distance has been evaluated, a cordon defined in terms relative to them can be applied, and objects (including the user) within this cordon will be clearly transmitted.

In addition, as noted previously, optionally objects in the foreground can also be obscured or replaced. This may be useful, for example, to obscure notes or objects on a user's desk. If this option is selected, then further optionally only static objects may be obscured; this allows the user's hand gestures, for example, to remain visible even if they extend into the foreground.
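The optional static-object test described above might be approximated by examining a short history of frames spanning the last few seconds and marking only pixels that have not changed; the change threshold and all names below are illustrative assumptions:

```python
import numpy as np

def static_mask(frames, threshold=4):
    """Given a list of recent greyscale frames (covering, say, the
    last N seconds), return a boolean mask of pixels whose value has
    varied by no more than 'threshold' over the whole window.  Only
    such static pixels in the foreground would then be obscured,
    leaving moving features such as hand gestures visible."""
    stack = np.stack([f.astype(int) for f in frames])
    return (stack.max(axis=0) - stack.min(axis=0)) <= threshold
```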

In these ways, a 2D or 3D (stereoscopic) camera transmission such as a web-cam transmission can be sent to a remote viewer in which only a pre-defined, user-defined and/or user-centred portion of the scene is transmitted without being obscured or replaced; specifically the background (and/or foreground), as defined either with respect to the stereoscopic webcam or with respect to a user, can be obscured or replaced, with the modified images being transmitted instead of the originally captured images.

In an embodiment of the present invention in the case where the stereoscopic camera has motorised or similarly actuated control of the relative angles of its cameras, the relative angle between the pair of cameras can be automatically adjusted to move the point of intersection of the central viewpoints of the respective cameras to coincide with the (or the first) identified user. This has a first benefit that small image displacements (which as noted previously herein may be difficult to resolve) are centred on the user and hence their accurate evaluation is relatively unimportant, as they will be in the centre of the acceptable displacement range surrounding the user in any event. It also has the benefit of simplifying the integration of the user within a pre-rendered replacement stereoscopic background.

Alternatively, in an embodiment of the present invention a (potentially cheaper) version of the stereoscopic camera is arranged to have the point of intersection of the central viewpoints of the respective cameras (and hence zero displacement between stereoscopic images) at a distance at or near to a predefined background distance defining a limit of the cordon. In this case, discrimination of the background then becomes a matter of which direction an image feature appears to move between the stereoscopic images (i.e. the sign of the displacement, as explained previously herein with reference to Figures 7 to 10). A variant of this embodiment may again allow user adjustment of the camera angles to change from this default background distance.

The stereoscopic camera or web-cam may be provided as a separate device physically similar to a conventional webcam, incorporating the selected camera separation δ. Alternatively the stereoscopic web-cam may be provided as a separate device designed specifically to complement a host device, such as for example a stereoscopic web-cam that complements the shape of the Sony PlayStation Portable ® entertainment device. Alternatively the stereoscopic web-cam may be integrated into a host device, for example within the frame of a laptop display (or other such mobile computing device). In this case the possible camera separation δ will be limited by the size of the laptop.

In addition to web-cams, the above principles may also be applied to broadcast cameras, for example to create a so-called 'green screen' background replacement effect in any studio environment or outdoor location. In this case, however, if the broadcast camera is not itself a stereoscopic broadcast camera used for stereoscopic broadcasts, then an additional, secondary camera provided for the purpose of implementing the above principles need not be of similar overall quality to the primary broadcast camera; for example it may be a domestic-quality standard- or high-definition camera, either attachable to and detachable from the primary camera or integrated with it. In this case 'quality' may refer to any feature of the second camera that affects cost, whether it is optics, CCD noise performance, electronics, stabilisation systems, etc. Thus potentially the second camera may be of a lower overall quality but nevertheless have the same resolution as the primary camera, for example.

Consequently the respective images of the resulting stereoscopic pair of images will have a different overall quality, but with sufficient similarity to enable the identification of corresponding image elements in the image pair, and hence their relative displacements and the other associated image evaluations and analyses described herein. The two cameras may also be set at a fixed or adjustable angle other than 180 degrees, or be automatically adjustable, as described herein.

It will be appreciated that a broadcast camera may encompass the full transmission system, including the two cameras, analysis (evaluation) and image processing systems, and means to transmit the result to a remote viewer such as a studio editor.

Likewise, in addition to the above strategies for defining a cordon either in absolute terms or with respect to the user, in this case the first remote viewer is either the cameraman or the editor, and hence they can operate interactively with the camera to identify, for example, whether a person is currently part of a user group, and hence define a foreground or background cordon dynamically. Typically this would be done via a user interface on the camera or studio console that enabled the selection of individuals who are to be currently treated as within the transmission cordon. Optionally such individuals could then be tracked using known object tracking techniques, for example, enabling the cordon to be dynamically adjusted automatically.

Referring now to Figure 11, a method of image transmission comprises: in a first step, acquiring (slO) a pair of stereoscopic images from a stereoscopic camera; in a second step, evaluating (s20) the displacement (difference in position) of corresponding image features between the pair of stereoscopic images; in a third step, processing (s30) at least one of the pair of stereoscopic images to render unviewable those image features whose displacement falls outside a previously determined range of displacements; and in a fourth step, transmitting (s40) the or each processed image to a remote viewer.
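The four steps of Figure 11 can be tied together in a single sketch; each step is supplied as a callable so that the outline stays independent of any particular camera, evaluator or transport, and every name here is an assumption rather than part of the original disclosure:

```python
def transmit_frame(acquire, evaluate, process, transmit, displacement_range):
    """One iteration of the four-step method of Figure 11.

    s10: acquire a stereoscopic pair of images;
    s20: evaluate per-feature displacements between them;
    s30: render out-of-range features unviewable in (at least) the
         image selected for transmission;
    s40: transmit the processed image to the remote viewer."""
    left, right = acquire()                                    # s10
    displacements = evaluate(left, right)                      # s20
    lo, hi = displacement_range
    outside = {f for f, d in displacements.items() if not lo <= d <= hi}
    processed = process(left, outside)                         # s30
    transmit(processed)                                        # s40
    return processed
```

In a stereoscopic transmission both images would pass through the processing step; the single-image form above mirrors the conventional web-cam usage described earlier.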

It will be apparent to a person skilled in the art that variations in the above method corresponding to operation of the various embodiments of the apparatus as described and claimed herein are considered within the scope of the present invention, including but not limited to: the stereoscopic camera comprising a pair of cameras aligned to view in parallel; or the stereoscopic camera comprising a pair of cameras angled so that their centre lines converge within a finite distance; wherein that distance is selected to be a typical operating distance for a use of the camera; and/or the distance is definable by altering the relative angle of the cameras, and this may be done by the user or by the system in response to detecting the user; the stereoscopic camera and/or the system it is connected to comprising a user interface that allows a user to define a background distance; wherein that distance may be relative to the stereoscopic camera or to a user within the stereoscopic camera's view; and optionally also define a foreground distance; which similarly may be relative to the camera or a user; and wherein any distances relative to a user may similarly be relative to two or more users, for example in front of the closest user and behind the furthest user; subject to (user defined) absolute limits; for a non-parallel camera arrangement, the apparent direction of image features may be used as a further discriminatory indicator; for example the stereoscopic camera can be arranged to have a zero displacement between images substantially at a pre-set background distance, and then the apparent direction of image features may be used as the primary basis of discrimination between transmitted features and background features; image features may be rendered unviewable by use of pixellation or other filtering, or by substitution with another image/video feature; optionally image features in front of a user may only be rendered unviewable if they are static (for example if there is no movement for N seconds); the stereoscopic camera being integrated with or configured to fit a portable computing device; and one of a pair of cameras in the stereoscopic camera being of a lower quality than the other; for example, one camera may be of broadcast quality whilst one is of domestic consumer quality, but nevertheless these may have the same resolution.

It will be appreciated that the distribution of analysis and image processing between the stereoscopic camera and the host device may vary depending on the nature of the stereoscopic camera and/or the host device. For example in the case of a low cost stereoscopic web-cam, all analysis and processing may be done by the host device (such as the PS3 or Playstation Portable mentioned above). By contrast, a broadcast camera may be equipped to perform such processing itself, enabling the output of images with features already obscured or replaced. Other embodiments of the transmission system may provide an intermediate distribution of analysis; for example a professional web-conferencing stereoscopic camera may provide the images and separately provide depth information for the features therein (for example as a so-called chromadepth image), enabling a conference system to easily select objects to be obscured or replaced without further analysis. A similar output may be provided by a broadcast camera.

Furthermore, it will be appreciated that the methods disclosed herein may be carried out on conventional hardware suitably adapted as applicable by software instruction or by the inclusion or substitution of dedicated hardware, and these may be distributed as applicable between the hardware of the stereoscopic camera and the host device (where these are separate).

Thus the required adaptation to existing parts of a conventional equivalent device or devices may be implemented in the form of a computer program product or similar object of manufacture comprising processor implementable instructions stored on a data carrier such as a floppy disk, optical disk, hard disk, PROM, RAM, flash memory or any combination of these or other storage media, or transmitted via data signals on a network such as an Ethernet, a wireless network, the Internet, or any combination of these or other networks, or realised in hardware as an ASIC (application specific integrated circuit) or an FPGA (field programmable gate array) or other configurable circuit suitable to use in adapting the conventional equivalent device or devices.




 