

Title:
METHOD AND APPARATUS FOR PANNING, TILTING, AND ADJUSTING THE HEIGHT OF A REMOTELY CONTROLLED CAMERA
Document Type and Number:
WIPO Patent Application WO/2008/103418
Kind Code:
A2
Abstract:
A method and apparatus for panning and tilting a remotely located camera or an audio-visual assembly affixed to a telepresence robot in order to select a view, and for remotely adjusting the height of a camera or an audio-visual assembly affixed to a telepresence robot.

Inventors:
SANDBERG ROY (US)
SANDBERG DAN (US)
Application Number:
PCT/US2008/002302
Publication Date:
August 28, 2008
Filing Date:
February 21, 2008
Assignee:
SANDBERG ROY (US)
SANDBERG DAN (US)
International Classes:
H04N9/78
Foreign References:
US20030076480A1
US5583565A
US20040046888A1
US20040088078A1
US4298128A
US20010036266A1
US2607863A
Claims:
What is claimed:

1. A method of controlling a remotely located camera comprising the steps of: a) specifying a horizontal and vertical displacement for an on-screen video image; b) calculating a pan motor movement and a tilt motor movement amount based on the horizontal displacement and the vertical displacement; and c) moving a pan motor and a tilt motor by the calculated pan motor movement and the calculated tilt motor movement, wherein the pan motor and the tilt motor mechanically pan and tilt a video camera.

2. The method in claim 1, further comprising the step of: correcting for fisheye distortion of the on-screen video image by using a rectilinearizing algorithm.

3. The method of claim 1, wherein:

The remotely located camera is mounted on a mobile telepresence robot.

4. The method of claim 1, wherein:

The calculated motor movement amount is multiplied by a scaling factor, the scaling factor inversely proportional to the current zoom factor of the video image.

5. A system for controlling a video camera comprising: a) a video camera; b) an electromechanically actuated tilt and pan camera mount connected to the video camera; c) a local computational device, sending video data accepted from the video camera, receiving transmitted movement commands, and controlling the electromechanically actuated tilt and pan camera mount in response to the transmitted movement commands;

d) a remote computational device, processing video data received from the local computational device, and sending movement commands to the local computational device in response to a horizontal and vertical image displacement specifier event, wherein the remote computational device sends movement commands in response to the horizontal and vertical image displacement specifier event; and e) a remote display device, displaying video data processed by the remote computational device, and updating the video image location in response to the horizontal and vertical image displacement specifier event.

6. The system in claim 5, wherein:

The electromechanically actuated tilt and pan camera mount is attached to a mobile telepresence robot.

7. The system in claim 5, further comprising: an image rectilinearizing device, rectilinearizing the video data generated by the video camera.

8. A method of controlling a remotely located camera comprising the steps of: a) selecting a location within an image generated by a camera; b) computing Cartesian coordinates corresponding to the selected location; c) moving a remotely located camera by a longitudinal angle computed from the Cartesian coordinates using inverse gnomonic projection equations; and d) moving the remotely located camera by a latitudinal angle computed from the Cartesian coordinates using inverse gnomonic projection equations.

9. The method of claim 8, wherein:

The remotely located camera is an element of a mobile telepresence robot.

10. A system for controlling a remotely located camera comprising: a) a two-dimensional coordinate selector, selecting a two-dimensional coordinate from within an image generated by the remotely located camera; b) a motor-drive, driving the remotely located camera; c) a local computational device controlling the motor drive; d) a communications link interconnecting the local computational device with a remote computational device; and e) a gnomonic calculation calculator executing on a computational device selected from the group consisting of the local computational device and the remote computational device; wherein the gnomonic calculation calculator calculates a longitudinal angle and a latitudinal angle using an inverse gnomonic projection calculation and the two-dimensional coordinate, the longitudinal angle and latitudinal angle used to control the motor-drive.

11. An apparatus for controlling the height of an audio-visual assembly, comprising: a) an audio-visual assembly comprising a camera and speaker; b) a sliding tube, mechanically attached to the audio-visual assembly; c) a fixed tube, slidably attached to the sliding tube; and d) a multi-conductor routed from the bottom of the fixed tube to the top of the sliding tube, whereby the sliding tube may raise and lower with respect to the fixed tube while maintaining electrical connectivity.

12. The apparatus of claim 11 wherein: the fixed tube is mounted on a robotic mobile base unit.

13. The apparatus of claim 11, wherein:

the sliding tube is rotatably attached to a leadscrew, and the leadscrew is driven by an electromechanical assembly, and the electromechanical assembly is attached to the fixed tube.

14. The apparatus of claim 11, wherein: the sliding tube is mechanically attached to a drive belt; and the drive belt is driven by an upper pulley and a lower pulley; and the upper pulley and lower pulley are attached to the fixed tube; and one of the pulleys is rotatably driven by an electromechanical assembly.

15. The apparatus of claim 11, wherein:

A flat cable is routed inside the fixed tube from the bottom of the fixed tube to the top of the fixed tube; and the flat cable is creased 180 degrees at the top of the fixed tube; and the flat cable is routed from the crease down the inside of the fixed tube; and the flat cable turns from the inside of the fixed tube into the inside of the sliding tube; and the flat cable exits from the top of the sliding tube.

16. The apparatus of claim 11, wherein:

One end of a helical multi-stranded cable is affixed near the bottom of the fixed tube; and the opposite end of a helical multi-stranded cable is affixed near the bottom of the sliding tube; whereby the flexibility inherent in the helical multi-stranded cable affords a continuous electrical connection while the sliding tube travels relative to the fixed tube.

17. An apparatus for tilting an audio-visual assembly, comprising: a) a rigid arm rotatably connected to an audio-visual assembly through a rotating coupling; b) an electromechanical device rotating the audio-visual assembly about an axis that intersects the rotating coupling; wherein c) the axis substantially intersects a center of mass of the audio-visual assembly.

18. The apparatus of claim 17, wherein: the audio-visual assembly includes a camera, and the audio-visual assembly includes a downward tilt angle, the downward tilt angle enabling a camera field-of-view to include a location behind a vertical axis formed by the audio-visual assembly and the floor directly below the audio-visual assembly.

Description:

Method and apparatus for panning, tilting, and adjusting the height of a remotely controlled camera

BACKGROUND OF THE INVENTION

(1) Field of Invention

The present invention is related to the field of remote surveillance, more specifically, the invention is a method and system for rotating and controlling the height of a remotely located camera.

(2) Related Art

Remote surveillance technologies often allow a user to pan and tilt a camera or series of cameras from a distance in order to select a specific view. The pan and tilt operation is often accomplished with input devices such as joysticks, dials, or onscreen graphical user interface controls such as slider bars. These technologies are limited in that they do not enable an intuitive correlation between input device movement and the degree of motion that will result in the camera. When the camera is only capable of moving slowly, or the camera is coupled to a control channel with a high latency, this can complicate rapid positioning of the camera. The user only obtains feedback through the perceived motion of the camera, and so it is often not clear how much movement on the input device results in the desired camera position until a few control inputs have been made. Often, the subject that the user is attempting to view has moved by this point, compounding the difficulty of controlling the camera.

Also, present surveillance technologies do not allow a user to remotely adjust the height of the camera to provide multiple viewpoints for a given camera. This limits the usability of the camera as certain viewpoints may be occluded, and for social reasons, it is sometimes useful to have the camera at eye level of people in its vicinity, regardless of whether those people are sitting or standing.

SUMMARY OF THE INVENTION

The present invention is a new and improved method and apparatus for panning and tilting a remotely located camera in order to select a view. The invention also discloses a way to adjust the height of the remote camera from a distant location. Both of these techniques may be used with a camera affixed to a telepresence robot.

This patent application incorporates by reference copending application 11/223675 (Sandberg). Matter essential to the understanding of the present application is contained therein.

This application claims the benefit of U.S. Provisional Application No. 60/902936 ("Method and apparatus for rotating a remotely controlled camera", Sandberg), filed 02/22/07, and incorporated herein by reference.

This application also claims the benefit of U.S. Provisional Application No. 60/902730 ("Method and apparatus for dynamic height adjustment of a telepresence robot audio-visual assembly", Sandberg), filed 02/22/07, and incorporated herein by reference.

A camera's visual field is processed by a computer, and displayed on a computer screen, or in a graphical user interface element, known as a frame, on a subset of the computer screen. In the simplest embodiment, a user specifies a point on the image which the pan and tilt camera should re-center around. This re-centering can be done digitally (if the camera has captured more information than is displayed) or by moving the camera so that the requisite visual field is captured. Any input device known in the art (henceforth "input device") can be used to specify the point, such as a mouse, touch pad, touch screen, keyboard or joystick. Each input device can use any input technique known in the art (henceforth "input technique") such as a mouse click, "drag", movement, or double click for a mouse, touch screen, touchpad, trackball, or eraser head, or a sequence of commands for a keyboard, or an on-screen target specified with a joystick.

In one embodiment, the user selects a point on the image which should become the new center-point of the field of view of the camera. The computer then calculates the direction and distance from the old center of the image to the desired new center of the image and then sends a command to the remote camera that results in the camera moving by an angle that is correlated with the desired new center point. As the camera moves from its original position to the new position, the user perceives the camera's field of view as moving toward the area they selected, such that at the completion of the move, the position originally demarcated is in the center of the screen.
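The re-centering calculation just described can be sketched as follows, assuming a simple linear mapping between screen position and camera angle. The function names, parameters, and field-of-view values are illustrative assumptions, not taken from this application.

```python
# Sketch of the re-centering step: convert a clicked pixel into pan/tilt
# deltas under a linear screen-to-angle approximation. Names and the
# example field-of-view values are illustrative.

def recenter_command(click_x, click_y, frame_w, frame_h,
                     h_fov_deg=60.0, v_fov_deg=45.0):
    """Return (pan_deg, tilt_deg) needed to center the clicked point."""
    # Offset of the click from the frame center, normalized to -0.5..0.5.
    dx = (click_x - frame_w / 2.0) / frame_w
    dy = (frame_h / 2.0 - click_y) / frame_h   # screen y grows downward
    # Under the linear approximation, a half-frame offset corresponds to
    # half the field of view.
    return dx * h_fov_deg, dy * v_fov_deg

# A click on the right-hand edge of a 640x480 frame pans right by half
# the horizontal field of view:
print(recenter_command(640, 240, 640, 480))  # (30.0, 0.0)
```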

If the input technique used supports continuous updating of the input device position (for example, a mouse "drag"), then a user-viewable cue, such as an icon, image, or cross-hairs, can move on the frame to give the user feedback as to what the re-centering point will be. Additionally, the location of a bounding box can be continually updated to show which parts of the old image will and will not be included in the new image once the camera has re-positioned on the final designated point.

In an alternate embodiment, an input technique such as a mouse drag can be used to reposition the video image itself within the frame, thereby specifying a desired tilt and pan. The computer calculates the direction and distance that the video image has been dragged, while cropping the video image so that it does not extend beyond the boundaries of the window frame. The computer then sends a command to the remote camera that results in the camera moving in an angle that is correlated with the final location of the video image after it has been dragged.

A further refinement on the image dragging technique is to update the image continuously until the dragging has finished. This involves continuously moving the camera (or digital panning and tilting) in response to drag movements. When the image is moved, an "uninitialized" area is shown to represent the fact that part of the image has not been received yet. This uninitialized area can be blank, or have a pattern or image which signifies that it is uninitialized. Alternatively, if a low-resolution image of the "uninitialized" area is available, that can be shown until the full normal-resolution image is received.

The techniques allow a user to drag a cross hair or image on a video display screen by an amount that corresponds to the angle and direction of the rotation and tilt movement that is desired. Because the motion of the camera is proportional to the distance of the drag event, the interface gives the user an intuitive sense for how far the remote camera will move before it even begins its motion and without any prior experience with the device. When dragging the video image, the computer fills in the uninitialized area in the screen or frame as the camera moves into the new position, thereby giving immediate feedback as to the camera's motion. This causes the user to perceive that the system has instantaneously responded to his movement requests, but still accurately reflects the view seen by the camera as it moves.

In an alternative embodiment, rather than using a "drag" action to reposition the video image or cross hair, a movement of a local pointing device such as a mouse, trackball or touch pad (without the continuous pressing of a button required with a "drag" action) is used to reposition the remote camera. The movement of the local pointing device is also reflected as a change in the position of a local on-screen icon such as a crosshair, or by movement of the entire local video image. In the preferred embodiment, movement of the pointing device is immediately translated into a proportional movement of the remote camera; no additional action is required. The icon or video image movement is proportional to the angular movement experienced by the remote camera, and so a local user will receive immediate feedback as to how far the remote camera will move. In an alternative embodiment, a "click" action or other key-press action is used to signal that camera movement should begin. In this embodiment, camera movement feedback is not present until after the key-press action has occurred.
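A minimal sketch of the proportional, no-drag pointer mapping described above: each reported pointer delta immediately becomes a camera move. The gain constant and function name are illustrative assumptions.

```python
# Each pointer delta is translated directly into a proportional camera
# move; no button press is required. The gain is a hypothetical value.

DEG_PER_PIXEL = 0.25   # hypothetical proportionality constant

def pointer_motion_to_camera_move(dx_pixels, dy_pixels):
    """Convert a pointer movement into a (pan_deg, tilt_deg) command."""
    pan_deg = dx_pixels * DEG_PER_PIXEL
    tilt_deg = -dy_pixels * DEG_PER_PIXEL   # pointer moved down -> tilt down
    return pan_deg, tilt_deg

# Moving the mouse 40 px right and 8 px up pans right and tilts up:
print(pointer_motion_to_camera_move(40, -8))  # (10.0, 2.0)
```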

In another alternative embodiment, rather than using a "drag" action to reposition the video image or cross hair, a button click on a location on the video image repositions the video image such that the point selected is centered at the middle of the frame or screen. For example, if the upper left corner of the frame is selected (by clicking or some other selection means), then the upper left corner of the video image must be moved to the center of the video frame. This would result in only the upper left quadrant of the original video image being visible in the lower right quadrant of the frame. As with other embodiments of this invention, the far-side camera would then pan and tilt to a new location such that the video data for the missing three quadrants can be filled in.

In another alternative embodiment, head, hand or eye tracking equipment known in the art is used to determine a pan and tilt angle for a user's head, hand, or eyes. This pan and tilt angle is then used to drag a cross hair or image as in previous embodiments. Using this technique, a user can position a remote camera by moving his head, hand or eyes and without ever touching a traditional computer input device such as a keyboard or mouse.

In yet another alternative embodiment, a user is equipped with a head mounted display, for example, virtual reality goggles, and head tracking equipment known in the art. The head tracking equipment is used to determine a pan and tilt angle for the user's head. This pan and tilt angle information is used to move an image on the head mounted display just as it is used to drag an image on a traditional display in other embodiments. Because movement of the image on the local (head mounted) display can be done with very low latency (because no information from the remote site is required), only a small lag between user head movement and image motion exists. The remote camera will still track the user's head, albeit with some delay, but the local user does not perceive this delay because the local image has moved to the location corresponding to the remote camera motion. This embodiment reduces or eliminates the image lag perceived by a user, and thereby reduces the occurrence of VR sickness.

In another embodiment, the remote camera is equipped with digital or optical zoom using techniques known in the art. Using knowledge of the field of view of the remote camera at a given zoom setting, camera motion can be tailored to each possible zoom setting, thereby assuring proper correlation between the drag or movement distance initiated on the local display and the remote camera motion. In other words, a wide angle view displayed on a display may encompass (for example) 120 degrees of visual field as measured horizontally. If a user selects a portion of the image on the rightmost edge of the visual field as the desired move location, the camera will need to move 60 degrees (half of the visual field) to arrive at the desired location. However, if the camera has been zoomed in by 4X, the horizontal field of view will now be 120/4 = 30 degrees. In this zoom setting, the camera will only move 15 degrees (half of the visual field) to arrive at the desired location. Knowledge of the current zoom setting allows the remote camera to move the appropriate distance in each case. In the preferred embodiment, the current zoom setting (for example, 1.0 is default size and 4.0 is four times the default size) is recorded by the computer and used to correctly move the remote camera to the position selected by the user. Mathematically, the current zoom setting is inversely proportional to the move distance required of the remote camera, and the move distance can be calculated by a simple division or by other techniques known in the art of computer programming. This enables a user to move the remote camera to the selected position on his screen regardless of the current zoom setting.
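The zoom compensation above can be sketched as a simple division of the wide-angle move by the current zoom factor. The function and parameter names are illustrative assumptions.

```python
# Sketch of the zoom compensation: the required camera move is the
# wide-angle move divided by the current zoom factor.

def zoom_scaled_move(normalized_offset, wide_fov_deg, zoom):
    """Angle to move for a selection at normalized_offset (-0.5..0.5)
    of the frame, given the wide-angle field of view and the zoom factor."""
    effective_fov = wide_fov_deg / zoom   # e.g. 120 deg becomes 30 deg at 4x
    return normalized_offset * effective_fov

# The worked example from the text: a selection at the rightmost edge.
print(zoom_scaled_move(0.5, 120.0, 1.0))  # 60.0 degrees at 1x zoom
print(zoom_scaled_move(0.5, 120.0, 4.0))  # 15.0 degrees at 4x zoom
```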

In the preferred embodiment, a nonlinear mathematical transformation is used to convert a selected location on the local screen into pan and tilt motion on the remote camera. While a naive solution assumes a linear relationship between a local screen coordinate and remote camera motion, this is not the case. For example, consider a pan and tilt camera, mounted on a wall, that is angled down 85 degrees from the horizon (i.e., looking nearly straight down at the floor). In the naive implementation, a user who selects a point exactly 40 degrees to the left of the screen center would induce a movement only on the pan motor, because this would be interpreted as a move purely to the left - not up or down. But in fact, pure pan movement on a camera that is pointed nearly straight down results in movement in an arc relative to the center point of the field of view of the camera. Compensating for the arc motion that occurs when camera movement is not along the horizon requires a nonlinear relationship between the point selected on the screen and the pan and tilt motion.

In particular, a mathematical transformation known as an inverse gnomonic projection will correct for these nonlinearities. A gnomonic projection is a cartographic projection obtained by projecting a point on the surface of a sphere from the sphere's center to a point in a plane that is tangent to a point on the sphere. The gnomonic projection represents the image formed by a spherical lens, and is sometimes known as the rectilinear projection. Assuming rectilinearity of the camera lens, an accurate mapping from the lens view to a coordinate in a spherical coordinate system can be calculated. While not all lenses, for example fish-eye lenses, are rectilinear, it is a reasonable approximation for nearly all camera lenses.

The inverse gnomonic projection for a point (x,y), normalized to (0..1), on the screen/frame can be calculated as follows:

φ = sin⁻¹( cos c · sin φ₀ + (y · sin c · cos φ₀) / ρ )

λ = λ₀ + tan⁻¹( x · sin c / (ρ · cos φ₀ · cos c − y · sin φ₀ · sin c) )

where

ρ = √(x² + y²)

c = tan⁻¹ ρ

and λ is the angle to move the pan motor (longitudinal movement), φ is the angle to move the tilt motor (latitudinal movement), λ₀ is the central longitude, and φ₀ is the central latitude.
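The inverse gnomonic projection above can be sketched in code as follows. The function and variable names are illustrative; (x, y) is the selected point in the normalized tangent plane, and (lam0, phi0) is the camera's current pan and tilt (central longitude and latitude) in radians.

```python
import math

# Sketch of the inverse gnomonic projection: maps a selected tangent-plane
# point to the pan (lambda) and tilt (phi) angles for the camera motors.

def inverse_gnomonic(x, y, lam0, phi0):
    """Return (lam, phi): pan and tilt angles, in radians, for point (x, y)."""
    rho = math.hypot(x, y)            # rho = sqrt(x^2 + y^2)
    if rho == 0.0:
        return lam0, phi0             # center of the frame: no movement
    c = math.atan(rho)                # c = arctan(rho)
    phi = math.asin(math.cos(c) * math.sin(phi0)
                    + y * math.sin(c) * math.cos(phi0) / rho)
    lam = lam0 + math.atan2(x * math.sin(c),
                            rho * math.cos(phi0) * math.cos(c)
                            - y * math.sin(phi0) * math.sin(c))
    return lam, phi

# With the camera level (phi0 = 0), a point on the horizontal axis at x = 1
# maps to a pure pan of arctan(1) = 45 degrees, as a rectilinear lens requires.
lam, phi = inverse_gnomonic(1.0, 0.0, 0.0, 0.0)
print(round(math.degrees(lam), 1), round(math.degrees(phi), 1))  # 45.0 0.0
```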

In the event that a fisheye lens is used, the imaging data it collects will not be rectilinear, and there will not be a linear relationship between on-screen locations and the corresponding view angles. This would introduce large inaccuracies when using the pan and tilt technique that is being described. However, the image transmitted to the computation device from the camera using a fisheye lens may be converted from the original image (which suffers from heavy barrel distortion) to a rectilinear format using the following formula:

Radius_undistorted = tan( Radius_distorted · 10^C_correction ) / 10^C_correction

C_correction is the correction coefficient that must be tuned to the degree of barrel distortion present with a specific lens.

Radius_undistorted is the undistorted radius, meaning the distance from the center of the lens (and generally the center of the imaging device) to a specific pixel after it has been moved to its corrected (rectilinear) location.

Radius_distorted is the distorted radius, meaning the distance from the center of the lens (and generally the center of the imaging device) to a specific pixel before it has been moved to its corrected position.

Using this formula, every pixel in the original (distorted) fisheye image can be moved to its undistorted location. This will result in a rectilinear image. Some form of image smoothing such as bilinear interpolation, or other image smoothing technique known in the art, may be used to smooth the resulting image so that its appearance is improved.
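The per-pixel undistortion pass can be sketched as follows, assuming the correction formula reads Radius_undistorted = tan(Radius_distorted · 10^C) / 10^C (one plausible reading of the original). The function names and sample coefficient are illustrative assumptions; the product Radius_distorted · 10^C must stay below π/2 for the tangent to be meaningful.

```python
import math

# Sketch of radial barrel-distortion correction. The coefficient must be
# tuned per lens; for pixel-scale radii it must be negative enough that
# r_distorted * 10**c stays below pi/2.

def undistort_radius(r_distorted, c_correction):
    s = 10.0 ** c_correction          # lens-specific scale, tuned per lens
    return math.tan(r_distorted * s) / s

def undistort_point(px, py, cx, cy, c_correction):
    """Move one pixel outward along its radius from the image center."""
    dx, dy = px - cx, py - cy
    r_d = math.hypot(dx, dy)
    if r_d == 0.0:
        return px, py                 # the center pixel does not move
    scale = undistort_radius(r_d, c_correction) / r_d
    return cx + dx * scale, cy + dy * scale
```

Applying `undistort_point` to every pixel yields the rectilinear image described above; applying it only to the one or two selected locations gives the cheaper variant discussed next.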

Alternatively, the start and end drag locations, or the single point selected on the uncorrected fisheye image, may be converted to their rectilinear equivalent locations using the formula without altering the displayed image. This advantageously reduces the computational load, as only one or two locations need to be converted, instead of all pixels comprising the image.

Using the rectilinear equivalent locations, the previously discussed algorithm will then correctly position the remote camera such that the pan and tilt location reflects the degree of movement selected by the user on the remote computational device display. This advantageously allows the use of fisheye lenses with the present invention, allowing a wider field of view.

The present invention also discloses a method and apparatus for remotely adjusting the height of an audio-visual assembly. This audio-visual assembly may optionally be affixed to a telepresence robot.

This height adjustment method and apparatus can be implemented simply and at low cost. It is also lightweight.

In the preferred embodiment, two telescoping tubes are used to raise and lower a camera or an audio-visual assembly ("A/V assembly"). In the preferred embodiment, these tubes are constructed from aluminum, although steel, plastic, or other materials may also be used. In the preferred embodiment, these tubes are square in cross-section. In alternative embodiments, round, oval, rectangular, polygonal, or C-section cross-sections may also be used. In other alternative embodiments, the fixed tube consists of a tube with a sheath attached to its side at or near the top, through which the slidable tube may slide. The sheath is sized such that the slidable tube is firmly held but may still slide up and down within the sheath. Ball bearings may optionally be used to ensure that the sliding action is smooth.

By ensuring sufficient overlap between the tubes when the assembly is at its maximum height, the entire assembly remains stable at all times, and is not prone to oscillations. Furthermore, the camera or audio-visual assembly itself is kept as lightweight as possible, enabling lightweight telescoping tubes to be used. The slidable telescoping tube moves vertically relative to the fixed tube. In the preferred embodiment, a leadscrew is used to raise and lower the slidable tube relative to the fixed tube. A leadscrew slide is mechanically attached at or near the bottom of the slidable telescoping tube, and rotation of the leadscrew causes the leadscrew slide to travel up or down the leadscrew depending on the direction of rotation of the leadscrew. The leadscrew is driven by electromechanical means known in the art. In the preferred embodiment, an electric DC motor with a gearhead drives the leadscrew. The electric motor is powered by an H-bridge circuit, and the H-bridge circuit is controlled by a computational device. This drive technique may also be used for subsequent embodiments described below.
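The leadscrew drive described above turns motor rotation into vertical travel of the slide. A sketch of that relationship follows; the gearhead ratio and leadscrew pitch are illustrative assumptions, not values from this application.

```python
# Motor rotation -> leadscrew rotation -> vertical travel of the slide.
# The gearhead divides motor speed by gear_ratio; each leadscrew turn
# advances the slide by one pitch. Negative revolutions lower the slide.

def stalk_travel_mm(motor_revs, gear_ratio=30.0, leadscrew_pitch_mm=2.0):
    """Vertical travel of the leadscrew slide for a number of motor revolutions."""
    leadscrew_revs = motor_revs / gear_ratio
    return leadscrew_revs * leadscrew_pitch_mm

# 3000 motor revolutions -> 100 leadscrew turns -> 200 mm of travel:
print(stalk_travel_mm(3000))  # 200.0
```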

In alternative embodiments, a belt-and-pulley mechanism may be used to raise and lower the slidable tube relative to the fixed tube. An upper pulley attaches near the upper end of the fixed tube, and a lower pulley, near the bottom end of the fixed tube, is driven by electromechanical means. A drive belt couples the rotation of the upper and lower pulleys. The slidable tube is mechanically attached to the drive belt, and thus is raised and lowered via rotation of the pulleys. In an alternative embodiment of the belt-and-pulley mechanism, the upper and lower pulleys are mechanically attached to the slidable tube, and the fixed tube is attached to the drive belt.

In an alternative embodiment of the belt-and-pulley mechanism, sprockets and a drive chain may be substituted for the pulleys and the drive belt.

In another alternative embodiment, a linear gear, mechanically attached to either the inner tube or the outer tube, is used to raise and lower the slidable tube relative to the fixed tube. A circular gear is rotatably attached to the linear gear, and the circular gear is driven by electromechanical means known in the art. The circular gear is mechanically attached to the telescoping tube not attached to the linear gear.

In yet another alternative embodiment, a wheel mechanically attached to the slidable tube creates a friction interface against a wall of the fixed tube and electromechanical rotation of this wheel is used to raise and lower the slidable tube relative to the fixed tube. Alternatively, a wheel mechanically attached to the fixed tube can be used to create a friction interface against the slidable tube, thus electromechanically lowering and raising the sliding tube.

In yet another alternative embodiment, a cord, ribbon, or other flexible material has one end connected to the slidable tube, and the other end connected to a spool at the bottom of the tubes. The flexible material is routed through a pulley at the top of the fixed tube. In this way, a motor can wind the flexible material in order to raise the sliding tube. Running the motor in the other direction will unwind the spool, allowing gravity to lower the sliding tube. Optionally, a spring may be used to increase the force with which the sliding tube may be lowered.

In all embodiments of the adjustable height aspect of the invention, a means for routing electrical signals from the upper end of the sliding tube to the lower end of the fixed tube is required to maintain electrical connectivity between the camera or audio-visual assembly and the circuitry that is located at the far end of the fixed tube. When this technique is used with a telepresence robot, the far end of the fixed tube is affixed to a telepresence robot base, and the circuitry includes a computational device that communicates with the audio-visual assembly.

In the preferred embodiment, a flat flexible cable, known in the art as an "FFC", is used to route the electrical signals. In other embodiments, a ribbon cable, or other flat multi-conductor cable, or a helical multi-strand cable may also be used. The routing of the cable must be designed so as not to interfere with the movement of the sliding tube relative to the fixed tube.

In the preferred embodiment, an FFC, ribbon cable, or other flat multi-conductor cable is routed up the inside of the fixed tube, and is folded over (or creased) 180 degrees at the top of the fixed tube. The cable is then routed down the fixed tube. After the cable passes the bottom of the sliding tube, it folds over, forming a "rolling bend", and is routed into the inside of the sliding tube, and emerges at the top of the sliding tube. The "rolling bend" allows the sliding tube to travel with respect to the fixed tube, without the FFC or flat cable interfering with the operation of the mechanism.

Alternative routing techniques for the flat multi-conductor cable that form a "rolling bend" may also be used. One alternative routing technique consists of an arrangement where the sliding tube slides within a sheath that connects to the fixed tube. The multi-conductor cable is routed so it travels down through the inside of the sliding tube, exits the bottom of the sliding tube, travels downward a distance below the sliding tube, forms the rolling bend, and then travels up the outside of the fixed tube, which is positioned beside the sliding tube. The flat multi-conductor cable then doubles back, routing down the outside of the fixed tube, and then onward to its termination point.

In an alternative embodiment, a helical multi-strand cable is placed within the fixed tube, but below the sliding tube. By constraining the range of motion of the sliding tube such that it does not travel below the space taken by the helical multi-strand cable when it is fully compressed, the sliding tube is able to travel with respect to the fixed tube without the helical multi-strand cable interfering with the operation of the mechanism.

In all embodiments, the height of the stalk (i.e., the height of the combined fixed plus sliding tube lengths) may be controlled by a remote user. Commands for control of the A/V assembly height are sent over the Internet or other data network, and are sent wirelessly to the telepresence robot using a wireless network such as 802.11x or 3G cellular. In the preferred embodiment, the remote user controls the height of the A/V assembly using a graphical user interface that displays buttons labeled "Sit" and "Stand." The "Sit" button causes the stalk height to be lowered to approximately sitting height, and the "Stand" button causes the stalk height to be raised to approximately standing height. The sitting and standing heights may be preset by the user to correspond to their preferred (or actual) heights. The exact height desired may also be specified using a slider bar, or other graphical input technique known in the art. When the command data is received by the telepresence robot, a computational device on the telepresence robot actuates a motor using techniques known in the art, such as stepper motor controllers, PID control loops, and H-bridge circuits. The height of the stalk can be detected using techniques known in the art such as encoder wheels and limit switches.
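The "Sit"/"Stand" command handling above can be sketched as follows. The preset heights, command names, and the move_stalk_to callback are illustrative assumptions, not values from this application.

```python
# Sketch of height-command handling: GUI commands ("sit"/"stand" buttons,
# or an exact height from a slider bar) become a target stalk height,
# which is handed to a motor-control callback.

PRESET_HEIGHTS_MM = {"sit": 1200, "stand": 1700}   # user-adjustable presets

def handle_height_command(command, current_height_mm, move_stalk_to):
    """Translate a GUI height command into a target stalk height and actuate it."""
    if command in PRESET_HEIGHTS_MM:
        target = PRESET_HEIGHTS_MM[command]
    elif isinstance(command, (int, float)):
        target = command              # a slider bar may send an exact height
    else:
        raise ValueError("unknown height command: %r" % (command,))
    if target != current_height_mm:
        move_stalk_to(target)         # motor controller raises/lowers the stalk
    return target

moves = []
handle_height_command("sit", 1700, moves.append)   # robot is standing: lower it
print(moves)  # [1200]
```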

In summary, the invention consists of a method of controlling a remotely located camera comprising the steps of specifying a horizontal and vertical displacement for an on-screen video image, calculating a pan motor movement and a tilt motor movement amount based on the horizontal displacement and the vertical displacement, and moving a pan motor and a tilt motor by the calculated pan motor movement and the calculated tilt motor movement, wherein the pan motor and the tilt motor mechanically pan and tilt a video camera.

The method may further comprise the step of correcting for fisheye distortion of the on-screen video image by using a rectilinearizing algorithm. Also, the method may be used wherein the remotely located camera is mounted on a mobile telepresence robot. The method may also be used wherein the calculated motor movement amount is multiplied by a scaling factor, the scaling factor inversely proportional to the current zoom factor of the video image.
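The zoom-dependent scaling mentioned above can be illustrated with a minimal sketch; the function name and degree units are assumptions for illustration:

```python
def scaled_movement(raw_movement_deg, zoom_factor):
    """Scale a calculated pan/tilt movement by a factor inversely
    proportional to the current zoom, so the same on-screen drag
    produces a proportionally smaller rotation when zoomed in."""
    return raw_movement_deg / zoom_factor
```

For example, at 2x zoom a drag that would rotate the camera 10 degrees at 1x zoom rotates it only 5 degrees, keeping the apparent on-screen motion consistent.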

In another embodiment of the invention, a system for controlling a video camera is described, comprising a video camera, an electromechanically actuated tilt and pan camera mount connected to the video camera, a local computational device, sending video data accepted from the video camera, receiving transmitted movement commands, and controlling the electromechanically actuated tilt and pan camera mount in response to the transmitted movement commands, a remote computational device, processing video data received from the local computational device, and sending movement commands to the local computational device in response to a horizontal and vertical image displacement specifier event, wherein the remote computational device sends movement commands in response to a screen drag event, and a remote display device, displaying video data processed by the remote computational device, and updating the video image location in response to horizontal and vertical image displacement specifier events.

The system may optionally attach the electromechanically actuated tilt and pan camera mount to a mobile telepresence robot. Also, the system may include an image rectilinearizing device, rectilinearizing the video data generated by the video camera.

In another embodiment of the invention, a method of controlling a remotely located camera is described, comprising the steps of selecting a location within an image generated by a camera, computing Cartesian coordinates corresponding to the selected location, moving a remotely located camera by a longitudinal angle computed from the Cartesian coordinates using inverse gnomonic projection equations, and moving the remotely located camera by a latitudinal angle computed from the Cartesian coordinates using inverse gnomonic projection equations.

The method may also apply where the remotely located camera is an element of a mobile telepresence robot.
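For the common case where the camera's optical axis coincides with the tangent point of the projection (pan = tilt = 0), the inverse gnomonic equations reduce to a simple closed form. The following sketch assumes image-plane coordinates expressed in units of the focal length; the tangent-point-centered simplification and the function name are assumptions for illustration, not the disclosed implementation:

```python
import math

def inverse_gnomonic(x, y):
    """Inverse gnomonic projection, special case where the camera's
    optical axis is the tangent point of the projection.
    x and y are image-plane offsets from the image center, in units
    of the focal length (e.g. pixel offset / focal length in pixels).
    Returns (longitudinal angle, latitudinal angle) in radians."""
    pan = math.atan(x)                        # longitudinal angle
    tilt = math.atan(y / math.hypot(1.0, x))  # latitudinal angle
    return pan, tilt
```

As a sanity check, a point one focal length off-axis horizontally maps to a 45-degree pan, as expected for a pinhole camera model.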

In another embodiment of the invention, a system for controlling a remotely located camera is described, comprising a two-dimensional coordinate selector, selecting a two-dimensional coordinate from within an image generated by the remotely located camera, a motor-drive, driving the remotely located camera, a local computational device controlling the motor-drive, a communications link interconnecting the local computational device with a remote computational device, and a gnomonic calculation calculator executing on a computational device selected from the group consisting of the local computational device and the remote computational device, wherein the gnomonic calculation calculator calculates a longitudinal angle and a latitudinal angle using an inverse gnomonic projection calculation and the two-dimensional coordinate, the longitudinal angle and latitudinal angle used to control the motor-drive.

In another embodiment of the invention, an apparatus for controlling the height of an audio-visual assembly is disclosed, comprising an audio-visual assembly comprising a camera and speaker, a sliding tube, mechanically attached to the audio-visual assembly, a fixed tube, slideably attached to the sliding tube, and a multi-conductor cable routed from the bottom of the fixed tube to the top of the sliding tube, whereby the sliding tube may be raised and lowered with respect to the fixed tube while maintaining electrical connectivity.

The apparatus may also include a variant wherein the fixed tube is mounted on a robotic mobile base unit.

In another variant of the apparatus, the sliding tube is rotatably attached to a leadscrew, and the leadscrew is driven by an electromechanical assembly, and the electromechanical assembly is attached to the fixed tube.

In another variant of the apparatus, the sliding tube is mechanically attached to a drive belt, and the drive belt is driven by an upper pulley and a lower pulley; and the upper pulley and lower pulley are attached to the fixed tube, and one of the pulleys is rotatably driven by an electromechanical assembly.

In another variant of the apparatus, a flat cable is routed inside the fixed tube from the bottom of the fixed tube to the top of the fixed tube; and the flat cable is creased 180 degrees at the top of the fixed tube, and the flat cable is routed from the crease down the inside of the fixed tube, and the flat cable turns from the inside of the fixed tube into the inside of the sliding tube, and the flat cable exits from the top of the sliding tube.

In another variant of the apparatus, one end of a helical multi-stranded cable is affixed near the bottom of the fixed tube, and the opposite end of a helical multi-stranded cable is affixed near the bottom of the sliding tube, whereby the flexibility inherent in the helical multi-stranded cable affords a continuous electrical connection while the sliding tube travels relative to the fixed tube.

DETAILED DESCRIPTION OF THE DRAWINGS

FIG. 1 is an exemplary embodiment of the invention.

FIG. 2 is a diagram illustrating a user controlling the device.

FIG. 3 is an exemplary embodiment of the invention used in a telepresence system.

FIG. 4 is a cross-sectional diagram illustrating the pulley embodiment of the height adjustment mechanism when fully lowered.

FIG. 5 is a cross-sectional diagram illustrating the linear gear embodiment of the height adjustment mechanism when partially raised.

FIG. 6 is an exemplary embodiment of the lead-screw height adjustment aspect of the invention.

FIG. 7 is a cross-sectional diagram showing the helical cable and friction interface embodiments of the height adjustment mechanism when partially raised.

FIG. 8 is an exemplary embodiment of a pan and tilt audio-visual assembly.

DETAILED DESCRIPTION OF THE INVENTION

The present invention is a new and improved method and apparatus for panning and tilting a remotely located camera in order to select a view. The invention also discloses a means for adjusting the height of the remotely located camera.

FIG. 1 is an exemplary embodiment of the invention. A video camera 101 is capturing video of a subject 104. The video camera may be electro-mechanically positioned to a different tilt angle via a motorized tilt mechanism 102. It may also be electro-mechanically positioned to a different pan angle via a motorized pan mechanism 103. The motorized pan and tilt mechanisms are mounted to a rigid mount point, here a tripod 108, such that the video camera is at the desired height. The video image data is sent to a video image sink 105, such as a computer or a video data processor. The video image sink converts the video image data into a video data format that can be sent over a digital information conduit 106. In the preferred embodiment, the digital information conduit is the Internet, but other digital data transmission schemes such as ad hoc wireless networks, point-to-point data links, satellite uplinks, or other digital transmission schemes known in the art may also be used. In the preferred embodiment the video data format is ITU-T H.264, but other video data formats such as H.261, H.263, or other video compression algorithms known in the art may also be used. The digital information conduit sends the video image data to a remote computational device 107. In the preferred embodiment, the remote computational device is a personal computer such as a laptop or desktop computer, but other remote computational devices such as personal digital assistants, cell phones, or video game consoles may also be used. An input device, here a mouse 109, is also part of the invention. The remote computational device converts the incoming video data format into a format that can be displayed on a viewing panel 110. In the preferred embodiment, the viewing panel is an LCD display, but other display technologies known in the art may also be used. The resulting video image 111 is then displayed on the viewing panel in real-time.

FIG. 2 is a diagram illustrating a user controlling the device. FIG. 2(a), 2(b), and 2(c) illustrate the video camera and related subsystems at three different times, t=0, t=1, and t=2. FIG. 2(d), 2(e), and 2(f) illustrate the remote computational device and viewing panel at these same times, t=0, t=1, and t=2. Beginning at time t=0, a video display window 201 contains a video image 202. The video image is shown being dragged 203 towards the lower right corner of the video display window. A "drag" refers to a movement made with a mouse, trackball, touch pad, or other input device while an associated button is being pressed, which results in the subject of the drag being moved across a display surface. Concurrently with the drag action, movement commands are sent from the remote computational device to a video camera with motorized pan and tilt axes 208, directing the remote video camera to reposition itself such that the camera will rotate downward relative to the remote subject 211 by an angle equivalent to the motion of the video image. Note that an upward drag motion causes a downward camera motion, because dragging the video image upward creates a space underneath the on-screen video image which must be filled with video data taken from a location below the video camera's lower imaging window boundary. Note that the remote video camera 208 has not yet responded to the drag, because the movement commands that have been sent from the remote computational device have not yet reached the pan and tilt motors of the video camera. Moving to time t=1, the video image is shown in its post-drag position 204. Notice that the drag has resulted in empty space 205 in the region that used to be occupied by the video image. This empty space occurs because the remote camera has not yet moved into a position that enables it to capture the video data that would otherwise fill the empty space. In the preferred embodiment, the empty space is represented by black pixels on the display device. In alternative embodiments, a visualization of a solid surface may be used, creating the illusion that the missing video data is occluded by a foreground object.
Concurrently with the completion of the drag, the first movement commands have reached the remote video camera 209, which has begun to move away from the subject 212. The video camera system sends information back to the remote computational device informing the remote computational device of its new position. At time t=2, the video image 206 has begun to reflect the movement of the camera. In synchronization with the movement of the camera 210, the video image is moved towards its original position, reducing the amount of empty space 207 shown in the frame. Because the video image is moving in an equal and opposite amount with respect to the movement of the camera, the user perceives that the video image 206 has stayed in the same position that it was moved to after the initial drag action. However, the actual subject 214 has shifted further away from the center of the field of view as the camera moves towards the final position specified by the initial drag event.
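The equal-and-opposite compensation described above can be sketched as a simple pixel-domain calculation; the function name and the linear degrees-to-pixels conversion are illustrative assumptions:

```python
def image_offset_px(drag_px, camera_moved_deg, fov_deg, window_px):
    """On-screen offset of the video image remaining after the camera
    has rotated camera_moved_deg toward the dragged position: the
    original drag minus the portion already absorbed by camera motion,
    converted from degrees back to pixels. The offset returns to zero,
    and the empty space disappears, once the camera completes its move."""
    absorbed_px = camera_moved_deg / fov_deg * window_px
    return drag_px - absorbed_px
```

For example, with a 60-degree field of view in a 480-pixel window, a 240-pixel drag is fully absorbed once the camera has rotated 30 degrees, at which point the image is back in its original screen position.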

The precise amount that the remote camera must be rotated can be calculated using knowledge of the field of view of the remote camera. For example, a drag motion over a distance equal to half the height of the screen will result in a camera motion equal to half of the camera's vertical field of view. In an alternative embodiment, the video image is displayed in a window on the screen, and the drag motion should then be calculated as a percentage relative to the size of the window, not relative to the entire screen.
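The proportional drag-to-angle rule above can be sketched as follows. The parameter names are assumptions, and the sign convention (a drag in one direction rotates the camera the opposite way, per the explanation accompanying FIG. 2) is applied to both axes for illustration:

```python
def drag_to_angles(dx_px, dy_px, window_w_px, window_h_px,
                   hfov_deg, vfov_deg):
    """Convert a drag (dx_px rightward, dy_px upward, in window pixels)
    to pan and tilt angles in degrees. A drag spanning half the window
    commands a rotation of half the corresponding field of view. Signs
    are flipped because dragging the image one way reveals space that
    must be filled from the opposite direction (e.g. an upward drag
    tilts the camera down)."""
    pan_deg = -dx_px / window_w_px * hfov_deg
    tilt_deg = -dy_px / window_h_px * vfov_deg
    return pan_deg, tilt_deg
```

With a 60-degree vertical field of view and a 480-pixel-tall window, an upward drag of 240 pixels (half the window) commands a 30-degree downward tilt.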

FIG. 3 is an exemplary embodiment of the invention used in a telepresence system or a mobile telepresence robot. This embodiment of the invention incorporates by reference copending application 11/223,675, filed September 9, 2005. Matter essential to the understanding of the present application is contained therein. A video camera 301 and screen 302 are mounted in an audio-visual assembly 304. This entire assembly tilts as a unit using an electromechanical tilt mechanism 303 that is rotatably connected to a fork and stalk assembly 305. The fork and stalk assembly mechanically connects the audio-visual assembly to a mobile base 306. The mobile base can be rotated in place, which results in camera rotation. Additionally, the mobile base supports translation, such that the telepresence system can be moved through its environment by being driven along the ground via commands that are sent over a digital information network from a remote terminal. A wireless data router 307 communicates with the telepresence system wirelessly and also routes data to and from the Internet 308. The system is otherwise equivalent in function to the system shown in Figures 1 and 2. In particular, dragging of the video image 309 displayed on a display 310 of a remote computational device, here a personal computer 311, results in rotation of the mobile base and/or tilting of the camera above or below the horizon by an amount proportional to the drag motion, and in a direction determined by the drag motion.

FIG. 4 is a cross-sectional diagram illustrating the pulley embodiment of the height adjustment mechanism when partially lowered. An audio-visual assembly 401 rests atop a sliding tube 402. In an alternative embodiment, a camera may be substituted for the audio-visual assembly. The sliding tube slides within a fixed tube 403. Here, the fixed tube is a sheath that surrounds the sliding tube and allows it to slide up and down, and also includes an attached vertical upright. In an alternative embodiment, the fixed tube may also completely encase the sliding tube. An electromechanical drive mechanism, here an electric gearhead motor 404, rotatably drives a lower pulley 405. The electric gearhead motor is attached to the fixed tube. The lower pulley drives a drive belt 406. In the preferred embodiment, an O-ring style drive belt is used, but other drive belts known in the art, or a sprocket and chain assembly, may also be used. Movement of the drive belt rotates an upper pulley 407. The upper pulley is attached to the fixed tube. The drive belt is kept taut by appropriate placement of the upper and lower pulleys. A drive belt retainer 408 attaches the drive belt to the sliding tube. Movement of the gearhead motor will thus cause movement of the lower pulley, which will cause linear motion of the drive belt, and thereby raise or lower the sliding tube. A flat flexible cable (FFC) 409 is shown with a rolling bend, allowing power and signal to travel up to the audio-visual assembly.

FIG. 5 is a cross-sectional diagram illustrating the linear gear embodiment of the height adjustment mechanism when partially raised. An audio-visual assembly 501 rests atop an upper telescoping tube 502. In an alternative embodiment, a camera may be substituted for the audio-visual assembly. The sliding tube slides within a fixed tube 503. The fixed tube connects to a base 509. In an alternative embodiment, this base is a telepresence robot base. An electromechanical drive mechanism, here an electric gearhead motor 504, rotatably drives a circular gear 505. The electric gearhead motor is attached to the sliding tube. The circular gear movably meshes with a linear gear 506, such that the circular gear moves linearly along the linear gear. A flat multi-conductor cable 507 is mounted within the sliding tube. At the bottom of the sliding tube, the flat multi-conductor cable extends downwards, and then forms a rolling bend 508, and travels up the fixed tube. The multi-conductor cable then doubles over, traveling down the fixed tube and enabling an electrical connection between the audio-visual assembly and the bottom of the fixed tube while also permitting unencumbered travel of the sliding tube with respect to the fixed tube.

FIG. 6 is an exemplary embodiment of the lead-screw height adjustment aspect of the invention. For clarity, only the internals of the height adjustment mechanism are shown. Use with a camera, base, audio-visual assembly, and telepresence robot base is intended, as with previous embodiments of the invention. A sliding tube 601 slides within a fixed tube 602. An electromechanical mechanism, here a motor with a gearhead 603, is mechanically attached to a leadscrew 604. The leadscrew is rotatably connected to a leadscrew movement block 605, which slides down as the leadscrew turns clockwise and slides up as the leadscrew turns counterclockwise. The leadscrew movement block is mechanically attached to the sliding tube. A flat multi-strand cable, here a flat flexible cable or "FFC," enters the fixed tube 606. It runs up the length of the fixed tube until it reaches the top of the tube, where it folds over onto itself 607. The FFC is then routed down the tube until it forms a rolling bend 610. The FFC then routes into the sliding tube 601. Via this routing sequence, the sliding tube is able to move up and down relative to the fixed tube while maintaining constant electrical connectivity using the FFC. The FFC's rolling bend will travel up and down as the sliding tube moves, and the FFC is thereby prevented from jamming or wrinkling during use.

FIG. 7 is a cross-sectional diagram showing the helical cable and friction interface embodiments of the height adjustment mechanism when partially raised. An audio-visual assembly 701 rests atop a sliding tube 702. In an alternative embodiment, a camera may be substituted for the audio-visual assembly. The sliding tube slides within a fixed tube 703. An electromechanical drive mechanism, here an electric gearhead motor 704, rotatably drives a friction wheel 705. In the preferred embodiment, the friction wheel is a rubber wheel; however, in alternative embodiments, any rotating wheel capable of engaging a flat surface via friction may be used. The electromechanical drive mechanism is attached to the fixed tube. The friction wheel rotatably drives the sliding tube, such that the sliding tube will move up or down in response to rotation of the friction wheel. Thus rotation of the electromechanical drive mechanism causes the sliding tube to raise or lower. A helically wound multi-strand conductor 706 is shown adjacent to the fixed tube. The conductor winds up the fixed tube in a helical fashion, and then enters the sliding tube. Within the sliding tube, the multi-strand conductor need no longer follow the shape of a helix. In an alternative embodiment, the multi-stranded conductor may also remain helix-shaped within the sliding tube. Use of the helically wound conductor allows electrical connectivity between the audio-visual assembly and the bottom of the fixed tube without obstructing travel of the sliding tube relative to the fixed telescoping tube.

FIG. 8 is an exemplary embodiment of a pan and tilt audio-visual assembly. A bezel 801 contains both an LCD panel 802 and a camera 803 that move as a unit. In the preferred embodiment, a speaker and microphone are also present in the audio-visual assembly. A servo motor 804 and a mechanical swivel 805 are mounted along an imaginary axis at or near the center of mass of the audio-visual assembly. Specifically, the servo is mounted to the left fork upright 806, and the servo motor shaft and optional horn 808 are rigidly attached to the audio-visual assembly. In the preferred embodiment, the servo motor is attached to the left fork upright, but the servo motor may also be located inside the audio-visual assembly, with the rotating shaft and optional horn of the servo rigidly attached to the fork upright. The mechanical swivel is rigidly attached to the audio-visual assembly and rotatably attached to the left fork upright 807. In an alternative embodiment, the mechanical swivel may be rotatably attached to the audio-visual assembly and rigidly attached to the left fork upright. The mechanical swivel is shown on the right fork upright and the servo motor is shown on the left fork upright, but this may be reversed in alternative embodiments.

This pan and tilt audio-visual assembly advantageously reduces the torque that needs to be exerted by the servo motor when rotating the audio-visual assembly up or down, without requiring a counterweight. By eliminating the need for a counterweight, the overall weight of the audio-visual assembly can be kept low. Additionally, the fork-shaped assembly allows the camera to look at the area behind itself when it is angled down sufficiently.

Advantages

What has been described is a new and improved method and apparatus for panning and tilting a remotely located camera in order to select a view. A means for adjusting the height of the remotely located camera has also been disclosed.

The method and apparatus allows a user to drag an image on a video display screen by an amount that corresponds to the angle and direction of the rotation and tilt movement that is desired. Because the motion of the camera is proportional to the distance of the drag event, the interface gives the user an intuitive sense of how far the remote camera will move before it even begins its motion, without requiring any prior experience with the device. By filling in the empty space in the screen or frame as the camera moves into the new position, immediate feedback is given as to the camera's motion, and the user also perceives that the system responds instantaneously to his movement requests.

The height adjustment method and apparatus allows a user to adjust the height of a camera, audio-visual assembly, or telepresence robot audio-visual assembly from a remote location. Furthermore, the apparatus can be implemented in a low-cost, lightweight manner.

The pan and tilt audio-visual assembly advantageously reduces the torque that needs to be exerted by the servo motor when rotating the audio-visual assembly up or down, without requiring a counterweight, thereby reducing the weight of the audio-visual assembly. Additionally, the fork-shaped assembly allows the camera to look at the area behind itself when it is angled down sufficiently.

While certain exemplary embodiments have been described in detail and shown in the accompanying drawings, it is to be understood that such embodiments are merely illustrative of and not restrictive on the broad invention, and that this invention is not to be limited to the specific arrangements and constructions shown and described, since various other modifications may occur to those with ordinary skill in the art.