

Title:
MACRO IMAGE STABILIZATION METHOD, SYSTEM AND DEVICES
Document Type and Number:
WIPO Patent Application WO/2017/112800
Kind Code:
A1
Abstract:
An image recording device having a stabilization mechanism for generating, storing and transmitting a live video stream, the device including a movement sensor that provides movement data of the device and an image sensor coupled or associated with a hardware processor configured with software containing instructions to process movement information from the movement sensor and image data from said image sensor, and generate a virtual camera that has a synthesized focus of attention, where the device uses frames or frame portions of the virtual camera or cameras generated in combination with the image data to provide an adjusted video stream of motion-corrected video.

Inventors:
WATOLA DAVID (US)
OLSON ERLEND (US)
Application Number:
PCT/US2016/068093
Publication Date:
June 29, 2017
Filing Date:
December 21, 2016
Assignee:
MOBILE VIDEO CORP (US)
International Classes:
H04N5/232
Foreign References:
US20150163408A1 (2015-06-11)
US20150254825A1 (2015-09-10)
Attorney, Agent or Firm:
BONINI, Frank, J., Jr. (US)
Claims:
CLAIMS

What is claimed is:

1. An image recording device comprising:

a) a stabilization mechanism comprising at least one movement sensor for sensing movement and providing movement data;

b) an image sensor disposed to receive an image thereon;

c) a hardware processor configured with software containing instructions to process movement information comprising movement data from said movement sensor and image data from said image sensor;

d) wherein said device is configured to capture video frames;

e) wherein said device is configured to identify changes in position between successive frame captures from information provided from said movement sensor; said changes in position comprising a first delta, said first delta comprising a position change between a first position and a second position, said first position corresponding with a first frame, and said second position corresponding with a second frame;

f) said device having a lens with a corresponding focus of attention and a field of view;

g) at least one virtual camera synthesized from said information, said at least one virtual camera having a first virtual camera focus of attention and a first virtual camera field of view;

h) said device being configured with instructions for instructing the processor to carry out an evaluation of the movement information and determine whether the movement meets a threshold that requires a corrective adjustment;

i) said device being configured to provide an adjusted video stream of video comprising one or more frames or frame portions from said first virtual camera and having said first virtual camera focus of attention for those said one or more frames or frame portions.

2. The device of claim 1, said device being configured to provide an adjusted video stream of video comprising one or more frames or frame portions from said first virtual camera and having said first virtual camera field of view for those said one or more frames or frame portions.

3. The device of claim 1, wherein said device movement sensor comprises an IMU.

4. The device of claim 3, said IMU having three axes.

5. The device of claim 3, said IMU having six degrees of freedom.

6. The device of claim 1, wherein a plurality of virtual cameras are synthesized from said information, said plurality of virtual cameras comprising said at least one virtual camera having a first virtual camera focus of attention and a first virtual camera field of view, and a plurality of virtual cameras, each of said virtual cameras having a respective virtual camera focus of attention and respective virtual camera field of view.

7. The device of claim 6, said device being configured to provide an adjusted video stream of video comprising one or more frames generated from said plurality of virtual cameras and having one or more of said virtual camera focus of attention for those said one or more frames.

8. The device of claim 6, said device being configured to provide an adjusted video stream of video comprising one or more frames generated from said plurality of virtual cameras and having one or more of said virtual camera focus of attention for those said one or more frames generated from at least one of said respective virtual cameras and having a respective virtual camera field of view for those said one or more frames generated from said respective virtual camera.

9. The device of claim 1, wherein said adjusted video stream comprises stabilized video.

10. The device of claim 9, wherein said adjusted video stream comprises video that is adjusted to compensate for distortion effects of the camera or lens.

11. A method of generating a motion stabilized video stream from a camera subject to movement, the method comprising:

a) providing a camera having a lens and an image sensor, said image sensor comprising a field of pixels;

b) designating a look direction for said camera, said look direction comprising a first focus of attention corresponding with a first field of view, said first focus of attention corresponding with a first camera position;

c) said camera being associated with at least one IMU that provides information about the camera movements;

d) wherein the camera movement produces a movement of said first focus of attention to a second focus of attention corresponding with a second field of view, said second focus of attention corresponding with a second camera position;

e) capturing a video with said camera, wherein said camera captures at least a portion of video from said first focus of attention;

f) capturing video with said camera, wherein said camera has moved from said first position to said second position;

g) synthesizing from said captured video and said camera movement information a synthetic camera having a synthesized focus of attention corresponding with said first focus of attention; and

h) generating a video stream having a focus of attention that is said first focus of attention, wherein said video stream having said focus of attention that is said first focus of attention includes at least some portion captured by said camera at said first position and at least some other portion captured by said camera at said second position, wherein said video portion captured by said camera at said second position is manipulated to produce said video portion having said first point of view.

12. The method of claim 11,

wherein said camera is moved from said first position to one or more subsequent positions, and wherein video is captured by said camera at each of said one or more subsequent positions, wherein said first position has a first focus of attention representing a point of view, and wherein each of said one or more subsequent positions has a respective focus of attention representing a corresponding respective point of view; and

wherein generating a video stream having a focus of attention that is said first focus of attention comprises generating said video stream from said video captured by said camera at said first position and from said video captured at said one or more subsequent positions.

13. The method of claim 12, wherein generating a video stream having a focus of attention that is said first focus of attention comprises generating said video stream from said video captured by said camera at said one or more subsequent positions and synthesizing a camera having a focus of attention corresponding to the first focus of attention for each of the said one or more subsequent positions at which video is captured by said camera.

14. The method of claim 13, wherein synthesizing a camera comprises manipulating the image information and position information captured at each position of said camera to include a first manipulation corresponding to an angular adjustment, and at least one second manipulation corresponding to a relationship adjustment.

15. The method of claim 14, wherein said first manipulation manipulates said pixels of said field of pixels to generate a synthesized field of view corresponding to the first focus of attention of the first camera position.

16. The method of claim 11, wherein said camera has a fisheye lens, and wherein said video is captured through said fisheye lens.

17. The method of claim 16, including manipulating said image information to adjust said image information for extrinsic and intrinsic corrections.

18. The method of claim 16, including cropping one or more portions of said video comprising said video stream.

19. The method of claim 11, wherein video is captured from a plurality of cameras, each said plurality of cameras providing a plurality of respective corresponding viewpoints, and wherein a plurality of synthetic cameras are produced from said plurality of cameras.

20. A method of generating a video stream from a camera, wherein said video stream is captured from a camera at a first location representing a first point of view, and wherein said video stream is generated to display the video as if taken from an alternate point of view, the method comprising:

a) providing a camera having a lens and an image sensor, said image sensor comprising a field of pixels;

b) capturing video with said camera, said video being captured from a first look direction comprising a first focus of attention corresponding with a first field of view, said first focus of attention corresponding with a first camera position;

c) synthesizing from said captured video a synthetic camera having a synthesized focus of attention corresponding with said alternate point of view; and

d) displaying a video stream having a focus of attention that corresponds with said alternate point of view.

Description:
MACRO IMAGE STABILIZATION METHOD, SYSTEM AND DEVICES

BACKGROUND OF THE INVENTION

1. Field of the Invention

[0001] The field of the invention relates to image stabilization for image capturing devices, such as cameras that are in motion, and further relates to and provides methods, systems and devices that produce motion-stabilized video streams where the camera itself is subject to conditions of agitation or movement. The invention further relates to synthesizing one or more new cameras from information captured by a physical camera, including synthesis of a new camera to produce scene images with different points of view than the physical camera.

2. Brief Description of the Related Art

[0002] Cameras and even personal digital assistants, as well as smartphones that have recording capabilities, have attempted to utilize motion sensors as a way to adjust an image. For example, some have employed a motion sensor for feedback so as to tilt the image to right the scene as if the camera were not tilted, and have even cropped the image so any tilt correction is able to fill the image frame when viewed. However, these methods are not workable for live video streams where a camera is shaking, and where the shaking occurs by a number of different motions. In some prior instances, a fisheye lens has been used as a way to expand the field of view for the images being captured. However, although the wide field of view offers an opportunity to capture events taking place within a broad field area, when the camera experiences shaking, there is further distortion of the image as a result of the camera being agitated (e.g., flopping around and/or bouncing). Although attempts have been made, typically using software, to "flatten" a fisheye view, there is the detriment of losing a portion of the scene, where the point of interest is located at the outer boundary of the fisheye field of view. Flattening a fisheye image generally involves substantial cropping, which decreases the field of view. Some software has attempted to provide a remapping algorithm for stored photographs that is designed to correct distortion along a single axis. While curved-appearing vertical objects may benefit from such a correction, there typically are fisheye curvatures that remain in other objects and lines appearing in the image. In addition, unwarping an image captured with a fisheye lens typically results in the image being degraded near the edge, such as a loss of resolution. Imaging devices, such as cameras, typically move as a unit, so movement of the camera changes the direction from which the image is being captured. For intentional movement of a camera, as in the case where the camera (or an operator using it) is following a target in motion, the movement may be desirable. However, when the camera is agitated due to movement, such as, for example, when an individual using the camera moves to chase something, there may be undesirable agitation (e.g., from running) that changes the camera image direction. The captured images or video, in particular where undesired camera motion has taken place, therefore, may be indiscernible or unusable.

SUMMARY OF THE INVENTION

[0003] A macro image stabilization system for video is provided, in particular to maintain a look of natural imaging for a video stream that is taken with an image recording device, such as a camera that is undergoing movement or agitation. The video image is recorded while the camera is shaking, and a live stream of video, recorded video, or both, is produced having a stabilized look as if the camera were stationary or maintained substantially in a stationary position, or traveling on a substantially smoother or otherwise more desirable trajectory. Undesired camera movement, though occurring, is minimized or eliminated in the video produced using the devices, systems and methods of the invention. The stabilization mechanism preferably may image video of subjects which themselves also may be in motion. Stabilized images also may be generated where the camera is moving relative to the subject or along with the subject.

[0004] Preferred embodiments provide a wide angle viewing field, generate one or more synthetic cameras, implement stabilization, and synthesize multiple views. The plurality of synthesized views are manipulated to provide a stabilized video stream comprising the multiple synthetic camera streams or portions thereof. The synthesized views or portions thereof are assembled together while the camera continues capturing the scene to provide motion stabilized video output.

[0005] The devices preferably are configured to generate a video stream of live video of events taking place, or that have occurred, where the video stream output is motion-stabilized video. The devices, system and methods include replay aspects, which may involve potentially changing the viewpoint or producing a different video product well after the fact (i.e., after the event captured has taken place). The method, system and devices are configured to manipulate the information ascertained by the device and its components, which include a number of sensors, to output a motion stabilized live video stream that may be transmitted from the device to another location or other hardware component (computer, device, etc.) through a communication network (e.g., cellular, Wi-Fi, satellite, or other suitable communication transmission medium).

[0006] The system, method and devices may be implemented in conjunction with a number of circumstances, including law enforcement and first responder type activities, as well as sporting events and other activities. Implementation of the invention may be made in connection with recording of sports activities, as well as by participants in a sport, such as playing soccer, kayaking, automobile racing, cycling, running, and the like. According to some embodiments, the system, method and devices may be configured for use in connection with law enforcement activities. Some preferred embodiments may provide devices configured as law enforcement body cameras to be worn by an individual, or cameras that are to be supported on or in a vehicle (e.g., a police vehicle, car, helicopter, cycle, or bike). Embodiments of the devices and system, such as a law enforcement camera, may include transmission capabilities for transmitting video, including live streaming stabilized video of a scene, or target object being pursued. According to some implementations, the devices may comprise cameras that are part of a system that includes a remote command center or component that can control the device and its functions. For example, the device synthesized viewpoint may be remotely selected or designated (e.g., so that a scene or target object may be viewed as if being imaged from a different position).

[0007] According to one aspect of the invention, a fisheye lens is utilized in connection with an image capture mechanism (such as an image sensor for recording the image). The fisheye lens, according to some embodiments, may be placed over a standard camera lens. A removably positionable fisheye lens attachment may be provided. According to other embodiments, a fisheye lens is provided to act as the primary lens for the image recording device or camera, and in yet other embodiments, the fisheye lens may be provided in conjunction with one or more other lenses (e.g., a narrow or standard field lens) to yield alternate views. Embodiments of the device and system may further include a plurality of image capture components, such as, for example, a plurality of lenses, which are arranged to capture images from a respective plurality of directions. According to preferred embodiments, fisheye lenses are positioned over one or more conventional or standard lenses, or one or more of the lenses may be fisheye lenses and one or more may be standard lenses (or telephoto or zoom lenses), or combinations of these.

[0008] According to preferred embodiments, the system, method and devices generate, from a single camera, one or more synthetic cameras that provide imaging, including live streaming video, that otherwise would not be obtainable from a single conventional camera. Other embodiments may employ a plurality of physical cameras or camera components for a plurality of points of view, from which one or more video streams may be generated that otherwise would not be obtainable from a single conventional camera, or from the plurality of cameras. For example, a plurality of synthetic cameras are generated from one physical camera, or from a plurality of physical cameras, thereby providing a plurality of video streams from a respective plurality of points of view. A first plurality of video streams may be provided from the first plurality of synthetic cameras generated from images from a first camera, and a second (or more) plurality of video streams may be provided from a second plurality of synthetic cameras generated from images from the second camera.

[0009] The system, method and devices generate and stream video that is adjusted to depict a point of view that includes a target object or object of interest. This may be done by fixing the camera focus of attention to a designated direction or point in space. Embodiments also may designate the focus of attention based on an event occurrence (which may be detected or designated).

[0010] Embodiments generate a focus of attention to direct the video imaging (or image frames) to the desired virtual vantage point. The focus of attention may be generated using information from the sensors (including the image, motion and other sensors) that is processed to determine the instantaneous viewpoint for each of the one or more virtual cameras. The imaging information, including processed imaging information, may be further manipulated to provide video (including streaming video) which may contain image information about a scene from any of the cameras or virtual cameras synthesized therefrom.

[0011] The system, method and devices preferably include a stabilization mechanism.

According to some preferred embodiments, the stabilization mechanism generates and assembles the virtual image streams in connection with the designated look point or focus of attention to produce what is an expected view as if the camera were looking from the physical direction (the designated look direction), even though the camera has moved to image from a different or alternate look direction.

[0012] Embodiments of the stabilization mechanism comprise or utilize the sensors of the device, including position and orientation information, such as, for example, data from an inertial measurement unit (IMU) (e.g., IMU data). The systems and devices produce stabilized video, which may be live streaming video that may be transmitted, stored, or both. The stabilized video is generated through manipulation of the image and sensor information. According to preferred embodiments, the camera image sensor and the other sensors associated with the device capture information. The captured information is associated with a timestamp. The image information may be manipulated by applying one or more corrections or adjustments to the image, for example, to make intrinsic corrections, extrinsic corrections, or both.

[0013] According to preferred embodiments, the stabilization mechanism is configured to generate one, and preferably a plurality of virtual cameras from the contributing sources, such as data ascertained from the device sensors and associated components. The virtual cameras preferably have or are assigned a focus of attention. According to some preferred embodiments, the focus of attention is a preferred look direction or target object to be tracked, and may be selected or determined from sensor information, an instruction or direction or other information source, including, for example, video content captured by a camera (e.g., its image sensor). The stabilization mechanism images a target scene, and according to preferred embodiments, an object in the scene, where the camera or other imaging device that is capturing the scene image is shaking or being agitated in some manner. A viewpoint is synthesized for one or more, and preferably for a corresponding plurality of cameras, which are synthetic cameras. The synthetic camera viewpoints provide the stabilization mechanism with a plurality of video streams, which are new video streams. Manipulation of the synthesized video stream preferably is done to provide a stabilized video.
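As a rough illustration of how a designated focus of attention (here taken to be a fixed desired look direction) could drive the per-frame correction applied by a virtual camera, the Python sketch below computes the rotation that re-aims the shaken camera's IMU-reported orientation onto that direction. It is a minimal sketch only; the helper name and the use of SciPy rotations are assumptions for illustration, not details taken from the specification.

```python
import numpy as np
from scipy.spatial.transform import Rotation as R

def corrective_rotation(cam_orientation: R, desired_look_dir: np.ndarray) -> R:
    """Rotation that re-aims the camera's optical axis at the desired look direction."""
    optical_axis_cam = np.array([0.0, 0.0, 1.0])            # optical axis in camera coordinates
    current_look = cam_orientation.apply(optical_axis_cam)  # where the camera actually points
    target = desired_look_dir / np.linalg.norm(desired_look_dir)

    axis = np.cross(current_look, target)                   # axis-angle carrying current onto target
    s, c = np.linalg.norm(axis), np.dot(current_look, target)
    if s < 1e-9:                                             # already aligned (or exactly opposite)
        return R.identity()
    return R.from_rotvec(axis / s * np.arctan2(s, c))

# Example: the camera has pitched up 5 degrees; the focus of attention stays level.
shaken = R.from_euler("x", 5, degrees=True)
fix = corrective_rotation(shaken, np.array([0.0, 0.0, 1.0]))
print(fix.as_euler("xyz", degrees=True))   # approximately [-5, 0, 0]
```

A virtual camera assigned this focus of attention would apply the returned rotation when its frames or frame portions are synthesized.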

[0014] Preferred embodiments of the method, system and devices provide a video product that is stabilized and improved video from one or more cameras. Preferred embodiments provide stabilized video that is live streaming video, stored recorded video, or both, from a plurality of viewpoint synthesized virtual cameras, as well as one or more physical cameras, or combinations thereof. The video stream created preferably is generated from the physical camera image information, as well as the one or more virtual camera streams generated from manipulations of the physical camera data.

[0015] The system, method and devices may be configured to permit motion that is intended motion, such as, for example, movement of the camera on a person or supporting object, where a change in direction is made and determined to be an intended movement as opposed to a movement for which motion correction is implemented.

[0016] The invention has utility in a number of applications, and in particular where camera movement occurs and where undesirable camera movement would compromise the image being captured. Some particular applications for the stabilization mechanisms shown and described herein include the law enforcement, security, sports, education, health and elder care industries. For example, police body cameras are often used to record events as they are taking place, and although the cameras may capture a significant amount of activity, there are typically voids in the video capture, which are unusable portions. In many instances the voids or unusable portions coincide with the occurrence of the most important or interest generating events. In a typical situation, the police body camera user, a law enforcement officer, may be called upon to initiate a chase, on foot. The camera therefore may shake with each step and movement of the officer. The portion of the video during that chase time may be unusable because the image is not captured in a manner that provides suitable detail. For example, the use of the video for evidentiary purposes, such as in a subsequent trial or hearing, or even to see whether there is an accomplice assisting the fleeing suspect, or whether the suspect disposed of an item (e.g., a weapon or stolen item), may have no value at all or very limited value. The present invention is designed to minimize or eliminate the undesirable effects of motion due to camera agitation so usable, motion stabilized video may be captured, and streamed from the device in a stabilized form.

[0017] The invention provides a system, method and devices for producing high resolution and highly detailed video images captured from a camera that is itself in motion. The present system, method and devices implement a stabilization mechanism to capture and produce images, including live, streaming video that has high resolution and high detail and which is obtained from a camera that is itself in motion. The stabilized video images or stream may be transmitted from the camera device for viewing at a location remote from the camera device. According to some embodiments, the method, system and device may be configured to remotely control a synthesized viewpoint. The control of a synthesized viewpoint may be implemented in connection with a remote component that communicates with the camera device, such as a command server. This may be utilized in connection with law enforcement activities.

[0018] According to some preferred embodiments, the imaging may take place where the resolution is high. For example, embodiments of a camera with the motion stabilization mechanism may be configured to run at a high definition or ultrahigh definition (e.g., approximately 3.5K or 4K), and the resultant image produced may be at least as high as 720p. The camera image sensor may be configured to have a higher resolution than the video product output.
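To make the resolution headroom concrete, the short calculation below uses assumed figures (a 3840x2160 capture and a 1280x720 output; neither number comes from the specification) to show the margin left over for stabilization cropping.

```python
# Illustrative only: slack available for stabilization when the sensor out-resolves the output.
sensor_w, sensor_h = 3840, 2160     # assumed ~4K capture
out_w, out_h = 1280, 720            # assumed 720p product
margin_x = (sensor_w - out_w) // 2  # horizontal slack on each side, in pixels
margin_y = (sensor_h - out_h) // 2  # vertical slack above and below, in pixels
print(margin_x, margin_y)           # -> 1280 720
```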

[0019] It is an object of the invention to provide a system, method and devices for electronic stabilization of the images and video captured by a mobile capture device that is undergoing motion which includes extreme turbulence. The stabilization system, method and device may be implemented in conjunction with a mobile camera that captures and streams live video.

[0020] Other objects of the invention provide a system and method and a plurality of mobile video cameras that are configured with associated motion-tracking sensors that track the camera movements, and with processing components for synthesizing new video sequences as they would have been captured by a virtual camera with programmable location, motion, position/orientation, and characteristics.

[0021] It is another object of the invention to provide a method, system and device for generating stabilized video captured from a moving camera, where the system and devices are configured to provide stabilization for imaging in conjunction with low power, bidirectional wireless communication, or lessening the computing horsepower required for image manipulation, and reducing or eliminating the need for multiple simultaneous cameras and asymmetric communications.

BRIEF DESCRIPTION OF THE DRAWING FIGURES

[0022] Fig. 1 is an exemplary embodiment schematically illustrating a device configured in accordance with the invention.

[0023] Fig. 2 is a schematic flow diagram illustrating a preferred embodiment of image stabilization generation processes in accordance with the invention implemented in conjunction with hardware processing components.

[0024] Fig. 3A is a schematic illustration of a camera and subject geometry for imaging a simple scene using an idealized "pinhole" camera.

[0025] Fig. 3B is a schematic illustration representing the resulting image of the scene depicted in Fig. 3A.

[0026] Fig. 4A is a schematic illustration of the camera and subject geometry for imaging the simple scene, as in Fig. 3A, using the idealized "pinhole" camera of Fig. 3A, but with the camera being aimed along a new optical axis OA'.

[0027] Fig. 4B is a schematic illustration representing the resulting image of the scene depicted in Fig. 4A.

[0028] Fig. 5 is a schematic flow diagram illustrating a preferred embodiment of image stabilization generation processes in accordance with the invention implemented in conjunction with hardware processing components, for a device configured in accordance with the invention as a single camera.

[0029] Fig. 6 is an exemplary embodiment of a device configured as a mobile body camera with a stabilizing mechanism for generating and transmitting real-time live stabilized video of a scene.

[0030] Fig. 7A depicts a standard image of a scene taken from a frame of a video imaged using a standard video camera, with the corresponding motion information depicted in connection with the image, where the motion information represents the change in orientation with respect to the desired target orientation.

[0031] Fig. 7B depicts an image of the scene in Fig. 7A taken with an image capture device configured in accordance with the present invention, to produce stabilized video, also depicted with the corresponding motion information.

[0032] Fig. 8A depicts a standard image of the scene of Fig. 7A, taken from another frame of the video imaged using the standard video camera, with the corresponding motion information depicted in connection with the image, showing movement different than that of Fig. 7A.

[0033] Fig. 8B depicts an image of the scene in Fig. 8A from a frame of the video taken with the image capture device configured in accordance with the present invention to produce stabilized video, also depicted with the corresponding motion information.

[0034] Fig. 9A depicts an image of a scene taken from a frame of a video imaged using a standard video camera that exhibits the extrinsic distortion artifact caused by motion of a rolling-shutter camera.

[0035] Fig. 9B depicts an image of a scene taken from a frame of a video imaged using a device according to the invention configured to generate enhanced video where the extrinsic rolling-shutter distortion due to camera motion has been corrected.

DETAILED DESCRIPTION OF THE INVENTION

[0036] Referring to Fig. 1, an exemplary embodiment of an apparatus 110 is depicted schematically using a block diagram. The apparatus 110 illustrates exemplary hardware that may be configured to comprise a stabilization mechanism according to the invention. The apparatus or device 110 according to a preferred embodiment comprises an imaging component, such as a camera 122. Although reference is made to the camera 122 in regard to Figs. 1 and 2 and herein, the reference to the camera 122 preferably may represent one or more cameras. According to a preferred embodiment, the camera 122 preferably comprises an image sensor for capturing images. The image sensor has a field comprising an area of the sensor, which may be made up of pixels. The pixels define spatial coordinates of the image sensor field. Although not shown in Figs. 1, 2 and 5, the camera 122 preferably includes a capture component or objective, such as a lens (see e.g., Fig. 6), for capturing an image of a scene and directing it onto the image sensor. The camera 122 may also include other camera circuitry or hardware, such as mirrors, reflectors, and a power source. According to preferred embodiments, the camera 122 may be provided as part of the device 110 and may be arranged to utilize a power supply of the device 110 as well as the other device components (e.g., storage component, processor and the like). The camera 122 further includes circuitry to provide information captured to the CPU 111 for processing and storage. According to preferred embodiments, sensors and the camera (including the camera image sensor), as well as the software and processing components are integrated as part of the device 110. The device 110 also may be referred to as a camera.

[0037] According to preferred embodiments, communications hardware, such as, for example, a radio 118 (or other transmission components) also may be provided as part of the device 110 or in association therewith. The radio 118 may comprise components for receiving and communicating transmissions, the components comprising one or more transceivers, antennas, and a processing component (which in some embodiments may be shared with the device CPU 111).

[0038] In accordance with the exemplary device 110 depicted in Fig. 1, circuitry and components are provided to regulate operations of the device 110. As illustrated in Fig. 1, the diagram is CPU-centric, and in the exemplary embodiment, is organized around a central processing unit (CPU) 111. The CPU 111 carries out processing functions based on instructions from software, firmware or other stored or communicated commands, and controls and manages the system to capture, process and generate an output of stabilized video. The CPU 111 preferably performs or directs data-processing computations. The circuitry also may include one or more additional processing components, processors, co-processors, programmable logic controllers, and other separate or integrated components. As illustrated in Fig. 1, a plurality of data-creating sources are provided for ascertaining data. The data-creating sources are illustrated comprising one or more cameras 122, and preferably a set of cameras with accompanying microphones 121, and include at least one Inertial Measurement Unit (IMU) 116 per independently moving camera 122. For example, where a camera 122 is configured with more than one lens or capturing objective in the same unit, the unit may have a single IMU 116, where, on the other hand, where a lens or capturing objective is independently movable relative to another lens or capturing objective, an IMU 116 preferably is associated with each lens or capturing objective. The IMU 116 is configured to provide suitable information to ascertain movements, including positions and orientation, of an associated camera device (which preferably carries the IMU 116, so IMU-sensed motion may be, or may be designated to be, the camera motion). The orientation and motion of the camera 122 may be derived from the orientation/motion of the sensor based on the knowledge of the physical relationship of the camera and IMU, or IMU components (for rigid relative placement). According to some preferred embodiments, the derivation may be accomplished by implementing a calculation based on the IMU motion and physical relationship of the camera and IMU (assuming rigid relative placement). The IMU 116 provides a plurality of degrees of freedom, and, in the preferred embodiments, preferably up to at least six degrees of freedom (DOF), providing a plurality of axes along which acceleration may be measured, and about which rotation may be measured. The IMU 116 is configured to provide measurements of rotation and acceleration so as to provide data from which the camera location and position/orientation may be ascertained. For example, the camera location and position/orientation for each camera 122 of the plurality of cameras 122 of the system can be determined, and preferably determined at or for a given time. The IMU information may provide the position/orientation of the camera 122 relative to a previous camera position, and preferably provides this position/orientation information corresponding with a particular point in time. According to a preferred embodiment, each IMU 116 may include, at a minimum, a set of three mutually-orthogonal accelerometers plus a set of three mutually-orthogonal gyroscopes providing measurements of acceleration and rotation with sufficient frequency and accuracy to isolate the position and orientation of every camera.

According to some embodiments, the motion and position/orientation determining circuitry may include one or more additional components for facilitating accuracy. For example, improved accuracy of the position/orientation and location information may be obtained with the addition of one or more magnetometers, and preferably three mutually-orthogonal magnetometers or, according to some other embodiments, with the inclusion of redundant instruments, or both. Referring again to Fig. 1, the block diagram illustrates the CPU 111, along with a clock 112, memory 113, a GPU 114, and an interface 115 for other input/output components. The device components are arranged in a suitable circuitry. Other input/output components may, for example, comprise displays, touch screens, keyboards or other input devices, switches, USB, SDIO, sensors and the like. The CPU 111 preferably is configured to collect raw image and motion data from all of its sources. Software with instructions for ascertaining and processing information from the sensors and other components of the device 110 or components associated or linked with the device 110 is provided for instructing a processing component to carry out processes on the data. According to the embodiment illustrated, a processing component is shown comprising a CPU 111, which may be controlled with the software, and, which according to some embodiments, may include an operating system software (which may be part of or separate from the software for instructing the CPU 111 on the processing of data from the device sensors and components). The device 110 is configured to generate one or more synthesized camera outputs. According to the embodiment illustrated, the CPU 111 preferably is instructed to perform the necessary manipulations of the information ascertained by the device 110 and its components. The manipulations may include calculations to derive one or more synthesized camera outputs. In order to facilitate processing of the information and generation of the outputs one or more additional components, such as an additional processing component may be provided. For example, since the processing of graphic information and sensor provided data, as well as the synthesis generation may be computationally intensive, a suitable amount of memory may be provided, as well as a GPU (Graphics Processing Unit) 114. The additional processing component, such as, for example, the GPU 114 shown in Fig. 1, may be any of those commonly used to accelerate video and image processing calculations.
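As one way to picture how gyroscope measurements of this kind can be turned into the per-frame orientation the stabilization mechanism relies on, the sketch below integrates body-frame angular-rate samples into a running orientation estimate. It is a minimal, illustrative sketch: the function name, the 100 Hz rate, and the use of SciPy rotations are assumptions, and a practical implementation would also fuse accelerometer and magnetometer data to bound gyroscope drift.

```python
import numpy as np
from scipy.spatial.transform import Rotation as R

def integrate_gyro(orientation: R, gyro_rate: np.ndarray, dt: float) -> R:
    """Propagate an orientation estimate by one gyroscope sample.

    gyro_rate is the body-frame angular rate in rad/s and dt the sample period.
    A practical system would also fuse accelerometer and magnetometer data to
    bound the drift that pure integration accumulates.
    """
    delta = R.from_rotvec(gyro_rate * dt)    # small rotation over this sample interval
    return orientation * delta               # compose with the running body-frame estimate

# Illustrative use: a 100 Hz gyro stream reporting a constant 10 deg/s yaw for one second.
orientation = R.identity()
for _ in range(100):
    orientation = integrate_gyro(orientation, np.array([0.0, 0.0, np.radians(10.0)]), 0.01)
print(orientation.as_euler("xyz", degrees=True))   # approximately [0, 0, 10]
```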

[0039] The device 110 preferably includes a reference designation component that enables data to be associated based on a reference point. For example, the information provided by the sensors, including the IMU, and position data preferably is coordinated to correspond with a point in time at which the information was obtained. According to the exemplary embodiment illustrated in Fig. 1, the reference designation component is shown comprising a clock 112. The clock 112 is used to provide a common reference timebase against which to synchronize all other incoming data. The reference based on the reference timebase provides knowledge of the orientation of the camera 122 (or device 110) at each point in its video stream so that data from different sensors or image sources can be meaningfully and coherently combined. For example, in embodiments where a plurality of cameras 122 are employed by the system, the orientations of each camera 122 may be referenced to a particular time point, so that the respective images (and other data) from the respective plurality of cameras (and sensors) may be coordinated together as they were oriented at the time of the image. This stored information may be processed at any time, so as to be streamed from the device.

[0040] Preferred embodiments of the device 110 preferably provide a nonvolatile storage component 125 for storing information, which, for example, may include images/video, position and motion data as well as associated time stamping. The device 110 may store raw data as well as the synthesized or processed data. For example, the device 110 may be configured so that raw data products are continuously archived to nonvolatile storage. The nonvolatile storage component also may hold the stored program or programs used to direct the CPU 111 and GPU 114. Final products, such as, for example, synthesized data or images, may also be archived. However, where the device 110 is configured to store the raw information used to provide the final image or data product, it is not necessary to also store the final or processed products since all raw information required to recreate them is already stored. According to some embodiments, the device 110 is configured to store the raw data and transmit a processed stream of motion stabilized video (in-real-time) generated from the raw data. The device 110 preferably includes one or more volatile memory components 113, which may comprise memory into which information may be loaded to facilitate processing.

[0041] The device 110 preferably is configured with one or more communications components for facilitating transmission of information from the device 110 to a remote component or location, as well as for receiving transmissions (such as, for example, instructions or operating commands). According to preferred embodiments, the communications component may comprise a radio interface 118. The radio interface 118 may be used to transmit products, or raw data, in real-time or on demand. The raw data preferably may comprise compressed data that has been compressed with a suitable compression algorithm to facilitate transmission. The radio interface 118 also serves as a reception point for remote commands and requests. For example, the device 110 may be managed or controlled by receiving commands communicated to the device 110 through the radio interface 118. The radio interface 118 may be separate or integrated into the device 110, and may include one or more antennas, transceivers, and processing components or controllers for processing or managing communications and transmissions to or from the device 110. The device 110 also may be configured with one or more additional connections or ports, for connecting one or more additional components to the device. This may be done through a wired or wireless interface. For example, the device 110 when used in connection with some applications may include additional input/output devices 115a as indicated on the diagram in Fig. 1.

[0042] According to preferred embodiments, the device 110 may be configured to provide the data from the sensors and other associated components. In accordance with a preferred embodiment, the device 110 is shown configured as a camera and preferably has an associated IMU 116. The device 110 illustrated also preferably includes a microphone 121. Referring to Fig. 2, a schematic flow diagram is provided illustrating a preferred embodiment of a system and devices for generating motion stabilized video. The flow diagrams and schematic illustrations in the figures represent an exemplary embodiment of imaging and sensor components and processes implemented in conjunction with hardware processing components. The device 110 is illustrated by reference schematically to device components (noting for convenience that only some of the device components shown in Fig. 1 are represented in Fig. 2). Referring to Fig. 2, the flow of data from sensors, such as, for example, the IMU 116, camera 122, microphone 121, and other potential sensor components that may be provided or associated with the device 110 (e.g., through the I/O, see 115 in Fig. 1), to final output is depicted in accordance with a preferred implementation of the invention. In Fig. 2, a system is represented wherein a plurality of image capturing devices 110 are represented, each having a respective IMU 116, camera 122 and microphone 121. The sensors are components that provide inputs, which include information that is obtained based on the occurrence of events or presence or absence of conditions.

[0043] Rectangles in Fig. 2 denote processes that operate on this data as it propagates through the diagram in the direction of the arrows. In the illustration depicted, solid rectangular boundaries denote processes (things that take inputs, do something to them to produce outputs). Other shapes preferably are used to represent "objects" like sensors, cameras, files. The dashed boundaries are rectangular, but intended to denote both grouping and multiplicity according to some embodiments.

[0044] Beginning at the top of Fig. 2, raw data from a number of sensors is transformed into fused data. A sensor fusion process 210 is carried out to reconcile the data sources against a common reference timebase 218. Preferably, data provided by each sensor, such as, for example, the camera (e.g., image sensor) 122, IMU 116, microphone 121 (and other components on a device 110), is associated with a particular time, so that the data from each source has a common associated time. The data from the contributing sources, including data ascertained from the device sensors and associated components, preferably may be stored 211 on a device storage component 125 in connection with its associated time data or time stamp. According to preferred embodiments, a series of corrections or calibrations are applied to the images from each camera 122. Both intrinsic and extrinsic image processing operations are considered, generally accounting for camera-related and motion-related idiosyncrasies/imperfections, respectively. For example, the device 110 is configured to implement one or more corrective processing steps that may be carried out, which may comprise one or more intrinsic image corrections 212 and extrinsic image corrections 213. The correction steps 212, 213 provide a calibrated video stream from a camera 122 (or a plurality of calibrated video streams from a respective plurality of cameras 122). Given a set of calibrated video streams from one or more cameras 122, viewpoint synthesis 214 is implemented to create one or more new video streams that give the appearance of having been recorded by a virtual or synthetic camera with a specific time-dependent location and orientation that does not correspond to any physically available camera. Cropping 217 may also be implemented to ensure the absence of any unexposed portion of the frame, or alternatively, to minimize any unexposed frame portion (which may otherwise appear as an absence of an image in a portion of the image frame). According to preferred embodiments, synthesis 214 is driven by a focus of attention (FOA) process 215 that selects the desired virtual vantage point based on selection elements 216, which may comprise one or more external requests and controls, actual camera motion, analysis of raw video, or algorithmic processing. Synthesized video products 220 may be communicated, transmitted and/or processed further. For example, synthesized video products 220 can be transmitted 221 with a suitable transmission component (e.g., the device radio 118), displayed on a suitable display component, such as a display screen, subjected to further processing (e.g., application specific processing 223), or simply stored 224 on a storage component, or by any combination of these. The synthesized video products 220 preferably comprise motion stabilized video which is generated from the one or more streams of the synthesized virtual cameras.
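The per-frame flow just described, from fused records through intrinsic correction 212, extrinsic correction 213, viewpoint synthesis 214 driven by the focus of attention 215, and cropping 217, can be outlined schematically as below. Every function and field name is a placeholder, and the stub bodies only mark where the corresponding operations would run; this is an illustrative outline, not an implementation from the specification.

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class FusedRecord:               # one output of sensor fusion 210
    frame: np.ndarray            # image data
    pose: np.ndarray             # camera position/orientation at the frame timestamp
    t: float                     # timestamp on the common reference timebase 218

def intrinsic_correct(frame, calib):        return frame  # 212: lens/sensor corrections
def extrinsic_correct(frame, pose):         return frame  # 213: motion-related corrections
def synthesize_viewpoint(frame, pose, foa): return frame  # 214: virtual-camera reprojection
def crop_exposed(frame):                    return frame  # 217: trim unexposed borders

def stabilized_stream(records, calib, select_foa):
    """Yield synthesized video products 220 for a sequence of fused records."""
    for rec in records:
        foa = select_foa(rec)                              # focus-of-attention process 215
        f = intrinsic_correct(rec.frame, calib)
        f = extrinsic_correct(f, rec.pose)
        f = synthesize_viewpoint(f, rec.pose, foa)
        yield crop_exposed(f)
```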

[0045] The present system, method and devices, such as the device 110 depicted in Figs. 1 and 2, preferably implement sensor fusion 210 to coordinate the information from a plurality of sensors. According to preferred embodiments, the reference platform contains a number of sensors each producing its own output data stream. For illustration purposes, according to an exemplary embodiment, the device 110 is configured to produce video images that are stabilized images (including a stabilized video stream). The device 110 is shown in Fig. 1 comprising a plurality of sensor components, such as, for example, an IMU 116, camera 122 (with an image sensor), microphone 121, as well as other possible components provided with or linked for association therewith (e.g., components supported by the I/O 115). According to a preferred embodiment, the system may include a plurality of devices 110 (see, e.g., Fig. 2). Although it would be convenient (and hence preferred) for these sensors of a single device (or of a plurality of devices) to be tightly integrated such that they all emit synchronized payloads at regular intervals, this may not generally be possible. According to some embodiments, sensor instruments, including the sensor components shown and described, may be loosely-coupled or even independent, and may operate at different and potentially time-varying rates (e.g., where a motion sensor ascertains information at 100 times per second and a camera outputs a frame at 30 frames per second). Some of the sensor components may even possess distinct and unsynchronized local clocks (or "timebases"). The device 110 preferably is configured to manage the sensor data by implementing sensor fusion 210. Sensor fusion 210 manipulates the information to reconcile the data sources against a common reference timebase 218 in order to produce "fused data" from raw data. The system, method, and devices preferably are configured to generate fusion of data. According to a preferred embodiment, creating fused data does not modify any of the raw data, leaving the raw data available for further use (e.g., processing or manipulation). Preferred embodiments create fused data to augment and extend raw content with sufficient cross-reference metadata to co-register (align) the raw data types using the reference timebase 218 and its known (or estimated) relationships to the separate sensor source timebases. For example, each sensor may have associated with it a sensor timebase. For example, the camera 122 image sensor may have a camera image sensor timebase 218a, while the microphone 121 has a microphone timebase 218b, and the IMU has a timebase 218c. Sensor fusion 210 preferably may produce new sensor products or data components. Sensor fusion 210 may provide more than "synchronization" 209 (see Fig. 5) because new sensor products may also be derived at this stage, i.e., the whole is potentially greater than the simple sum of its constituent parts. According to preferred embodiments, a plurality of sensors are configured to provide sensor data. In an exemplary embodiment illustrated in Figs. 1 and 2, three sources of sensor data are identified: IMUs 116, cameras 122, and microphones 121. The camera 122 and microphone 121 are well-known electronic sensors recognized by nontechnical end-users, and do not require further explanation. Common practice bundles the sensors together, yielding a bonded audio/video data stream that is already fused. This is such a ubiquitous configuration that, in the sequel, the term "video" always implicitly allows for the optional presence of synchronized audio, whether such synchronization is effected at the sensor assemblies or by the sensor fusion 210 process.
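Because the contributing sensors may run at different rates (for example, a 100 Hz motion sensor against a 30 frames-per-second camera, as noted above), one concrete piece of sensor fusion is resampling the orientation stream at the frame timestamps on the common timebase. The sketch below does this with spherical interpolation; the sample data and rates are illustrative only and do not come from the specification.

```python
import numpy as np
from scipy.spatial.transform import Rotation as R, Slerp

# Illustrative data only: a 100 Hz orientation stream derived from the IMU and
# ~30 fps frame timestamps, both already referred to the common timebase 218.
imu_times = np.arange(0.0, 1.0, 0.01)                          # 100 Hz samples
imu_rots = R.from_euler("z", 10.0 * imu_times, degrees=True)   # e.g., a slow pan
frame_times = np.arange(0.0, 0.99, 1.0 / 30.0)                 # frame capture times

# Spherical interpolation gives the camera orientation at each frame timestamp,
# which is the per-frame pose the later correction and synthesis steps consume.
pose_at_frame = Slerp(imu_times, imu_rots)(frame_times)
print(pose_at_frame[-1].as_euler("xyz", degrees=True))         # ~[0, 0, 9.67]
```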

[0046] The IMU 116 preferably is provided as a single component, or may be provided as a plurality of components, and, according to some embodiments, the components may be associated together in a circuit, or in conjunction with a microcontroller or microprocessing unit. According to preferred embodiments, the IMU 116 may be configured to provide three types of data used to derive accurate relative and absolute estimates of position and orientation, which include angular rotation rates measured by gyroscopes, acceleration measured by accelerometers, and magnetic field measured by magnetometers. The IMU 116 may be configured to provide the position and orientation data and may, for example, comprise a monolithic device, an integrated circuit or assembly, or aggregation of disparate sensors. The components comprising the IMU sensor, for example disparate components, or integrated or associated components, may provide data that is separate and will be separately timestamped, or may provide already fused together data. Depending on whether the IMU is a monolithic device, an integrated assembly (e.g., black-box or grey-box assembly), or an aggregation of disparate sensors, the data streams from distinct IMU sensing modalities may or may not already be fused with each other. Embodiments of the devices according to the invention may include one or more additional components to further facilitate movement determinations. For example, additional instruments may be present to aid in the position, orientation, or navigation tasks, including multiple sets of the three aforementioned IMU instruments. Sensor fusion 210 of the system and device accommodates these situations.

[0047] According to preferred embodiments, raw data ascertained from the sensors, including, for example, position and orientation data from the IMU 116 (and/or its components), the microphone 121, and image sensor of the camera 122 is stored in its unmodified form on a storage component, such as the device storage component 125. Preferred embodiments provide a database which stores sensor data in accordance with a timestamp. The raw data may include the sensor timebase for each sensor. The storage may be provided on each device 110, and in the case of a plurality of devices 110 operating, there will be timestamped data from the sensors of the respective plurality of devices 110. Furthermore, according to some alternate embodiments, where the camera 122 is configured with one or more lenses and captures images from different directions, in some cases, where the lens components are independently movable as to position and orientation relative to each other, the device 110 may provide a respective plurality of data for each camera lens component. Unmodified raw data along with the additional metadata utilized to achieve sensor fusion are always stored so that a database of unadulterated source materials remains available for future review. The resulting archive supports reprocessing with different control settings, enhanced exploratory or experimental offline processing, and complex workflows involving the fusion of external image sources or other independently-acquired data.
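A minimal sketch of the kind of append-only, timestamped raw archive described here is shown below; the file layout and field names are illustrative assumptions, not part of the disclosure.

```python
import json, time

# Minimal sketch (file layout and field names are illustrative): raw samples are
# appended unmodified, each tagged with its capture timestamp, so the archive of
# unadulterated source material remains available for later reprocessing.
def archive_raw(path: str, sensor_id: str, timestamp: float, payload: dict) -> None:
    record = {"sensor": sensor_id, "t": timestamp, "data": payload}
    with open(path, "a") as f:                 # append-only, per-device archive
        f.write(json.dumps(record) + "\n")

archive_raw("imu_raw.jsonl", "imu0", time.monotonic(),
            {"gyro": [0.01, 0.00, 0.02], "accel": [0.0, -9.81, 0.0]})
```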

[0048] While the diagram illustrated in Fig. 2 depicts storage 211 occurring after the fusion 210 of the data sources, it may be advantageous to implement both processes in a distributed fashion. The presence of a lone "storage" block 215 in Fig. 2 is exemplary and does not limit the form of the archive to a single file or database. Raw data may be archived separately and immediately as it emerges from individual sensors, with fusion-related metadata written to one or more physically separate archives. The data may be archived in a database or other form.

[0049] As further depicted in Fig. 2, the image data preferably is processed in an intrinsic image correction step 212. The image may be manipulated at this step to compensate for certain properties that the camera may have imparted to the image. The image data is manipulated to convert or adjust the image. The system and devices are configured to implement intrinsic image corrections 212. Intrinsic image corrections 212 represent manipulations to the image based on camera properties. According to preferred embodiments, the manipulations preferably comprise computations upon and changes to an image in order to compensate distortions or imperfections that depend only on the physical characteristics and geometry of the camera, including the lens and sensor. Some examples of the manipulations may include correction of distortion from lens irregularities or skewed projections attributable to manufacturing defects, compensation for anisotropic (direction-dependent) sensitivity to lighting, or removal of wide-angle lens phenomena such as fisheye warping or barrel distortion. In this processing step, preferably, camera-specific calibration measurements are utilized to perform these calculations, with no other sensors being involved - as the name indicates, the computations are based entirely on intrinsic properties of the camera. According to preferred embodiments, software containing instructions to instruct the processor to manipulate the data and provide the changes is provided. The software may be integrated on a chip on the device 110, or be provided on a device storage component. The processing component, such as the CPU 111, preferably is instructed to implement the manipulations to provide an intrinsic image correction, and corresponding intrinsically corrected image (ICI). The ICI image (or IC image data) may be stored, or further processed, or both. In this intrinsic image correction manipulation step 212, specifically excluded are effects related to the motion of the camera or subject. Thus, according to preferred embodiments, manipulations for intrinsic image corrections 212, which may comprise computations, operate upon raw video data without regard for the timestamps accompanying the frames, so processing can be applied prior to sensor fusion. Nevertheless, it is convenient for didactic purposes to portray the intrinsic image corrections 212 occurring after sensor fusion 210 in the diagram of Fig. 2. (Alternately, intrinsic corrections may be provided as part of the fusion process 210 itself, wherein raw images are "fused" with camera-specific calibration information to derive a new and more refined video product).
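As an illustration of an intrinsic correction 212 of this kind, the sketch below undistorts a frame using OpenCV's standard lens-distortion model; the calibration matrix and coefficients shown are placeholders that would come from a per-camera calibration procedure, and for a true fisheye lens OpenCV's fisheye model (cv2.fisheye.initUndistortRectifyMap plus cv2.remap) would be the analogous route.

```python
import numpy as np
import cv2

# Illustrative intrinsic correction (step 212). K and dist are placeholders; real
# values come from calibrating the particular lens and sensor.
K = np.array([[1000.0, 0.0, 960.0],
              [0.0, 1000.0, 540.0],
              [0.0, 0.0, 1.0]])                     # focal lengths and principal point
dist = np.array([-0.30, 0.10, 0.0, 0.0, 0.0])       # radial/tangential coefficients

frame = np.zeros((1080, 1920, 3), dtype=np.uint8)   # stand-in for a captured frame
corrected = cv2.undistort(frame, K, dist)           # intrinsically corrected image (ICI)
```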

[0050] The image preferably is processed to undergo further manipulation in one or more extrinsic image correction steps 213. Extrinsic image corrections 213 comprise manipulation of the image which includes operations upon an image to compensate for defects that are not inherent in the camera or sensor physical configurations, but that arise as a result of external circumstances. Some examples include rolling shutter artifact mitigation and motion blur compensation. For example, in the case of a rolling shutter, the image is not exposed simultaneously; rather, the image is exposed at a different place for each scan line, and therefore, at a slightly different time. Applying these corrections requires external (i.e., extrinsic) information (e.g., knowledge of camera motion) beyond that contained within the video stream and hence must be performed after fusion 210 to ensure proper synchronization among all data sources. Furthermore, examination of image data from more than a single video frame may be required to achieve the desired results. According to some preferred embodiments, the extrinsic image corrections 213 may be applied in conjunction with the viewpoint synthesis 214. As discussed herein it may be preferred that some - or all - extrinsic image corrections may be combined with the viewpoint synthesis 214 instead of being applied separately.

[0051] As depicted in Fig. 2, the image is further processed in the viewpoint synthesis 214 and cropping 217 steps. The stabilization mechanism preferably implements viewpoint synthesis 214. The system and devices shown and described herein are configured to carry out viewpoint synthesis. Viewpoint synthesis 214 lies at the heart of the image processing chain depicted in Fig. 2. The viewpoint synthesis is intimately associated with cropping 217. The depiction of multiple dashed boxes surrounding the viewpoint synthesis 214 and cropping 217 operations in Fig. 2 denotes the possibility of having more than one set of simultaneous parallel operations to create independent outputs. The viewpoint synthesis 214 preferably comprises a reprojection manipulation of the image. Preferably, the reprojection of the viewpoint synthesis 214 is carried out on the image in a manner that optimizes processing by combining operations where appropriate. Reprojection may be carried out after manipulation by extrinsic and intrinsic adjustments, in steps 212, 213, that have taken place on the sensor fusion processed image, or, alternatively, may be carried out by combining it with the extrinsic and/or intrinsic adjustments 212, 213. Preferred embodiments of the devices, such as the imaging device 110, systems and method are configured to implement manipulation of a captured image by reprojection and viewpoint synthesis.

[0052] Referring to Figs. 3A, 3B and 4A, 4B, an illustration is provided in connection with reprojection and viewpoint synthesis. Figs. 3A and 3B illustrate an exemplary camera configuration. The devices, systems and methods preferably carry out a reprojection process. Fig. 3A is a schematic illustrating camera and subject geometry for imaging a simple scene using an idealized "pinhole" camera. Fig. 3B is a schematic illustration representing the resulting image. The schematic illustrations of Figs. 3A and 3B present an orthographic projection viewed from above the camera, with the plane of the schematic oriented at right angles to both the focal plane and the vertical axis of the camera. For the purposes of this illustration and exemplary embodiment, this symmetric arrangement allows the third dimension extending upward and downward from the page to be ignored. In the pictured scenario, a camera with infinitesimal aperture at point O is aimed directly at point A. Thus, OA forms the optical axis of the camera, and the projected image of point A will appear directly in the center of the sensor (i.e., the camera image sensor). The camera focal plane captures the image of its scene at some projected distance OX (the focal length) from the aperture; although this plane is physically located behind the aperture, it is convenient (and fairly conventional) to draw it in front of the lens as a "virtual focal plane" for clarity and ease of understanding. Because the physical extent of the sensor is bounded by points W and Z, the field of view (FOV) of the camera is limited to the area contained within angle BOC. The only object visible in this posited exemplary scene is a sphere 300 located at the extreme right edge of the FOV, with its center in the plane of the figure. Its size and distance are unimportant, but the sphere 300 subtends angle DOC as viewed from the camera - hence its projection upon the focal plane along the horizontal axis of the camera is the line segment YZ. Fig. 3B represents the view seen by the camera, as captured on its sensor (i.e., image sensor).

[0053] Since this sphere 300 represents an object of interest, in retrospect it would have been desirable to have captured its image while it was closer to the center of the sensor (the sensor field boundaries referenced as W and Z), more aligned with the optical axis OA that intuitively forms the "center of attention" of the camera. Figs. 4A and 4B demonstrate how this geometry could have been obtained from the existing configuration. As shown in Figs. 4A and 4B, most of the original lines and notations from Figs. 3A and 3B are retained for reference, with additional information being introduced (and having prime designations in the reference characters). Without changing the locations of the aperture or sphere 300, if the camera is aimed along a new optical axis OA' then the center of the sphere 300 projects to the center of the sensor (i.e., through X'). The resulting image is shown in Fig. 4B (ignoring for the nonce the dark shading to the right of the image of the sphere). The resulting image is not identical to shifting the image from Fig. 3B leftward because the image plane W'Z' is tilted with respect to the original WZ - scene objects (if there were any present) to the left of the sphere 300 (as viewed from the aperture) will lie closer to the new image plane while those to the right will be farther away. Although production of the new image shown in Fig. 4B required a physical change in imaging geometry, under certain restrictions it is possible to derive the relationships between pixels in the two images based only on a fundamental principle of geometric optics (viz. light travels in a straight line from the subject directly to point O) and knowledge of the orientation angles involved. Hence, a change of optical axis can be realized virtually, without the need for physical camera movement or modification of geometry.
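
Under the pinhole assumption described above (all rays pass through the single point O), a change of optical axis for a purely rotated camera can be expressed as a planar warp of the captured image, conventionally written H = K·R·K⁻¹ where K is the intrinsic camera matrix and R the rotation between the two optical axes. The sketch below is one possible realization of that relationship; the calibration values, the yaw-only rotation, and the OpenCV calls are assumptions made for illustration.

    import cv2
    import numpy as np

    K = np.array([[1000.0,    0.0, 960.0],
                  [   0.0, 1000.0, 540.0],
                  [   0.0,    0.0,   1.0]])    # illustrative intrinsic matrix

    def reproject(image, yaw_rad):
        """Synthesize the view along a new optical axis OA' rotated by yaw_rad
        about the camera's vertical axis, without moving the physical camera."""
        c, s = np.cos(yaw_rad), np.sin(yaw_rad)
        R = np.array([[  c, 0.0,   s],
                      [0.0, 1.0, 0.0],
                      [ -s, 0.0,   c]])        # rotation about the y-axis
        H = K @ R @ np.linalg.inv(K)           # pixel mapping for pure rotation
        h, w = image.shape[:2]
        # Regions outside the original FOV (the shaded area in Fig. 4B) come out black.
        return cv2.warpPerspective(image, H, (w, h))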

[0054] When the camera undergoes movement, such as, for example, a change in its position or orientation, the FOV is changed. Physically changing the orientation of the camera (as described in the example situation immediately above) results in an FOV bounded by the new angle B'OC', but reprojection remains limited to using imaging data captured from within the initial bounds of BOC. Effectively, the FOV of the synthesized camera is reduced to B'OC. Referring to Fig. 4B, the black shading serves as a reminder that it is not possible to reproject images from regions of space that were not represented in the source image with its original optical axis. In the case of reprojection, there is only one physical camera, but the reprojected image effectively represents a second.

[0055] The system, method and devices preferably are configured with instructions for manipulating the image data to adjust the optical axis from a first optical axis designation to a second optical axis designation. Subsequent optical axis adjustments may be made in accordance with image data. The image data preferably includes fused sensor data, including the pixel image information from the camera image sensor, other sensor data, such as, for example, microphone data, and IMU data (e.g., orientation and position data), as well as the timebase at which the data was obtained.

[0056] An optical axis manipulation provides a reprojection of the image, such as, for example, as illustrated in connection with the object or sphere 300, and adjusts the image plane. Reprojection represents a component of viewpoint synthesis 214 (see Fig. 2). In accordance with the manipulation, a first new camera is synthesized to correspond with the reprojection and provides a new point of view (compare Figs. 3A, 3B with Figs. 4A, 4B).

[0057] The method, system and devices implement further manipulation of the image information to provide one or more alternative viewpoints by implementing further components. At each moment in time, each camera clearly possesses a well-defined viewpoint. This is the first viewpoint VP1. A plurality of alternate viewpoints (VP2, ... VPn) may be generated through processing of the image data information of or corresponding with the first viewpoint VP1. A processing step of applying an adjustment or correction may be applied to viewpoint image information, and the adjustment may be based on an application of a mathematical formula applied to the image data. The image adjustments preferably utilize the IMU and other sensor data. The system, method and device preferably are configured to generate an image (frame of video or portion of a frame) corresponding with and having a virtual viewpoint VP (e.g., VP1, VP2, ... VPn). The images generated may be done in succession or continuously as the camera undergoes movement, or during the time that the camera is capturing a scene, i.e., recording video, where the camera may be in an intended position for some image capture and in a changed position for some other of the image capture. The manipulations of the video generated from the image capture information preferably may be carried out recursively, from the IMU position/orientation information. According to some embodiments, the image information obtained from the information of the physical camera image capture may be manipulated by mapping to preserve some points, straight lines and planes, but not others. For example, with a designated viewpoint, which may become a designated virtual viewpoint (where the camera has been moved from its original physical position providing the designated viewpoint), ratios of distances between points on a straight line may be preserved (from the physical camera original viewpoint versus a virtual viewpoint). The manipulations may permit angles between lines to change as well as distances between points in the virtual viewpoint synthesized image (such as a frame or frame portion). The manipulation of the image preferably is carried out so that some of the parallel lines may remain parallel. According to some preferred embodiments, the manipulations may comprise linear transformations (such as "affine transformations"), and generation of a viewpoint and its associated field of view may be produced by applying geometrical optics principles to the image data to produce manipulated image data for the data set image (obtained by the physical camera) but corresponding with the designated viewpoint (VP). The manipulations generate an image that represents the field of view (which may be a synthetic field of view) from the designated viewpoint (VP). The manipulations according to a preferred embodiment comprise reassignment of the image data from the previous or reference focus of attention viewpoint (VP1) to a different or synthesized viewpoint (VP2). An angular component of the image data is applied to change the angle for the field of view. The angular component manipulation may change the angle with respect to the path between the scene, scene object (or image target) and the synthesized field of view (FOV) of the synthesized viewpoint (VP2), the synthesized corresponding field of view referred to as (FOVVP2). The angular manipulation depends on the location in the field of view, as, for example, in Fig. 4A, the assignment of the shifted field of view (FOV) represented by W'Y'Z'. For this discussion we can consider that the camera has moved from the intended or designated field of view in Fig. 3A, on the look point OA, to the position in Fig. 4A. In the exemplary depiction of Fig. 4A, an angular shift is positive in regard to the synthesized FOV for image portions to the left of the plane intersection (that is, left of where W'Z' intersects WZ), and negative in regard to the synthesized FOV for image portions to the right of the plane intersection (that is, right of where W'Z' intersects WZ). The image in the first field of view (FOV), such as that field represented by WZ in Fig. 3A, preferably is captured by the image sensor pixels. The pixels represent spatial coordinates of the image field.

[0058] For example, an alternate viewpoint (VPA) may be considered to be where the camera has moved from its position/orientation in Fig. 3A, and an alternate corresponding field of view, such as field of view (FOVVPA), is associated with the alternate viewpoint VPA (for example, shown by the different viewpoint in Fig. 4A where FOV W'Z' is represented). The device is configured to manipulate the captured data associated with the pixels to generate the image from the synthesized camera having a synthesized viewpoint (SVP1). The synthesized viewpoint may be a synthesized viewpoint corresponding to an initial viewpoint (such as VP1), so that the image is synthesized as if the camera were still in the position depicted in Fig. 3A. Conversely, the image of the sphere 300 in Fig. 3A and the field of view represented in Fig. 3A may be synthesized from another viewpoint (e.g., such as, for example, to be imaged as a virtual synthesized viewpoint and field of view from the camera position depicted in Fig. 4A, if that were to be desirable). Other synthesized cameras and viewpoints may be produced. The discussion refers to an image, which preferably may be a video stream. The image information, including camera movement, may be processed rapidly so that the viewpoint and field of view manipulations are made rapidly and the video stream may be produced with the viewpoint synthesis manipulations applied.

[0059] The processing manipulations of the image data are done rapidly in response to and in coordination with the IMU movement data. A video stream is produced. Preferably the video stream is produced and may depict the scene from the point of view of the designated viewpoint, even though the image frame (or image information) was captured at a moved position of the camera, which may be an alternate position from which imaging is done from an alternate viewpoint (VPA). The data captured from the alternate viewpoint (VPA) may be synthesized to have a synthetic look direction, as if imaged from the initial or designated viewpoint (e.g., VP1). The synthetic cameras provide frames or frame portions so that the viewpoint from the moved position or changed orientation (VPA) of the camera may be used to generate a synthetic camera (or plurality thereof) that captures the image from the initial viewpoint (VP1), even though the physical camera FOA has moved from that initial viewpoint (VP1). According to some alternate configurations, a synthetic camera may generate video images as if produced from a camera that is imaging the scene (e.g., subject or target object) from an alternate position (or a number of alternate positions), which are alternate to the camera look point. That is, even though the camera look point or FOA is imaging in one direction, the scene may be viewed from one or more other directions (i.e., as if captured from one or more directions). According to preferred embodiments, video data preferably is obtained from a first or initial direction (VP1), and may be manipulated to generate video of the scene that corresponds with an alternate or second viewpoint (e.g., VP2). The video generated from the alternate or second point of view (VP2) preferably is generated by manipulating the video information obtained for the scene from imaging in the direction VP1. Adjustments are made to the pixel data, such as, for example, an angular adjustment to provide an angle corresponding to the angle by which the FOV has changed, and a relationship adjustment for image pixels along parallel lines. The adjustments may be implemented as the video is being captured with the camera so that adjustments to the image data provide a look direction that is smoothed even though the camera may be undergoing multiple position/orientation changes. The image data preferably is manipulated rapidly to provide a stream of adjusted video.

[0060] The adjustments or manipulations to the image data to provide a selected or designated look direction preferably also do so while the camera is undergoing desired motion, such as translational motion. According to preferred embodiments, the device and system are configured to discern desired movements from the undesired camera motion that requires adjustment. According to a preferred embodiment, the desired movement preferably is determined by monitoring and evaluating the motion of the camera, and preferably, the continued motion of the camera. The device and system are configured to evaluate the motion and time information and determine whether the motion is a deliberate motion that is desired or acceptable motion (that is not corrected) or whether the motion is undesired motion.

[0061] The device and system preferably are configured to ascertain movements of the camera (which provides a position of where the lens is pointing), and evaluate the times at which the movements occur. The device and system preferably are configured to distinguish between a first type of camera movement, which may be translational or intentional movement, and a second type of camera movement, which may be oscillating or rotational movement (such as a change in orientation of the camera). One example is where an individual is wearing a camera which is configured as a body camera. For purposes of illustrating an embodiment of the device and system, the look direction is selected or designated to be a location in front of the camera (although it could be designated to be another designated direction, preferably within the field of view of the camera). Activity by the individual typically will result in movement of the camera. The camera may experience movements as a result of the individual walking, running, ascending or descending stairs, driving in a vehicle, or other movement. Where an individual is moving forward, e.g., walking, running, or traveling in a vehicle, the motion is typically translational. The movement of a forward-moving individual, for example, preferably is evaluated and identified by the device and system as the aforementioned first type of movement. The device camera movement information preferably is obtained by the sensors, e.g., the motion sensors such as one or more IMUs, and the timestamp identifies the movement as a function of a time interval.

[0062] The time motion data is obtained, and the movement pattern is evaluated to ascertain whether within the movement pattern, there is a threshold degree of randomness. For example, a first randomness threshold (i.e., low randomness or relative low randomness) may be established for movement that is determined to be designated as intended movement. Intended movement of the camera is where the camera movement is desired, so if the camera is moving forward, the field of view remains in front of the camera, and if the individual and camera worn by that person were to make a turn, such as, at a corner of a street, that movement would be intended movement. The focus of attention (FOA) of the camera may point toward a particular direction or at a desired object or subject to be followed by the camera (e.g., a target). The device and system preferably are configured so that movement information that identifies a first type of designated movement, such as directional movement, does not implement an image adjustment for that movement. According to some preferred embodiments, the movement data is considered in connection with a time frame, and movement changes in short time durations may be designated to be movements for which adjustment or manipulation of the image is made (e.g., to reproject the viewpoint from what the camera actually views at the time of movement). Conversely, movement changes in longer time durations may be designated to be intended movement for which no adjustment or manipulation of the image viewpoint or look direction is made to compensate for the longer duration intended movement. However, even though an adjustment or manipulation is not made to the image in connection with the type of movement that is evaluated to be intended movement, small, random, undesired movements that are detected preferably are evaluated and an image correction is applied. In the current example, where an individual is moving in a direction and turning a corner, and the individual is walking, some movements of the camera are desirable and other movements are not. The directional movement of the individual walking following an intended path of motion is determined, and the system and device preferably identify the directional movement as desired movement, and the image is not adjusted for the camera being moved toward where the individual is walking. However, while the individual is walking, in addition to the intended movement taking place, undesired camera movement, such as shaking or rotating, may be ascertained by the device. The system and device identify the undesired camera movement and apply a manipulation to the image (even though the image is recording a scene where the camera is moving toward something, e.g., walking in a direction). According to preferred embodiments, the method is carried out to ascertain movement of the camera, which may be desired movement and undesired movement, which may happen at the same time or at different times. Walking may result in movement of the camera in a left or right direction, or upward or downward direction, as the individual's steps may jostle the camera. These movements are detected by the device and system components and the movement information is processed and preferably evaluated to be identified as undesirable movement.
As a result of this second type of movement being detected, the camera look direction, which is in a non-designated direction (based on the undesirable movement), is manipulated for that time of movement to have a look direction that represents the designated look direction (which the camera does not have). Conversely, at the same time as the camera is undergoing this second type of movement, it also may be undergoing the first type of movement. A synthetic camera generated by the device and system provides a look direction that is or substantially is close to the designated look direction. The target (such as an object or subject being followed) or focus of attention therefore may be maintained, or attempted to be maintained. Preferably, the field of view of the camera may overlap with the field of view of the synthetic camera or synthesized viewpoint. According to some embodiments, the camera is configured to image a wide field of view to facilitate increasing the likelihood of capture of the designated or desired viewpoint within the field. According to some preferred embodiments, the image sensor field may be larger than the smaller synthesized output image to allow for a more expansive (or wider) area to provide more field for the synthetic cameras and the corresponding viewpoints they may have. The synthetic camera output image may cover a portion of the image sensor field, and, according to preferred embodiments, the portion which the synthesized image output covers may be any location within the image sensor field, and may change so that the synthesized image output is from different portions of the image sensor field. Where the individual wearing the camera is running, the individual changes position, in the direction of movement, and while that change in movement takes place, the camera also moves abruptly (even when harnessed). The abrupt movements may otherwise produce unusable video images, if the actual camera look direction were used for each frame. The device and system implement manipulation of the image information to produce a video stream with video image frames that maintain a look direction through the camera and one or more, and preferably a plurality, of synthesized cameras synthesized from the one camera through which images are being recorded. According to some embodiments, the synthetic camera viewpoint image frames are injected into the video stream when the physical camera look direction has moved from the desired or designated look direction, and, preferably as a result of unintended camera movements.
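
One simple way to realize the distinction drawn above between slow, deliberate motion and rapid, unintended motion is to low-pass filter the sensed orientation over time and treat only the residual as the disturbance to be corrected. The filter constant, the use of a single yaw angle as the orientation signal, and the exponential smoothing itself are assumptions for this sketch; the disclosure does not mandate any particular filter.

    import numpy as np

    def split_motion(yaw_samples, dt, tau=1.5):
        """Split a sensed yaw trajectory into intended (slow) and unintended
        (fast) components using a first-order low-pass filter.

        yaw_samples : IMU-derived yaw angles, one per sample (radians)
        dt          : sample interval in seconds
        tau         : time constant; changes slower than tau are treated as
                      deliberate movement (e.g. walking around a corner)"""
        alpha = dt / (tau + dt)
        intended = np.empty_like(yaw_samples)
        intended[0] = yaw_samples[0]
        for i in range(1, len(yaw_samples)):
            intended[i] = intended[i - 1] + alpha * (yaw_samples[i] - intended[i - 1])
        unintended = yaw_samples - intended   # residual jitter to be corrected
        return intended, unintended

    # The correction applied by the synthetic camera then counteracts only the
    # unintended component, leaving deliberate turns untouched.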

[0063] According to some embodiments, image adjustment may be made by redesignating a viewpoint or look direction. For example, although the camera may initially point in a designated look direction, another direction from which an image is captured, including a direction from a synthetic camera, may be designated as a look direction or viewpoint.

[0064] The system and device preferably may generate a plurality of synthetic cameras from a physical camera imaging the scene. Accordingly, a large ensemble of viable alternate viewpoints may be generated. The system and device are configured to generate viable alternative viewpoints. Although the depictions shown and discussed have been provided in conjunction with individual images, the system, method and devices capture information that is captured as a video stream. The desired video stream preferably is a video stream that is designated or purposed to image a particular object or target. The designation may be along a particular path or focus of attention. As discussed herein, reprojection generally reduces the field of view. According to some embodiments, one or more additional cameras may be used to provide the missing data - possibly after their own images are subjected to reprojection.

[0065] According to preferred embodiments, the image information obtained from the camera and other sensors is utilized to generate one or more new video streams from one or more respectively synthesized virtual cameras. A plurality of synthesized virtual camera streams may be generated. In spite of potentially diverse and time-varying details of intrinsic camera properties and orientation, as long as sufficient FOV coverage of the desired target is available, the device may manipulate the image data so as to fuse data from all cameras and exploit multiple video sources to synthesize one or more new video streams from correspondingly-synthesized virtual cameras. The system, method and device preferably are configured to provide a suitable level of coverage for the camera. For example, where the camera is to be utilized for relatively closely occurring activity, the image field may be suitably configured to capture a field in which the activity is occurring (or likely to occur), for example, by using an appropriate lens, such as, for example, a wide-field type or fisheye lens. Conversely, where the activity is distant, the image field may be configured to capture a suitable field within which the activity is occurring (or anticipated to take place). Embodiments of the system, method and device are configured to provide the desired level of coverage for the activity taking place.

[0066] According to some embodiments, a plurality of cameras may be utilized in accordance with the present system to provide a plurality of FOVs, which have corresponding respective viewpoints. In addition, one or more (or all) of the plurality of cameras may provide image data from the camera and other sensors, which is manipulated to provide a respective synthesized virtual camera, and generate one or more new video streams from each of the one or more respectively synthesized virtual cameras. A plurality of synthesized virtual camera streams may be generated from each of the plurality of physical cameras, and the respective image information provided by each of the respective plurality of physical cameras.

[0067] According to embodiments of the invention, when more than one camera is not implemented (e.g., to image the same object or subject), or additional cameras are unavailable, the camera preferably is configured with a wide field of view. For example, preferably the native camera sensor FOV may be adjusted to enhance the field using one or more wide-angle lenses. These wide-angle lenses, such as fisheye lenses, provide more FOV, but have distortion at the edges. Embodiments may employ deliberately severe distortion (such as that found in fisheye lenses) in order to achieve these goals while ameliorating the effective FOV losses.

[0068] According to preferred embodiments, the devices and systems may be configured to maintain a look direction that corresponds with the direction of the moving body or other support on which the camera or device is carried. For example, the device may be moving in a forward direction, and the direction may be processed to correspond with a movement vector. As a further example, a body carried device, such as a mobile police body camera, may be determined from its motion data to be moving in a particular direction, say, for example, a radial path of travel. The motion may be determined to be substantially along a particular path of travel, the radial path, until some other change in direction is sensed. In this example, the camera may be configured to determine a field of view to be a path following that radial direction. In this manner, the field of view may be adjusted slightly to the radial direction of travel, which is the look direction and where the look direction is likely the desired direction of the pursuit or interest area. For example, where the travel direction is along travel vector (TV1) and the camera device is moving (e.g., due to body motion of the person carrying it), the image may be manipulated to provide the field of view of a camera that is moving in the travel path, in this example, along TV1. A synthesized camera may be configured to sense and follow an intended path of travel. The motion stabilization also may be implemented to generate a video stream of image data representing a field of view that manipulates the image or images forming the video stream to a stabilized depiction of the scene. This is one example, and another example may be a path of travel that is substantially linear. In these instances, where the camera may veer off of the perceived intended path, the mechanism of the device (components, sensors, processors, etc.) processes the movement information and preferably obtains the scene image from the expected path direction that is anticipated or perceived to be desired, based on the device configuration. The device preferably is configured with software containing instructions to process the data from the sensors and apply manipulations to the image data to produce stabilized video from one or more synthetic viewpoints.
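
A simple way to derive the designated look direction from the motion itself, consistent with the travel-vector example above, is to smooth successive position estimates and aim the synthetic camera along the resulting displacement. The position source, the smoothing window, and the yaw-only treatment are assumptions made for this sketch.

    import numpy as np

    def look_direction_from_travel(positions, window=30):
        """Estimate a look direction (yaw, in radians) from recent positions.

        positions : array of shape (N, 2) with x/y ground-plane coordinates,
                    e.g. integrated from IMU data or another position sensor.
        window    : number of recent samples used to smooth the travel vector."""
        recent = positions[-window:]
        travel = recent[-1] - recent[0]           # net displacement over the window
        if np.linalg.norm(travel) < 1e-6:
            return None                           # not moving; keep prior look direction
        return float(np.arctan2(travel[1], travel[0]))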

[0069] In conjunction with manipulating the image or images to form a stabilized video stream, field of view reduction may be implemented. FOV reduction and its variability highlight another difference between individual camera viewpoints (whether synthesized or real) and the synthesized viewpoint of a virtual camera. For any physical camera, FOV is an intrinsic property of the sensor, the lens and their relative placement. For example, the lens may have one or more settings, and the lens may become a different lens when a different setting is applied. One example is where the lens is not a fixed-focus lens; it effectively becomes a different lens when it is adjusted. Similarly, the FOV of a reprojected image is limited by its source image with potential further reductions being dependent on the transformation geometry. Producing an aesthetically acceptable video stream from any single camera invariably requires rectangular cropping 217 (see Fig. 2) such that worst-case FOV reduction never reveals unexposed (off-camera) portions of the imaged scene. In contrast, output products 220 from viewpoint synthesis 214 ultimately remain limited by raw data sources but still enjoy considerable freedom in choosing the intrinsic characteristics of the synthesized camera, including its field of view. As mentioned previously, an embodiment may employ multiple cameras to provide synthetic video output. According to some alternate embodiments, deliberate physical placement of multiple cameras can ensure this degree of flexibility. Of course, despite the freedom to select FOV explicitly, the final output must be rectangular (for the rectangular display format). Hence cropping may still be required, not to discard otherwise-valid image data to avoid exposing gaps in coverage, but rather to limit the extent of the manipulations (including computations) that must be performed. The cropping facilitates lower power utilization (e.g., decreased processing power/tasks). Thus, although Fig. 2 shows cropping 217 following after viewpoint synthesis 214, preferred implementations may implicitly merge the two processes, absorbing the former into the latter. Furthermore, explicit per-camera (i.e., single-camera) cropping need never be performed - when implemented and driven efficiently from a sufficiently high-level process, viewpoint synthesis 214 may be configured to manipulate the image information so as to only select subsets of image data that will be used in the construction of its final image.

[0070] Even when multiple cameras are fused to produce a new synthesized video stream, all preceding descriptions have characterized reprojection from a single camera as a per-image procedure. When an electronic camera with a rolling shutter is employed, the process becomes more complicated. Now, each row in the image is exposed at a slightly different instant in time - removal of rolling-shutter artifacts associated with camera motion requires synthesis of a virtual viewpoint for each row in order to achieve the appearance of an image obtained with a global-shutter camera. Synthesis of a global-shutter viewpoint requires access to two consecutive images, and is subject to the same FOV concerns as image-level reprojection. According to preferred embodiments, the synthesis is generated by synthesizing a final desired viewpoint directly. For example, for efficiency, preferably, synthesis of the final desired viewpoint directly is carried out rather than sequentially computing a global-shutter viewpoint followed by a reprojected viewpoint.
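
The per-row treatment described above can be sketched as follows: the camera orientation is interpolated across the exposure times of the rows of one frame, and each row receives its own rotation warp toward a common target orientation, approximating a global-shutter capture. The use of SciPy's quaternion interpolation and the standalone per-row pass are assumptions for illustration; as noted above, a practical implementation would fold this into the final viewpoint synthesis rather than run it separately.

    import numpy as np
    from scipy.spatial.transform import Rotation, Slerp

    def rolling_shutter_rotations(R_start, R_end, num_rows):
        """Interpolate camera orientation across the rows of one frame.

        R_start, R_end : scipy Rotation objects for the first and last row
                         exposure instants (from fused IMU data)
        num_rows       : image height in rows"""
        slerp = Slerp([0.0, 1.0], Rotation.concatenate([R_start, R_end]))
        return slerp(np.linspace(0.0, 1.0, num_rows))   # one Rotation per row

    def per_row_homographies(K, row_rotations, R_target):
        """One 3x3 warp per row mapping each row toward a common
        (global-shutter) target orientation R_target."""
        K_inv = np.linalg.inv(K)
        return [K @ (R_target * r.inv()).as_matrix() @ K_inv for r in row_rotations]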

[0071] Fig. 2 illustrates a process diagram and denotes a focus of attention (FOA) 215 process. The system and devices, such as the exemplary device 110, are configured with instructions to synthesize one or more virtual cameras from the device information. The device 110 is configured with software containing instructions to process the image and sensor data to select a desired viewpoint, which preferably is a desired instantaneous viewpoint for the virtual camera (or viewpoints in the case of a plurality of virtual cameras). The viewpoint selection information generated to represent the viewpoint for a virtual camera preferably is stored on the storage component of the device 110. According to preferred embodiments a database is generated or constructed to hold the image information, including virtual camera information for the one or more virtual cameras that are synthesized. The database, according to some embodiments, preferably includes the virtual camera image generated for the corresponding focus of attention. The images preferably may be a video stream (for example, up to 30 frames per second of captured video). The device 110 also preferably is configured to stream the video as live streaming video. The device 110 processes the video images captured from the one or more cameras and one or more synthesized or virtual cameras. Preferably, the images are adjusted to capture a frame image in accordance with the desired selected focus of attention (FOA). The focus of attention (FOA) may be generated by the device 110. The system and device generate a focus of attention (FOA) 215 to select or designate the desired instantaneous viewpoint for one or more virtual cameras, forming the controls that drive viewpoint synthesis. In a sense, FOA and viewpoint are interchangeable terms; the two names distinguish between one process that selects virtual viewpoints and one or more processes that reify those viewpoints via extensive image-processing manipulations, which include computations. Components of the imaging computations involve designating or assigning a look direction, which may be a focus of attention of the camera or lens. A component of the data profile is IMU data associated with the position and orientation of the camera. The image information captured by the camera and the IMU position/orientation information are designated or registered in time. The camera image information is manipulated to produce an image or image portion that corresponds with the assigned or desired look direction (from which the camera was moved) when the new look direction occurs as the camera moves or changes position or orientation. The manipulations preferably include obtaining the relative IMU movement differential for movement of the camera taking place. Movement information or data is obtained from the IMU sampling, which may occur a number of times per second, and the movements may be processed relative to the prior movement. This may be related to the initial or designated camera look direction, and each subsequent movement may be related to the previous movement, so that the actual image captured from a viewpoint may be adjusted based on a relative position movement or differential. The IMU information preferably is applied to the time point at which the corresponding image information was captured. The IMU movement information may be used to determine an angular adjustment and a position adjustment within the frame of the image sensor.
The camera image sensor field captures images from the initial look direction, and then the image sensor field captures images from a direction to which the IMU has detected the camera to have been moved. The angular component adjustment may be applied to adjust the moved field to relate to the initial field by a rotation about an angle (or one or more angles), which may be one of the six degrees of freedom or three axes of the IMU. The focal length also provides a parameter that is used to determine distances for reprojected images. A distance component adjustment is made for objects appearing in the scene, which preferably may be applied to the pixels. For example, the pixels may be manipulated to be moved to more particularly represent a camera image of the scene as if taken from the initial point of view (even though the camera has been moved). Kalman filtering may be used to smooth the trajectory of the movements, which may smooth the look direction, so as to concentrate the movements to reduce deviations from noise or other inaccuracies. The image manipulations may be processed in conjunction with an applied Kalman filtering so image manipulations are applied to movement information that preferably represents movement data that is optimized for the camera movements detected by the IMU. According to some alternate embodiments, the IMU data may determine movement, position, and/or orientation at a particular time and relative to a designated look direction.

[0072] According to an exemplary embodiment, a system and device for producing a macro stabilized video output stream from a camera that is subject to movements (e.g., position and/or orientation changes) is provided having a camera configured according to the depictions herein. According to this Example, the camera lens moves with the camera and an IMU (e.g., a movement/position/orientation sensor) ascertains information as to the position/orientation (i.e., movements) of the camera, including relative movement thereof from a previous position, or a state where the same position is maintained over the ascertainment period. The position data may be ascertained in a unit time, such as, for example, a number of position data sets obtained per second. The system and device preferably capture video images wherein the scene, which comprises objects in the field of view, is represented as having a position based on a three component vector, which is at a particular time (relating to the image capture time for that portion or frame of video). The image space may be represented by the sensor field, which is made up of pixels. In the image space, for example, such as the camera image sensor field, an image position may be represented by two coordinates. Preferably, the coordinates are homogeneous coordinates. For example, to each coordinate vector, namely, each world coordinate vector and each image coordinate vector, a component having a unit value is appended, so that the coordinate vector for the image position may be [x y 1], and the coordinates for the world position may be [X Y Z 1]. The captured image data (pixel values) are recorded. According to some embodiments, the camera may be undergoing movement (other than rotational movement), and therefore, the coordinate system may be changing in conjunction with corresponding movement. However, according to some embodiments, the system and device are configured to manipulate the data to generate a manipulated or synthesized image, and may proceed as if it is starting all over again at each frame and utilize that simple coordinate system (such as the coordinate system referred to in Fig. 3A) for translational movement. According to preferred embodiments, camera rotations are tracked and recorded, while the system and device may ignore the translational motion because translational motion (e.g., such as moving the camera in a forward direction to follow a subject) actually may be desired. Although some translational motion may comprise lateral motion, that type of motion may not be perceived as "instability" compared with the rotational motion. The IMU provides position data corresponding to the time at which the image data for the image coordinates was captured or obtained. The system and device are configured to generate a relation for the image coordinates with regard to the world coordinates (i.e., the coordinates of objects in the scene). A matrix component K is applied to relate the coordinates, such that the matrix component K may be applied to the world coordinates in order to produce a linear matrix multiplier. This matrix multiplier preferably is dependent upon intrinsic camera parameters, such as, for example, focal length and scaling constants relating to physical dimensions and pixels. Additional multiplicative matrices may be applied to relate the image matrix coordinates at a particular time to a corresponding movement at that time.
For example, where the IMU ascertains rotation information as to a rotation of the camera, the rotation information may be applied to designate camera movement by way of a rotation matrix. Accumulated rotations of the camera at time t may be captured by a rotation matrix, R(t). In addition to adjustments made by application of the linear matrix constant to adjust for intrinsic parameters, a movement component is applied to relate the movement of the camera. In this example, one component of the camera movement comprises rotational movement or a rotation. The IMU provides the camera rotation data and that camera rotation data is associated with a time. The system and device are configured to generate a projected image location, which may be a synthesized viewpoint, which represents a given world location for an object or scene captured at a particular time (t). In the example, where the camera movement is rotation, the device and system preferably may utilize the generated accumulated rotations and apply the rotation matrix R(t) to the matrix component K (first matrix multiplier M1) so that the rotation matrix R(t) provides a second matrix multiplier (M2), and where the first and second multipliers (M1 and M2) are applied to the matrix world coordinates to generate a projected image. The projected image preferably may be represented as manipulated data, and a relationship may be established between the image coordinates (image vector) and the world coordinates (world vector) based on the camera movement and the application of one or more movement components, such as a rotation matrix R(t) in this example, as well as the intrinsic camera matrix K. One or more additional movement components may be implemented by the device and system to generate the relationship for a projected image. The other movement components, for example, may be matrices, including, for example, one or more additional matrix multipliers, M3, M4, ... Mn. According to another example, the one or more additional movement components may be applied by the device and system to reflect one or more other movements of the camera. Corrections, such as compensation for rolling shutter, may be implemented by the device and system, where a different movement component, such as a rotation matrix, applies for each row of the captured image (versus each frame).
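
The relationship described in this example can be written compactly: with homogeneous coordinates, a world point [X Y Z 1] projects to image coordinates proportional to K·R(t)·[I | 0]·[X Y Z 1], when the camera is treated as sitting at the origin and translation is ignored as discussed above. The sketch below shows that rotation-only model and the corresponding stabilizing warp; the intrinsic values and the neglect of translation are assumptions of the example, not requirements of the disclosure.

    import numpy as np

    K = np.array([[1000.0,    0.0, 960.0],
                  [   0.0, 1000.0, 540.0],
                  [   0.0,    0.0,   1.0]])     # intrinsic matrix (illustrative)

    def project(X_world, R_t):
        """Project a homogeneous world point [X, Y, Z, 1] at time t, ignoring
        translation (rotation-only model described above)."""
        P = K @ R_t @ np.eye(3, 4)              # 3x4 projection: x ~ K R(t) [I|0] X
        x = P @ X_world
        return x[:2] / x[2]                     # back to inhomogeneous pixel coords

    def stabilizing_warp(R_t, R_ref=np.eye(3)):
        """Homography that re-points a frame captured with accumulated rotation
        R(t) back to the designated reference orientation R_ref."""
        return K @ R_ref @ R_t.T @ np.linalg.inv(K)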

[0073] Motion stabilized video may be generated from relating the image for the video frame or frame portion captured for a time. The system and device may be configured to coordinate camera movement, as indicated by the IMU or other motion sensing components, with the image captured. One or more components are applied to the image information, based on the relationship between the world coordinates and image coordinates. A synthetic camera may be generated so that an image may be produced having a look direction or synthesized viewpoint which represents a viewpoint that is desired or designated. According to preferred embodiments, the system and device generate a synthesized look direction where the resulting field of view substantially overlaps the one that was recorded. The system preferably is configured to synthesize a virtual camera and re-point the virtual camera to capture a designated target or subject. According to embodiments, the system generates a synthetic camera and produces images from a selected or designated viewpoint, which may correspond with a viewpoint that the physical camera never actually held. One example of the implementation is the generation of an image from the synthetic camera, where the image may be produced having a look direction or synthesized viewpoint which represents a viewpoint that is desired (e.g., such as directed toward a target or desired direction). According to some implementations, the look direction or synthesized viewpoint may maintain a desired look direction at a particular position (e.g., a position in front of the camera or the camera location). According to some other implementations, the look direction or synthesized viewpoint may maintain a desired look direction which may be relative to a camera position (such as an original camera position, or a position of the camera within which an event, e.g., sound, flash, etc., occurs). The system may generate video images that are taken from a specified viewpoint, and may involve a viewpoint that is different than the viewpoint at which the camera was actually pointing. The video images may be taken from a designated viewpoint, which, for instances where the camera is located to point in the designated direction, will utilize that frame, and for instances where the camera is pointing in a direction other than the designated look direction, will utilize a synthetic camera image frame that has the look direction of the designated viewpoint.

[0074] Where a selected or designated viewpoint or look direction is assigned, movements of the camera relative to the designated viewpoint may be related to one or more prior camera positions from which the synthetic viewpoint may be maintained. For example, according to some embodiments, where a prior camera position is used from which already applied synthesized camera viewpoint adjustments have been made, the designated viewpoint may be maintained by making further adjustments to the already adjusted image information. Further manipulation of the image information based on the change in image position relative to the previous position may be implemented to provide the viewpoint from the synthetic camera. Preferably, the synthetic camera viewpoint is within a field of view that substantially overlaps the field of view recorded with the camera (e.g., on the camera image sensor).

[0075] In order to facilitate processing and minimize processing power consumption, the adjustment components (such as the multiplier matrices) may be combined together and applied as a whole (as opposed to being sequentially applied). The synthetic camera may be generated for each camera movement, and the image may be stabilized by producing the video stream output that is manipulated to be from a desired or assigned point of view. The processing of the video images captured preferably is done at a high rate and is coordinated with the information obtained by the sensors, including the camera image sensor, the IMU and other sensors (e.g., microphone, temperature sensor, etc.). The device preferably is configured to record the raw video captured, and may stream live motion stabilized video.
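
As a small illustration of combining the adjustment components before application, the per-frame 3x3 factors from the earlier sketches can be multiplied once and applied in a single warp rather than warping the image once per adjustment. The individual factors shown here are assumptions carried over from those sketches.

    import numpy as np
    import cv2

    def combined_warp(image, K, R_correction, H_extra=np.eye(3)):
        """Compose all per-frame 3x3 adjustments into one matrix and perform a
        single image warp, reducing per-frame processing load."""
        H = H_extra @ K @ R_correction @ np.linalg.inv(K)
        h, w = image.shape[:2]
        return cv2.warpPerspective(image, H, (w, h))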

[0076] Multiple independent or interdependent foci can be maintained to create the appearance of many distinct synthetic cameras or sets thereof. According to embodiments, the system may be configured with multiple unrelated cameras, which may be at different locations, including in widely dispersed geographic locations - lacking any pairwise overlapping FOVs. According to some embodiments, the FOA process may be implemented for each of the remotely situated cameras. A large number of application-dependent options are available for realizing this process. According to some embodiments, the FOA may be directly controlled. This may be accomplished by external inputs or requests communicated to the device 110 or camera, or may be generated and updated autonomously via a local processor; combinations where either of the two choices is used to assist or guide the other are also viable and may be implemented in conjunction with the system, method and devices. External requests may be manually directed by humans, driven by automation, or both. Requests may specify a fixed target direction (i.e., a fixed point at infinite distance, regardless of camera movement), or a fixed direction relative to the camera (e.g., directly in front, again regardless of camera movement). Alternately, the FOA may be adjusted at any time under external or program control in order to compensate for actual or predicted camera motion, to simulate camera motion, to filter or smooth camera motion, or to follow a desired target object. Fig. 2 shows the FOA process 215 making decisions based not only on external controls 216, but also on knowledge of the motion of the system components 226 (typically each camera). The system and devices are configured to generate a focus of attention based on inputs from the image and other data utilized in connection with the sensor fusion 210. According to some embodiments, optional motion processing of the information may be carried out on the sensor fusion information (e.g., fused video or sensor-fused video). The motion processing preferably may be carried out for camera applications where a synthesized viewpoint must be achieved in the presence of camera motion. Also indicated in the diagram of Fig. 2 is the possibility of having the FOA process examine raw, partially-corrected, reprojected, or final image data 226 in order to guide viewpoint selection. The data examination capability may be implemented for achieving automatic or semi-autonomous target detection, acquisition, and tracking.

[0077] Fig. 2 depicts motion processing 227, which is an optional process. Although motion processing 227 is depicted as a separate process item, according to some embodiments, it may be subsumed by the focus of attention (FOA) 215 mechanism. Motion processing 227 simplifies the FOA implementation by assuming responsibility for operations related to the processing of fused motion sensor information. In Fig. 2, an exemplary illustration of fused sensor information 228 is depicted in conjunction with motion processing 227. Some examples of motion processing 227 include complex mathematical operations (e.g., Kalman filtering) to improve the quality of IMU data by combining fused measurements from multiple sensors with potentially mixed modalities, and deriving a smoothed or otherwise more desirable trajectory for any single camera or the overall sensor platform from the actual motion - the latter typically representing a stabilized motion that would subsequently lead to a correspondingly stabilized synthesized camera image stream. Kalman filtering may also be applied to create a smoothed virtual trajectory from actual motion, and for choosing a smoothed look direction (FOA) in some modes of operation.
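
A minimal example of the kind of motion processing described above is a constant-velocity Kalman filter run on a single orientation angle to derive a smoothed virtual trajectory. The state model, noise values, and restriction to one axis are assumptions made for this sketch; a practical system would operate on full orientation and possibly fuse multiple sensor modalities.

    import numpy as np

    def smooth_yaw(measurements, dt, q=0.05, r=0.3):
        """Constant-velocity Kalman filter producing a smoothed yaw trajectory
        (one possible 'virtual trajectory' for the synthesized camera).

        measurements : noisy IMU-derived yaw angles (radians), one per sample
        dt           : sample interval; q, r : process / measurement noise."""
        F = np.array([[1.0, dt], [0.0, 1.0]])      # state transition (angle, rate)
        H = np.array([[1.0, 0.0]])                 # only the angle is observed
        Q = q * np.array([[dt**3 / 3, dt**2 / 2], [dt**2 / 2, dt]])
        R = np.array([[r]])
        x = np.array([[measurements[0]], [0.0]])   # initial state
        P = np.eye(2)
        smoothed = []
        for z in measurements:
            x = F @ x                               # predict
            P = F @ P @ F.T + Q
            S = H @ P @ H.T + R                     # update
            Kk = P @ H.T @ np.linalg.inv(S)
            x = x + Kk @ (np.array([[z]]) - H @ x)
            P = (np.eye(2) - Kk @ H) @ P
            smoothed.append(float(x[0, 0]))
        return np.array(smoothed)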

[0078] Fig. 5 is an illustration of another exemplary camera embodiment illustrating an implementation where a device 410 comprising a single camera is configured to image video and to produce a stabilized video stream. The embodiment illustrated in Fig. 5 is similar to the devices and components discussed and illustrated herein and in connection with Fig. 1, and the other figures. In Fig. 5, the reference to "synchronization" 209 is made instead of "sensor fusion" 210 (referenced in Fig. 2), the embodiment depicted in Fig. 5 representing an implementation where data is obtained from particular components including a single fisheye lens 122f. The device 410 may include or be associated with one or more IMUs. As with the single camera device implementation depicted in Fig. 5, and the system implementation where a plurality of cameras are configured to produce a stabilized video stream (or streams), a plurality of IMUs may be associated with each camera to provide information about camera movement, including position and orientation (as well as acceleration and other movement detection data).

[0079] According to some embodiments, the system, method and devices may be configured to utilize one or more cameras (or lens options). One preferred embodiment is configured to use two cameras: one camera is configured with a fisheye lens, and the other camera preferably is a standard or conventional field of view camera that is designated for other views, such as for use for close-ups. The cameras may provide separate independent video imaging and video streams, may be switched to provide one or the other, may record both streams (including other information such as raw data), and may transmit one or both camera streams. According to preferred embodiments, the cameras may be configured to save power by operating one camera at a time. For example, the standard or conventional camera typically requires less processing and less power than a wider field, fisheye lens camera. This may be directed by the device itself, wherein one or more of the two cameras may be able to detect event information and, based on the event information, operate the wide field camera or the close-up camera. Event information may include any of the information provided by the sensors, and may include manual inputs that are received by one or more of the cameras (which may be transmitted remotely to the camera, or may be actuated using the camera or camera controls). The streams generated by the cameras are stabilized video streams, and may be generated by utilizing the information, such as the sensor data and image data. Preferably one or more virtual cameras, with a focus of attention and field of view, as discussed herein, are generated. The image is manipulated to produce a stabilized image (video stream). In the case of the fisheye lens camera, intrinsic corrections to the image information are made to remove the distortion (or other effects of a lens), as discussed herein, by flattening, unwarping or unwrapping. The stabilized video from each camera is generated through the virtual camera information being used to construct the stream. Preferably, the cameras of the device are configured to process their respective data streams. Alternatively, the cameras may comprise one or more separate lenses and one or more separate image sensors that are configured to operate using device components (e.g., processing components) of a single device.

[0080] Although the devices 110 (Fig. 1) and 510 (Fig. 6) are illustrated as preferred embodiments shown as part of the system and discussed in conjunction with the method for providing stabilization of images, the system and methods disclosed herein may be implemented in connection with alternatively configured devices. Although embodiments of the invention may be implemented using circuitry and components typically used in a mobile phone, the invention may be implemented in other devices, including the mobile video cameras shown and discussed herein. The stabilization mechanism may be configured to provide highly detailed, high resolution video from a mobile video camera, and may transmit live video from the field to a remote location.

[0081] Referring to Fig. 6, there is illustrated an exemplary embodiment of an image capture device 510 configured with the image stabilization mechanism described herein. The capture device 510 is depicted in an embodiment as a video camera that includes a housing 511, an image sensor 523 and circuitry 529 for capturing video from the image sensor 523 and from the other sensors of the device 510, including the IMU 516. According to a preferred embodiment, the device or camera 510 includes one or more components for recording audio (such as the sensor or microphone 121 represented in Fig. 1). The device or camera image sensor 523 preferably is a single sensor with very high resolution (currently at least 8 megapixels) and with a wide field of view (FOV) in two axes. The device 510 preferably includes a capture objective or lens 530, and according to preferred embodiments, the lens 530 is a fisheye lens. The image capture device 510 may be referred to as a camera (e.g., video camera). The image capture device 510 may be used while in motion itself to capture the image of a scene, a moving subject within a scene, or a number of moving subjects. The device 510 preferably carries out image capture in a continuous manner and provides video, which may comprise a unit number of frames per time period, such as, for example, frames per second. The image capture device 510 may be employed to capture a variety of moving subjects, such as, for example, individuals, vehicles and animals, as well as other objects. The IMU preferably may be any suitable IMU, including any of the IMUs discussed and depicted herein. According to preferred embodiments, the IMU comprises a component that provides three-axis, real-time IMU data including the orientation and motion of the camera 510. The camera 510 is configured with circuitry so that the camera motion (including position and orientation) preferably is captured simultaneously with camera video/audio (and in some instances, other sensor information provided by other sensors or input components).

[0082] The camera 510 is provided with circuitry to store and communicate the image data and other sensor data outputs. The camera 510 preferably includes circuitry and preferably transmission components to produce and communicate real-time output comprising one or more video streams. Preferably, the one or more output video streams each have 720p resolution (or greater). Embodiments of the device 510 may be configured to produce output, suitable for wireless transmission, which may comprise a field of view (FOV) that is more typical or standard than the wide-angle original image. The device 510 includes a storage component 525 onto which raw video and IMU data are stored locally, on the device. Although the device 510 may communicate real-time, live video streams from the device (e.g., directly), in instances where the device 510 is operating in a location with insufficient bandwidth for real-time transmission, the stored information may be transmitted, or transmission may resume when suitable communication is available (with the stored video being uploaded later or streamed at that time). Raw data from the sensors, including the camera image sensors and other device sensors (such as the IMU), is stored as time-stamped data, and real-time outputs can be reproduced at any later time from this information. For example, raw products, or alternative products, such as, for example, one or more synthesized cameras and their respective viewpoints or combinations thereof, may be produced from the stored raw data.
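As a rough illustration of the time-stamped raw storage described above, the sketch below logs frames and IMU samples with capture timestamps so that outputs can be reproduced or streamed later; the file layout and record fields are assumptions, not part of the specification.

```python
# Minimal sketch: frames and IMU samples are logged with their capture
# timestamps so stabilized outputs (or alternative synthesized views) can be
# reproduced later, or streamed when bandwidth becomes available.
import json, time

class RawRecorder:
    def __init__(self, path):
        self.file = open(path, "a")

    def log_frame(self, frame_id, jpeg_bytes_len):
        self._write({"type": "frame", "t": time.time(),
                     "frame_id": frame_id, "size": jpeg_bytes_len})

    def log_imu(self, gyro, accel):
        self._write({"type": "imu", "t": time.time(),
                     "gyro": gyro, "accel": accel})

    def _write(self, record):
        self.file.write(json.dumps(record) + "\n")
        self.file.flush()

rec = RawRecorder("raw_log.jsonl")
rec.log_imu(gyro=[0.01, -0.02, 0.00], accel=[0.0, 0.0, 9.81])
rec.log_frame(frame_id=1, jpeg_bytes_len=215_000)
```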

[0083] Referring to Fig. 6, a device 510 for capturing and streaming video is illustrated according to an exemplary embodiment, with a number of the device components represented schematically. According to a preferred embodiment, the image capture device 510 has an image sensor 523 that has a high operating resolution, such as, for example, 3.5K or 4K. The device 510 also includes a processing component, such as the processor 511, which is connected with other components, such as volatile memory 513, storage 525, an optional GPU 514, a clock 512, an IMU 516 and a microphone 521. A power supply, such as, for example, a rechargeable battery 550, also is provided. The IMU 516 provides information that identifies the exact position and orientation of the image capture element, which in this embodiment is the device 510, where the lens 530 is fixed relative to the device body 511.

[0084] The device 510 includes a processing component and software containing instructions to instruct the processing component to carry out manipulations of the sensor information. The data from the image sensor and IMU (as well as other sensors) of the device 510, or associated with the device 510, preferably are processed to provide a video stream that may be viewed on a suitable display. The display may be provided on the device 510, or remote from the device 510 at a location to which the video is communicated from the device through a suitable network. The video produced and transmitted by the device preferably is motion stabilized, high quality video. The device 510 is configured with instructions to process the information from the sensors by associating each data item with a timestamp. The sensors may be configured to record or ascertain information at a suitable time interval, such as, in the case of video, a number of frames per second (or minute or other time interval), and in the case of the IMU, every fraction of a second (or when a change from a stationary position is detected). The sensor information preferably is synchronized so that the processing of a video frame or stream coordinates the sensor information inputs at a point in time. The IMU and video preferably are registered, either by design or through a calibration process. The device 510 may be configured with a suitable calibration routine where components, such as, for example, the IMU and camera sensor, ascertain data, and the data is related against one or more known or measurable conditions. The information obtained by the device or camera 510 preferably is processed for intrinsic correction, which may include processing the data to adjust the information parameters, such as, for example, using lens/camera calibration parameters to correct for static deviations, including fisheye distortion. The device 510 also is configured to collect and store data, and provide images that are adjusted for extrinsic conditions. For example, the device 510 may include software that has instructions to direct a processing component to manipulate the data (such as, for example, the fused sensor data, see Fig. 2, 210) to produce extrinsic corrections (see, e.g., Fig. 2, 213), which, for example, may include compensation for rolling shutter using IMU data and video data. The device 510 preferably is configured to manipulate the information captured by the image sensor 523 and other sensors to reduce the effects of motion blur. The device stabilization mechanism may implement the processing of the data to provide corrected or adjusted data that may be used to produce motion stabilized video from a single device 510 or from a plurality of devices 510. In the case of the single device 510, one or more synthesized viewpoints are generated from the information.
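The synchronization step described above could be illustrated as follows; this is a minimal sketch assuming linear interpolation of IMU samples to frame timestamps, which the specification does not mandate.

```python
# Hedged sketch: IMU samples arrive much faster than video frames, so each
# frame timestamp is paired with an interpolated IMU reading. The linear
# interpolation and field names are assumptions for illustration only.
import numpy as np

def imu_at(frame_ts, imu_ts, imu_values):
    """Linearly interpolate per-axis IMU values at each frame timestamp."""
    imu_values = np.asarray(imu_values)          # shape (N, 3): e.g. gyro x, y, z
    return np.stack([np.interp(frame_ts, imu_ts, imu_values[:, k])
                     for k in range(imu_values.shape[1])], axis=1)

imu_ts = [0.000, 0.005, 0.010, 0.015, 0.020]
gyro = [[0.0, 0, 0], [0.2, 0, 0], [0.1, 0, 0], [-0.1, 0, 0], [0.0, 0, 0]]
frame_ts = [0.0075, 0.0175]                      # two frame capture times
print(imu_at(frame_ts, imu_ts, gyro))
```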

[0085] According to preferred embodiments, the synthesized viewpoints are generated by processing the information, preferably with the device processing component, to change the optical axis location by designating an optical axis at a different location, which may be a repositioned optical axis (if the camera was proceeding pursuant to a previous optical axis). The designated optical axis may be a designated location that is within the FOV of the camera. For example, the optical axis may be changed or repositioned (or otherwise designated) to point anywhere within the physical FOV of the camera. The designation of a different or new optical axis (or setting of an initial optical axis) provides a synthesized camera operating with a FOV about the designated optical axis. The device 510 is configured with instructions for manipulating the data by reprojecting the high-resolution data (combined with manipulation of the data for unwarping, e.g., of fisheye or wide field distortion) to produce a smaller FOV. The smaller field of view (FOV) generated by the device 510 preferably is produced having a more suitable resolution for the available transmission bandwidth. The device processing manipulations, according to some embodiments, may be configured to produce an image resolution based on the bandwidth available. The device 510, although a single camera, preferably generates one or more, and preferably multiple, synthetic cameras. The multiple synthetic cameras may be utilized to provide video image streams from the respective multiple viewpoints available within the field of view. The image may also be manipulated to enhance the image by cropping the image to remove potential unfilled frame portions.
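A minimal sketch of the viewpoint synthesis described above is given below, assuming an equidistant fisheye model, a pinhole virtual camera and nearest-neighbour sampling; these are illustrative assumptions rather than the claimed implementation.

```python
# Sketch under assumed camera models: a small perspective ("virtual camera")
# view is reprojected out of a wide fisheye capture, with its optical axis
# pointed anywhere inside the physical field of view.
import numpy as np

def rotation_yaw_pitch(yaw, pitch):
    """Rotation taking virtual-camera rays into the physical camera frame."""
    cy, sy, cp, sp = np.cos(yaw), np.sin(yaw), np.cos(pitch), np.sin(pitch)
    Ry = np.array([[cy, 0, sy], [0, 1, 0], [-sy, 0, cy]])
    Rx = np.array([[1, 0, 0], [0, cp, -sp], [0, sp, cp]])
    return Ry @ Rx

def synthesize_view(fisheye, f_fish, yaw, pitch, out_size=240, fov_deg=60):
    h, w = fisheye.shape[:2]
    cx_f, cy_f = w / 2.0, h / 2.0
    f_virt = (out_size / 2.0) / np.tan(np.radians(fov_deg) / 2.0)
    R = rotation_yaw_pitch(yaw, pitch)
    out = np.zeros((out_size, out_size), dtype=fisheye.dtype)
    for v in range(out_size):
        for u in range(out_size):
            ray = np.array([u - out_size / 2.0, v - out_size / 2.0, f_virt])
            ray = R @ (ray / np.linalg.norm(ray))
            theta = np.arccos(np.clip(ray[2], -1.0, 1.0))   # angle off the axis
            phi = np.arctan2(ray[1], ray[0])
            r = f_fish * theta                               # equidistant model
            x = int(round(cx_f + r * np.cos(phi)))
            y = int(round(cy_f + r * np.sin(phi)))
            if 0 <= x < w and 0 <= y < h:
                out[v, u] = fisheye[y, x]
    return out

# Usage with a synthetic fisheye frame; yaw/pitch designate the new optical axis.
frame = np.random.randint(0, 255, (960, 960), dtype=np.uint8)
view = synthesize_view(frame, f_fish=300.0, yaw=np.radians(20), pitch=np.radians(-5))
print(view.shape)
```

The output size (and hence resolution) could be chosen based on the available transmission bandwidth, consistent with the discussion above.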

[0086] The device 510 stabilization mechanism, as with the stabilization mechanism illustrated and discussed herein in connection with the device 110 (see, e.g., Figs. 1 and 2), preferably manipulates the image and video (e.g., video image frames or streams) to provide enhanced video for viewing and displaying. The stabilization mechanism preferably has a component configured to evaluate the IMU motion history to keep track of orientation (device camera orientation). The device is configured with instructions that instruct the processing component to conduct a comparison of the position/orientation sensed at a point in time against the desired position/orientation. If no difference is determined, the processing concludes that movement has not taken place or, if it has, that it is not appreciable. A determination of no difference, or no appreciable difference, therefore may correspond with a result that the camera is pointing in the correct or desired direction. The IMU preferably continues to track changes, so that the orientation/position is integrated and derived from these changes. In other words, even where there is no change in camera position at a particular time, the position/orientation information for that time is recorded and used for determining subsequent relative movement.
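As a rough illustration of this orientation bookkeeping, the sketch below integrates gyro rates every sample and compares the running estimate against a desired orientation; simple Euler integration is an assumption made for brevity, and a quaternion representation would be more typical in practice.

```python
# Sketch only: gyro rates are integrated every sample so the current
# orientation is always known, even when the camera is not moving, and each
# estimate is compared against the desired (look-direction) orientation.
import numpy as np

def track_orientation(gyro_samples, dt, desired=np.zeros(3), tol=np.radians(0.5)):
    orientation = np.zeros(3)            # [roll, pitch, yaw] in radians
    history = []
    for rates in gyro_samples:           # rad/s for each axis
        orientation = orientation + np.asarray(rates) * dt
        aligned = np.all(np.abs(orientation - desired) < tol)
        history.append((orientation.copy(), aligned))
    return history

samples = [[0.0, 0.0, 0.0], [0.0, 0.0, 0.2], [0.0, 0.0, 0.2]]  # a small yaw turn
for orient, ok in track_orientation(samples, dt=0.01):
    print(np.degrees(orient), "aligned" if ok else "off-target")
```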

[0087] The device may be configured to provide thresholds for movement that constitute a triggering movement indicating that an orientation change has occurred. The difference between the orientation data of one orientation value, such as a first orientation value (OV1), and a second orientation value (OV2) (which may be a next successive orientation value), is determined, and that difference between values (OV2 - OV1) may be utilized by the device to synthesize the desired viewpoint in real time. Although discussed as orientation values, these values also may include camera position and geographic specific location, as well as other information. A GPS component or chip of the camera, or one associated with the camera, preferably may provide geo-specific location information. For example, the orientation values may include spatial coordinate data, such as (x, y, z) coordinates, as well as one or more angular components, to determine spatial movement of the camera. Preferably, translational movement, which may occur when the camera is moving, is not included in the stabilization mechanism. Some alternate embodiments may employ a translation, but for most embodiments, the translation may be provided by adjunct information comprising a data parameter of geolocation (which may be provided by one or more of the device components or circuitry).
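A small, hypothetical sketch of this triggering logic follows; the threshold value and the per-axis layout of the orientation values are illustrative assumptions.

```python
# Hedged sketch: the delta between two successive orientation values
# (OV2 - OV1) is compared to a threshold, and only deltas above the threshold
# cause the virtual viewpoint to be re-synthesized.
import numpy as np

TRIGGER_DEG = 2.0   # assumed threshold; the specification leaves the value open

def needs_correction(ov1, ov2, threshold_deg=TRIGGER_DEG):
    """Return (trigger, delta) where delta = OV2 - OV1 in degrees per axis."""
    delta = np.asarray(ov2, dtype=float) - np.asarray(ov1, dtype=float)
    return bool(np.any(np.abs(delta) > threshold_deg)), delta

print(needs_correction([1.3, 29.5, 4.1], [1.4, 29.6, 4.0]))   # small jitter
print(needs_correction([1.3, 29.5, 4.1], [1.3, 35.0, 4.1]))   # exceeds threshold
```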

[0088] The device preferably images and stabilizes the video in real time to generate a real-time video stream. A system comprising a plurality of devices, or devices with a plurality of cameras (or lenses), also may provide motion stabilized video. The random movements of the device or camera are adjusted for to provide an enhanced video. For example, movements of an individual, such as, for example, a law enforcement officer wearing the device configured as a body camera, agitate the camera. The camera therefore experiences undesired movements, which are for the most part random movements. The synthesis of the video stream captured preferably is made from the image information captured through a fisheye lens. Extreme movements of the camera, e.g., pointing too high or too low, may still capture the target object within the field of view. The camera movement is sensed by the IMU (and other sensors), which handles rotations about all three axes. The image information and sensor data also are processed so as to streamline processing and optimize efficiency. For example, manipulations of the data for unwrapping the images, stabilizing the images, adjusting the images for rolling shutter compensation, carrying out viewpoint synthesis, as well as generating a video stream with the adjustments applied thereto, may be efficiently carried out by a suitably configured device, such as the devices 110 and 510. Preferably, device processing components and instructions stored in the software contained on the device implement the image manipulations from the data to produce a stabilized video output (such as, for example, a live stream of stabilized video).

[0089] The device 510 is configured to generate a target or focus of attention (FOA). For example, when a device is configured as a law enforcement body camera, the device preferably implements a FOA determination that may be based on an estimated law enforcement personnel view, that is, what the individual is looking at based on the individual's motion. The device is configured to sense the motion, and the IMU provides information that preferably may be continuously monitored by the processing component of the device. The information may be stored, and, in some embodiments, may be ascertained as a number of position samples per interval, such as samplings per second. According to other embodiments, the maximum position data is captured and stored. Preferably the information is stored with a time stamp. The processing component may monitor the sensor data to determine when a threshold movement has occurred. For example, the device may be configured to detect deliberate sudden turning motion, for low-latency changes, and differentiate this movement from the random, sudden, abrupt movements for which correction or adjustment of the video or image is beneficial. The device preferably is configured to follow the intended FOA in order to provide a desired field of view, by evaluating and comparing the position or movement information at subsequent time intervals. The device preferably processes the movement or other position data rapidly so as to make a rapid determination of whether to maintain or adjust the designated field of view. The stabilization mechanism preferably is implemented by the device to adjust the video stream from the captured images (or video) from the camera. The device preferably stabilizes the video by manipulating the frame capture field of view and providing a virtual captured image, during that interval (however brief or long) of motion turbulence, that is an adjusted version of the actual camera image. The image preferably is adjusted for turbulent shaking motion, such as that produced by a camera user running with the camera. According to preferred embodiments, the image and video captured by the device also may be adjusted for horizontal/vertical orientation. Embodiments also may provide remote operation of these features.
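One hypothetical way to separate a deliberate, sustained turn from brief random jolts is sketched below; the window length and rate threshold are assumptions, not values taken from the specification.

```python
# Sketch only: a deliberate, sustained turn (the wearer intentionally looking
# elsewhere) should move the FOA with low latency, while brief random jolts
# should be stabilized away.
import numpy as np

def classify_motion(yaw_rates, dt=0.05, window=0.2, sustained_rate=0.6):
    """Label each sample 'deliberate' once the yaw rate has stayed above
    sustained_rate (rad/s), with a consistent sign, for the whole trailing
    window; otherwise 'jitter/static'."""
    n = max(1, int(round(window / dt)))
    rates = np.asarray(yaw_rates, dtype=float)
    labels = []
    for i in range(len(rates)):
        recent = rates[max(0, i - n + 1):i + 1]
        sustained = (len(recent) == n
                     and np.all(np.abs(recent) > sustained_rate)
                     and np.all(np.sign(recent) == np.sign(recent[0])))
        labels.append("deliberate" if sustained else "jitter/static")
    return labels

jolt = [0.0, 3.0, -3.0, 0.0, 0.0]          # brief shake: never sustained
turn = [1.0, 1.1, 1.0, 0.9, 1.0]           # sustained turn: stays above threshold
print(classify_motion(jolt))
print(classify_motion(turn))
```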

[0090] The device also may be utilized in connection with other installations where, for example, the camera is used or installed in a fixed orientation, and where the supposedly fixed camera may be subject to external forces such as wind. Embodiments of the device may be configured to have movement capabilities where the lens may move (or switch between one or more lenses), or have an externally directed view (e.g., with a joystick from an associated or remotely situated controller) to look around or to move the device or its lens element to follow a target.

[0091] Embodiments of the system, methods and devices preferably may be configured to allow external control and networking, which allows coordination across multiple devices (or cameras). This may be implemented for increased situational awareness across one or more locations, or for optimizing transmission bandwidth by controlling the particular device and view being captured, based on its location, orientation, or other attributes.

[0092] According to preferred embodiments, the device is configured as a camera to record and generate motion stabilized video from a camera being agitated. The camera image, when shaking, is moving around. Although the camera has a center for the lens and a direction in which it points (see, e.g., OA and OA' in Figs. 3A and 4A, respectively), the orientation of the so-called look direction changes. Motion sensors of the device are configured to keep track of the changes and note the relative movement, i.e., a change in direction. The movement also may be determined from a particular location as well as a relative change. The motion sensors tracking the movement may be used to maintain a synthesized look direction. Note, however, that there is more distortion requiring correction toward the edges of the fisheye viewing field. The synthesized look direction provides a focus of attention. The corrections are discussed herein, and an example is illustrated in Figs. 7A, 7B and 8A, 8B.

[0093] Referring to Figs. 7A-9B, views depicting a scene are illustrated to demonstrate some potential stabilization enhancements. Fig. 7B shows an image taken with an image capture device configured according to the embodiments of the invention. The images in Figs. 7A, 7B and 8A, 8B are frames of a captured video stream. The image capture device is configured as a camera to record video. Fig. 7A depicts a standard image on the left (from a video frame), taken with a standard camera. The image (from a video frame) on the right (Fig. 7B) illustrates an image recorded with an embodiment of the camera configured according to the invention, such as, for example, the camera 510. In Figs. 7A and 7B, each image represents a frame of video taken at the same time. The standard camera and stabilization camera 510 each image the scene through a fisheye lens provided on each respective camera. In the image of Fig. 7A, the fisheye view distorts the scene. The image of Fig. 7B shows the scene with minimal or no distortion. The respective cameras that image the scenes shown in Figs. 7A and 7B were moved by shaking and rotating while continuing to record video of the scene. Figs. 8A and 8B illustrate the scene taken with the respective cameras under the conditions of movement, which in this example involve shaking and rotation. An indicator of the movement is provided, and as shown in Figs. 8A and 8B, the conditions are roll of 1.33, pitch of 29.48 and yaw of 4.10. As compared with the camera conditions of Figs. 7A and 7B, there is significantly increased movement of the cameras when imaging the scene as depicted in the images shown in Figs. 8A and 8B. In the scenes illustrated in Figs. 7A, 7B and Figs. 8A, 8B, the respective cameras are imaging the scene from a particular location. Alternatively, the scene may be imaged where the cameras are changing their location, such as, for example, to follow a moving subject. An exemplary device, such as the camera 510, produced the captured images in Figs. 7B and 8B, and, though subjected to the same movement conditions as the standard camera providing the respective scene images of Figs. 7A and 8A, generates an image that exhibits stabilization of the scene. The scene images shown in Figs. 7A and 8A show significant departure from the original positioning, although the subject has remained substantially or entirely static. The movements of the respective cameras are minimized in Figs. 7B and 8B to provide a stabilized image frame, whereas the standard camera shows movement within the scene frame. The frames depicted are taken from a captured video stream, and represent the stabilization in the images generated in Figs. 7B and 8B. Preferably, the stabilization is generated for the video stream and the camera produces stabilized video. According to preferred embodiments, not only has the undesired motion been "removed" (stabilized) from the video generated, but also the severe distortion of the fisheye lens has been removed. Although examples of individual scene frames are depicted in Figs. 7A, 7B and 8A, 8B, movement of the standard camera would be exhibited as shaking in the other frames captured while the camera is undergoing movement, and in the video generated by displaying the standard camera captured frames. The present camera, such as, for example, the camera 510, is configured to image the scene from the direction of the camera lens. According to some preferred embodiments, a wide field lens is used to capture more of a scene from the same camera location or viewpoint (e.g., to provide an expanded field of view). Alternate embodiments may utilize lenses having fields of view other than that depicted in the exemplary images (e.g., standard or zoom). According to the embodiment illustrated, the capture device 510 is configured with a fisheye lens. The capture device 510 preferably includes circuitry for controlling the operations of the device 510. The circuitry includes a power supply 550, at least one image sensor 523, and may include one or more other sensors.

[0094] Referring to Figs. 9A and 9B, a scene is depicted to illustrate compensating for extrinsic distortion due to camera motion and rolling shutter distortion. In the scene depiction of Figs. 9A and 9B, there is no fisheye distortion or stabilization attempted. Figs. 9A and 9B represent the process block 213 of Fig. 5 as an example of extrinsic adjustments that may be made. As illustrated in Fig. 9A, the scene is depicted and exhibits extrinsic distortion. The extrinsic distortion in this example is due to the rolling-shutter effect. (Although the entire image depicted in Fig. 9A is subject to this effect, it is most easily seen in the high degree of "leaning" in the vertical lamppost.) Fig. 9B depicts an enhanced scene with a correction applied to normalize the distortion. In both depictions, the scene is imaged with a standard field of view lens. In Fig. 9A, the view of the scene shows certain objects as having a curvature due to rolling shutter. This is the extrinsic distortion due to rolling shutter, not the intrinsic distortion due to a fisheye lens. The video depicted in Figs. 9A and 9B was recorded without a fisheye lens, and is provided to demonstrate rolling shutter correction. The image on the right, Fig. 9B, is generated with the device by using the device processing components and instructions to process the same captured image information that produced the image shown in Fig. 9A.
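As a simplified illustration of rolling-shutter compensation, the sketch below shifts each image row by the rotation accrued during readout; the pure-horizontal pixel-shift model is a heavy simplification used only to show the idea.

```python
# Simplified sketch: each image row is exposed at a slightly different time, so
# each row is shifted back by the horizontal motion accrued during the readout,
# estimated from the gyro yaw rate.
import numpy as np

def deskew_rows(image, yaw_rate, readout_time, focal_px):
    """Shift each row to undo rotation that occurred while it was read out."""
    h, w = image.shape[:2]
    out = np.zeros_like(image)
    for row in range(h):
        t = readout_time * row / h                      # time this row was captured
        shift_px = int(round(yaw_rate * t * focal_px))  # small-angle approximation
        out[row] = np.roll(image[row], -shift_px, axis=0)
    return out

frame = np.tile((np.arange(640) % 256).astype(np.uint8), (480, 1))  # synthetic frame
corrected = deskew_rows(frame, yaw_rate=0.5, readout_time=0.03, focal_px=600)
print(corrected.shape)
```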

[0095] According to preferred embodiments, the extrinsic corrections 213 (see Fig. 5) are applied to manipulate the image, and intrinsic corrections 212 (such as corrections for the physical characteristics and geometry of the camera, including the lens and sensor) also may be applied to manipulate the image. Preferably, the intrinsic correction or adjustment is made to normalize, or reduce, the distortion. The device stabilization mechanism preferably implements stabilization in conjunction with the distortion adjustment, so that the image is adjusted by manipulating the image data to remove the appearance of undesirable camera motion and to make adjustments for extrinsic and intrinsic distortions. According to a preferred embodiment, the device preferably manipulates the distortion-normalized image data to produce a stabilized image. References herein to the device image also apply to video.

[0096] As illustrated in the motion stabilized video image frames of Figs. 7B and 8B, the camera device assigns or designates a focus of attention (a look point). The camera device is configured to maintain the look point. When the camera is experiencing turbulence, the camera position is moved along with the look point, so the new camera position (the moved position) has a new look point. The movements of the camera are detected by the IMU (and possibly other motion sensing components). The virtual camera is synthesized and may provide a number of synthetic images or image portions corresponding with the change in camera position or orientation. The image and sensor information are ascertained and stored. The image information also is processed so that, as the camera motion takes place and the designated look point is moved or disrupted, the camera device implements a look point from one or more synthesized virtual cameras. The virtual camera designated look point is used to generate or designate a video portion, such as a frame or portion of a frame, to produce a video that synthesizes, from the actual image information and data, a corresponding video or portion that provides the designated look point. The process may continue for each camera movement or disruption, and provide an output of stabilized video, which may be a stabilized stream of video. An image recording device according to the invention may be constructed as shown and described herein. The image recording device, such as a camera, may comprise a stabilization mechanism having at least one movement sensor for sensing movement and providing movement data, an image sensor disposed to receive an image thereon, and a hardware processor configured with software containing instructions to process movement information comprising movement data from the motion sensor and image data from the image sensor. The image recording device has a lens and other components and can capture and record video frames. The device, through the device sensors, including one or more IMUs or IMU components, identifies changes in position between successive frame captures from information provided from the IMU or other movement sensors. The changes in position are assigned a first delta, which comprises a position change between a first position and a second position. The first position corresponds with a first frame, and the second position corresponds with a second frame. The lens of the device has a corresponding focus of attention and a field of view represented on the sensor. The device generates one or more virtual cameras synthesized from the information. The virtual camera has a first virtual camera focus of attention and a first virtual camera field of view. The device also utilizes the processing components and circuitry, with stored instructions contained in the device software, to instruct the processor to carry out an evaluation of the movement information and determine whether the movement meets a threshold that requires a corrective adjustment. Where corrective adjustment is determined to be required, the device produces an adjusted video stream which includes one or more frames or frame portions from the first virtual camera and has the first virtual camera focus of attention for those one or more frames or frame portions. The device may continue to monitor camera movement or turbulence, and continue to generate synthesized virtual cameras having a desired look direction, even where the camera has moved from the intended (i.e., desired) or original look direction. From the synthetic camera information, such as the image frame or portion thereof, the motion stabilized video stream is produced, and continues to be generated by the imaging device. According to some system embodiments, a plurality of physical cameras are utilized, from which are produced a plurality of virtual cameras synthesized from the respective camera image information.
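To tie the pieces together, the following hedged sketch follows the flow just described: compute the per-frame orientation delta, test it against a threshold, and, when the threshold is met, take the output frame from a synthesized virtual camera holding the desired focus of attention. Every helper name here is a hypothetical stand-in.

```python
# End-to-end sketch: per-frame orientation delta -> threshold test -> virtual
# camera rendering of the frames that need correction.
import numpy as np

def stabilize_stream(frames, orientations, desired, threshold_deg, render_virtual):
    """For each frame, compute the orientation delta from the previous frame;
    when it exceeds the threshold, substitute the view rendered by the virtual
    camera that keeps the desired focus of attention."""
    stabilized = []
    prev = orientations[0]
    for frame, ov in zip(frames, orientations):
        delta = np.asarray(ov, dtype=float) - np.asarray(prev, dtype=float)
        prev = ov
        if np.any(np.abs(delta) > threshold_deg):          # the "first delta" test
            error = np.asarray(desired, dtype=float) - np.asarray(ov, dtype=float)
            frame = render_virtual(frame, yaw=np.radians(error[2]),
                                   pitch=np.radians(error[1]))
        stabilized.append(frame)
    return stabilized

# Trivial usage with a pass-through stand-in for the virtual-camera renderer.
frames = [np.zeros((4, 4), dtype=np.uint8) for _ in range(3)]
orients = [[0, 0, 0], [0, 0, 5], [0, 0, 5]]     # degrees [roll, pitch, yaw] per frame
out = stabilize_stream(frames, orients, desired=[0, 0, 0], threshold_deg=2.0,
                       render_virtual=lambda f, yaw, pitch: f)
print(len(out))
```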

[0097] According to some alternate embodiments of the invention, the system may be configured to utilize multiple cameras, such as the camera devices 510 shown and described herein, to increase the effective virtual/available FOV and increase the available angles to be synthesized. Embodiments provide camera captures of video, and may store the information in one or more separate databases, as well as a collective database. For example, each camera may be configured to provide stabilized video. According to an alternate embodiment, the plurality of cameras provides for a plurality of fields of view. According to some embodiments, the cameras may be coordinated with each other, or with one or more other cameras, to provide a relative image capture location parameter, so that the images captured by one camera field of view may be related to the FOV of another or any other camera. The cameras also preferably are configured to generate stabilized video (or images), and may generate the stabilized video from a plurality of video streams from one or more virtual synthesized cameras synthesized from one or more of the plurality of cameras.

[0098] Embodiments shown and described herein, although depicted as separate processes, may be combined together in some instances where efficiencies or optimizations can be achieved.

[0099] According to some embodiments, the imaging system includes a plurality of cameras that image the area surrounding the camera location, which may be an individual, in the case of the camera being worn as a body camera, or an object, where the camera is carried on an object such as a vehicle. The plurality of cameras preferably are recording, and the recording includes one or more sequences, or a continuum of sequences, that take place at the same time. In this regard, information imaged from a plurality of directions may be captured. The imaging also may be captured where the cameras (or one or more of them) are shaking, and the subject is in motion. In addition, the depiction of the optical axis in Figs. 3A, 3B and 4A, 4B considers rotation of the camera, but does not depict translation.

[0100] These and other advantages may be obtained through the use of the inventive system, devices and methods disclosed herein. While the invention has been described with reference to specific embodiments, the description is illustrative and is not to be construed as limiting the scope of the invention. Various modifications and changes may occur to those skilled in the art without departing from the spirit and scope of the invention described herein and as set forth in the appended claims. For example, the processing of image information may include steps to eliminate or reduce distortion, correct barrel distortion, or adjust a horizon, such as leveling it, as well as adjusting perspective distortion through manipulation of a vanishing point or other image data. Components, such as the devices 110, 410, 510 and cameras 122, depict exemplary embodiments for carrying out the method, and comprise a system for producing stabilized video streams. The devices and cameras may include global positioning system components, such as GPS location chips, and GPS location data may be part of a device (or camera) data profile (along with other information as to position, orientation and movement). The features disclosed and shown herein in connection with embodiments may be applied to one or more other embodiments, and one or more features may be combined or provided together. The image also may be adjusted to provide an isometric viewpoint, while having infinite or increased zoom capability.