Title:
LIVE-ACTION CAMERA, CONTROL, CAPTURE, ROUTING, PROCESSING, AND BROADCAST SYSTEM AND METHOD
Document Type and Number:
WIPO Patent Application WO/2021/151205
Kind Code:
A1
Abstract:
A system and method for creating a live-action camera video capture, control, video network, and broadcast production, preferably comprising at least one live-action camera for capturing the video, a Wi-Fi 6 based wireless network, and an artificial intelligence neural network image processing system for preparing the video for broadcast.

Inventors:
SHIELDS GARY (CA)
Application Number:
PCT/CA2021/050100
Publication Date:
August 05, 2021
Filing Date:
January 29, 2021
Assignee:
D SERRUYA CONSULTING LTD (CA)
International Classes:
H04N5/222; H04N5/262; H04W4/00
Domestic Patent References:
WO2019128592A12019-07-04
WO2018213481A12018-11-22
Foreign References:
CN108260023A2018-07-06
CN109660817A2019-04-19
CN112073739A2020-12-11
Attorney, Agent or Firm:
ANDREWS ROBICHAUD PC (CA)
Claims:
CLAIMS

1. A method for creating a live-action camera video capture, control, video network, and broadcast production system comprising: at least one live-action camera for capturing the video; a Wi-Fi 6 based wireless network; and an artificial intelligence neural network image processing system for preparing the video for broadcast.

2. A system for creating a live-action camera video capture, control, video network, and broadcast production system comprising: at least one live-action camera for capturing the video; a Wi-Fi 6 based wireless network; and an artificial intelligence neural network image processing system for preparing the video for broadcast.

Description:
LIVE-ACTION CAMERA, CONTROL, CAPTURE, ROUTING, PROCESSING, AND BROADCAST SYSTEM AND METHOD

Field of the Invention:

The present invention relates to a novel system of live-action cameras which produce professional-quality video, together with the means to control them, capture their video output, route their video to a production system for processing, and broadcast or stream the live video over TV networks or to remote or local viewers at the event.

Description of the Related Art:

In the world of live-action capture of sports, there are many different types of equipment. The field is divided into two mutually exclusive options: smaller, less-expensive equipment that does not provide the professional-quality video of normal professional TV broadcasting, and professional equipment that is expensive and too bulky to be located on the participants for generating a first-person point of view.

The cameras utilized for first-person perspective live-action capture by necessity require the wireless transfer of video to the broadcast equipment. This radio link tends to have limited capacity to transfer video at higher bitrates or to support multiple cameras simultaneously over the same limited radio spectrum.

Another significant issue with these radio systems is the problem of multi-path reflections. These are the reflected radio waves from obstacles and other surrounding objects which cause multiple copies of the transmission to arrive at the receiver at different times. These reflections interfere with the original transmission and degrade system performance. A system that attempts to solve the radio problem is described in Pat. No. US 9,826,013 to McLennan and Harish. Their art describes a solution that uses the unlicensed spectrum in the 2.4 GHz and possibly the 5 GHz ranges. While they have made efforts to increase the capacity for simultaneous wireless communications by reducing the bitrate for some camera groups based on network congestion and poor link speeds, the solution is limited by the underlying radio technology. Such an approach only serves to reduce video quality in an attempt to maintain the same limited number of live video streams.

With conventional Wi-Fi technology, especially in the 2.4 GHz band, there are a limited number of channels available. Furthermore, there are only three channels that do not overlap in frequency and therefore do not interfere with each other. Each channel is fixed in bandwidth, and every transmitted radio packet eliminates that channel from use by other clients, even if the information being transmitted does not require the entire bandwidth available on that channel. Because of these significant limitations, the maximum practical number of simultaneous video transmissions over such a system would be approximately ten live HD video streams.
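
As a rough, back-of-envelope illustration of that ceiling (the per-channel throughput and stream bitrate below are assumed figures for illustration, not values taken from this disclosure):

```python
# Back-of-envelope estimate of simultaneous HD streams over legacy 2.4 GHz Wi-Fi.
# All numbers are illustrative assumptions, not figures taken from this disclosure.

NON_OVERLAPPING_CHANNELS = 3   # channels 1, 6, 11 in the 2.4 GHz band
PRACTICAL_CHANNEL_MBPS = 50    # assumed real-world throughput per channel
HD_STREAM_MBPS = 15            # assumed bitrate of one live HD stream

streams = NON_OVERLAPPING_CHANNELS * (PRACTICAL_CHANNEL_MBPS // HD_STREAM_MBPS)
print(f"Approximate simultaneous HD streams: {streams}")  # -> 9, i.e. roughly ten
```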

Beyond the physical limitations of this transfer medium, there is also the problem that standard Wi-Fi was designed to deliver the maximum performance when downloading information to multiple clients. The communication protocols in use were not designed to effectively handle multiple sources uploading large volumes of video information to a single receiver location. In other words, standard Wi-Fi systems are designed for point-to-multipoint transmissions, and not multipoint-to-point transmissions, which is exactly the situation that you have in a distributed live-action camera system. The McLennan and Harish system also uses Wi-Fi communications only over the short range from the edge of the field to the camera. From the receiver (access node), the video is sent by wired network connections to the server for processing. This limits such a solution to deployment in small areas where a wired network connection is available.

This limitation of supporting multiple cameras becomes further compounded when transmitted over the already crowded unlicensed radio spectrum, which by design requires that the airwaves and transmission time be shared with other devices also using the unlicensed spectrum.

Any attempt to overcome this wired local network approach with a Wi-Fi based mesh network would fail, as such a network is constrained by standard Wi-Fi protocols which require that each node listen to and capture the packet, and then retransmit it to the next node. Only one node can talk at a time, as all nodes must be on the same Wi-Fi channel, which only supports one user at a time. This means that after only three hops on this network, the effective data throughput would be 1/8th the total network speed of any one channel, a situation that is unusable for widescale live video capture and routing.
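
The throughput penalty follows from the half-duplex, single-channel relaying: each retransmission hop roughly halves the usable capacity. A minimal sketch of that relationship (assuming the idealized halving per hop described above):

```python
def effective_throughput(channel_mbps: float, hops: int) -> float:
    """Idealized throughput of a single-channel Wi-Fi mesh after N relay hops.

    Each node must receive and then retransmit on the same channel, so every
    hop roughly halves the capacity available to the original stream.
    """
    return channel_mbps / (2 ** hops)

# After three hops, only 1/8 of the channel capacity remains.
print(effective_throughput(100.0, 3))  # -> 12.5 Mbps from a 100 Mbps channel
```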

Other off-the-shelf radio solutions that use licensed frequency bands are also available. Because these systems use licensed radio spectrum, they are limited in where they can be deployed, as these bands are licensed differently around the world. These licensed bands tend to have more powerful transmitters and were designed to send signals over longer distances.

The large distances between where the action is taking place and where the broadcasting equipment is located sometimes pose challenges, as it is often difficult to relay the video signal reliably over these longer distances with only a single radio pair. More powerful transmitters mean more multipath reflections that degrade the system's reliability and overall performance. There can be many intervening obstacles, and frequently the video data has to be moved off to physical wired networks to transfer it effectively to the broadcast equipment.

An additional challenge of live-event capture is that the event by necessity is captured using a combination of various camera types. Some of these cameras are connected to the broadcast system directly by video cables, some connect over wired network connections, and some are connected by wireless radio transmission to a nearby receiver which then sends the video over a high-speed wired or fiberoptic network to the broadcast equipment.

This collection of different camera types, with multiple video output qualities, and communicating in various formats and transmission protocols, creates a complicated combination of different technologies and image qualities, all talking to the broadcast equipment through different mechanisms.

This combination of diverse equipment types does not permit any coordinated form of automated control over them and presents challenges with multiple connections and connection types. As such, the varied combination of camera types and connections does not lend itself to being used for anything more than an independent collection of cameras switched into use one at a time by an operator of the broadcasting equipment for specialized video to supplement the main video feed.

Some of the different camera types employed in live-action broadcasting include specialty cameras capable of new formats and viewing angles. Such a camera type is the 360-degree spherical view camera, where a single camera housing contains multiple cameras, the outputs of which are combined into a 360-degree spherical projected image.

One such system is described in patent No. US 8,902,322 B2 to Ramsay, Mills, Horvath, and Bodaly where they teach of a 360-degree spherical camera where the cameras are arranged in a notional tetrahedron. In their system, the captured images are processed by the onboard processor to form a spherically projected image, which then forms a single frame of video information.

The biggest issue with such an approach is the limited computing power that is available in a small camera such as this. While it is common and relatively easy to join multiple images together in a process known as image stitching, the problem that occurs is when a subject is closer to the camera itself, as would be the case when it is used in close proximity to participants in a sporting event.

When the subject gets close to the camera, you run into the problem of parallax distortion; that is, the perspective of the object is significantly different from the perspective of each lens. When you try to stitch these images together, you encounter ghost image fragments from each of the lenses in the final image, and there is no smooth transition from one image to the next.

Objects in the distance can be stitched together to form convincing mosaics, while closer objects are more problematic with such simple image merging strategies. A more sophisticated process that requires more computing power is impractical in a small battery-powered camera.
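
The severity of this parallax error can be illustrated with a simple small-angle relation (an illustrative sketch, not taken from the cited patent): for two lenses separated by a baseline b, a subject at distance d is seen from directions that differ by roughly

θ ≈ b / d   (radians, when the distance d is much larger than the baseline b)

With a baseline of a few centimetres, a subject several metres away produces a negligible angular difference, while a subject within arm's reach produces a misalignment of several degrees that simple stitching cannot hide.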

In patent No. US 2015/0256808 A1 to MacMillan, Newman, Chowdry, and Campbell a 360-spherical video system is described that utilizes only two super-wide angle lenses positioned opposite to each other on the same optical axis. Such a system produces considerable distortion and degraded image quality at the extreme edges of the captured image as is the nature of fisheye lenses.

To compensate for this, image sensors with higher resolution are required which provides more information to process in an attempt to overcome the extreme distortion at the edge of the image frame. These higher resolution images, in turn, generate much larger video files and pose a significant wireless transmission problem.

Additionally, the extra processing power needed to turn such extreme and distorted images into useable video far exceeds what can be accommodated in a small battery-powered camera and the complicated processing is done offline in post-processing software rather than in real-time; a step which precludes the use of such devices in a live broadcast scenario.

Another common style of camera used in live-action capture is the type described by McLennan and Harish, which is a simple single camera with a fixed focus lens attached to a helmet or other head or body position.

The problem with these designs, aside from their bulk, is that the direction in which they capture video is directly related to the direction the wearer is looking or facing. As the person wearing the camera engages in the action, they lean forward and move, or glance about to check what is happening, and consequently the camera swings wildly around the scene and produces a video of limited usefulness.

At best, these cameras can be used for short video segments that have been edited to where the camera was actually facing something of interest to the broadcaster. This style of camera is also subject to a great deal of jitter and motion artifacts from all the movement generated by the wearer.

As in the McLennan and Harish solution, some systems address the inherent shaking and bouncing of the video generated by these cameras by horizontally and vertically shifting the image to stabilize it, but they don’t address the key failure of the directional stability of the camera with respect to the intended video capture target.

None of the available live-action camera systems, nor the prior art, provides a solution which incorporates a 3CMOS camera system capable of producing professional-quality, high-fidelity, transparently encoded video that is transferred over a high-capacity wireless network, and that is capable of integrating and merging the video from multiple camera sources, not necessarily inside the same camera housing, in real time, and making that stable, professional-quality video available in real time for live broadcast.

BRIEF SUMMARY OF THE INVENTION

The invention disclosed herein describes technology which can be implemented in a variety of ways. To simplify the language used to describe the technology, this document will not attempt to enumerate all possible ways of making variations that are obvious to those skilled in the art.

As other ways of implementing the disclosed technology are also possible, it should be understood that the material set forth in this disclosure is illustrative only, and should not be treated as a limiting description of such technology.

The present invention provides a live-action camera system that contains at least one camera which transmits live video via a Wi-Fi 6 based Video Mesh Network, consisting of at least one such network node, to at least one computer server implementing an artificial neural network video processing system, where the video feed is prepared and relayed to live TV broadcasts or streamed to viewers locally at the event, or via the Internet.

In accordance with another aspect of the invention, a method providing a miniature live-action camera with one or more 3CMOS video sensors for increased video resolution and color fidelity.

In accordance with another aspect of the invention, a method providing a micro live-action camera comprised of one or more sub-miniature cellphone type high-resolution camera sensors.

In accordance with another aspect of the invention, a method providing a golf flagstick camera comprised of two cellphone sized high-resolution camera sensors and two super wide-angle lenses for the capture of a spherical 360-degree view of the golf green. The camera incorporates flagstick movement information, combined with intended camera direction and object tracking, to keep the intended subject in the center of the video frame.

In accordance with another aspect of the invention, a method providing a dual 3CMOS 360-degree spherical capture camera.

In accordance with another aspect of the invention, a method providing a 3CMOS baseball cap mounted camera with oversized resolution capture, camera motion information, and object tracking to keep the subject centered in the video frame.

In accordance with another aspect of the invention, a method providing multiple cellphone sized image sensors arranged to provide the source images to generate a 360-degree hemispherical view camera that can be mounted to many surfaces, such as a helmet.

In accordance with another aspect of the invention, a method providing multiple 3CMOS sensor modules and wide-angle lenses to provide the source images to produce a high-fidelity 360-degree hemispherical camera that can be mounted to many surfaces, such as a surfboard.

In accordance with another aspect of the invention, a method providing a cellphone sized high-resolution camera with over-resolution image capture and a wide-angle lens capable of being worn on a player's body, in various positions and mounted by various means. The camera incorporates the player’s movement information to keep the camera output image pointed in the intended direction by sub-sampling the video sensor data to keep the intended target in the center of the video frame.
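
As a minimal sketch of this sub-sampling idea (the window size, sensor dimensions, and function names below are hypothetical, not part of this disclosure), the output frame is simply a crop of the oversized sensor image re-centred on the intended target:

```python
import numpy as np

def crop_to_target(raw_frame: np.ndarray, target_xy: tuple[int, int],
                   out_w: int = 1920, out_h: int = 1080) -> np.ndarray:
    """Select an HD sub-window from an oversized sensor frame, centred on the target.

    raw_frame : full-resolution image from the sensor (H x W x 3)
    target_xy : pixel coordinates of the intended subject, e.g. derived from
                the camera's motion data and object tracking
    """
    sensor_h, sensor_w = raw_frame.shape[:2]
    # Clamp the window so it never runs off the edge of the sensor.
    x0 = min(max(target_xy[0] - out_w // 2, 0), sensor_w - out_w)
    y0 = min(max(target_xy[1] - out_h // 2, 0), sensor_h - out_h)
    return raw_frame[y0:y0 + out_h, x0:x0 + out_w]
```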

In accordance with another aspect of the invention, a method providing a plurality of sub-miniature cameras with cellphone sized camera modules, which are arranged in a circle around the perimeter of an MMA ring providing a 360-degree surround video capture capability. The cameras are embedded into the top ring and send their data and receive power through a wired Ethernet connection. The individual camera connections are collected at a network switch where all the video information is relayed to the AI-360 image processing server.

In accordance with another aspect of the invention, a method providing a plurality of miniature 3CMOS cameras arranged in a circle around a target area, such as a golf tee block. The cameras send their data and receive their power through a wired Ethernet connection. The individual camera connections are collected at a network switch where all the video information is relayed to the AI-360 image processing server.

In accordance with another aspect of the invention, a method providing a Video Mesh Network for transferring the video and control information through a wireless wide-area network. One such method to achieve this is a Video Mesh Network created from one or more Wi-Fi 6 wireless network nodes which function as access points to the Video Mesh Network as well as network traffic routers and relay points.

The Video Mesh Network nodes utilize a proprietary Video Mesh Network Protocol (VMNP) which provides a mechanism to interleave video transfers from multiple cameras and format the video data to synchronize and move the high volume of video traffic throughout the network, minimizing congestion and video lag. Additionally, the VMNP provides a process for camera configuration and control as well as network configuration.
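
The VMNP itself is proprietary and its wire format is not specified here; purely as an illustration of the kind of per-frame metadata such a protocol would need to carry (camera identity, capture timestamp, routing, and resource-unit assignment), a hypothetical frame header might look like the following sketch:

```python
from dataclasses import dataclass

@dataclass
class VMNPFrameHeader:
    """Hypothetical per-frame header for a video mesh network protocol.

    The real VMNP format is not disclosed; these fields merely illustrate the
    kind of information the text says must travel with each video frame.
    """
    camera_id: int              # which camera produced the frame
    stream_id: int              # which encoded stream within that camera
    capture_timestamp_ns: int   # shutter-synchronized RTC timestamp
    sequence_number: int        # frame ordering / retransmission reference
    destination_node: int       # root node or workstation address
    resource_unit: int          # Wi-Fi 6 RU assigned for this TXOP
    bitrate_code: int           # quality level selected from the bitrate table
    payload_length: int         # size of the h.265 frame data that follows
```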

In accordance with another aspect of the invention, a method providing floating Video Mesh Network nodes that supply the means to route the wireless network traffic both laterally and vertically around obstacles that would obstruct the video transmissions, such as large interfering waves when capturing video on the water.

The floating Video Mesh Network nodes operate in both water and air mediums to provide the necessary coverage.

In accordance with another aspect of the invention, a method for synchronizing the shutters of all connected cameras so that all the cameras capture an image at the same instant in time. This facilitates the joining of multiple images from multiple cameras into a larger composite image of the scene, the ability to capture multiple images of the event at the same moment in time, and the ability to create a Time Shot, where video or images are retrieved by referencing this common time element.

In accordance with another aspect of the invention, a process for capturing and delivering native resolution still images from the cameras connected to the system.
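
Because every camera stamps its frames against the same synchronized clock, retrieving a Time Shot reduces to looking up the frame nearest a requested timestamp in each camera's buffer. A minimal sketch (the buffer layout and names are hypothetical):

```python
def time_shot(camera_buffers: dict[int, list[tuple[int, bytes]]],
              requested_ns: int) -> dict[int, bytes]:
    """Return, for every camera, the frame captured closest to requested_ns.

    camera_buffers maps camera_id -> list of (capture_timestamp_ns, frame_data),
    with timestamps produced by the shutter-synchronized real-time clocks.
    """
    shot = {}
    for camera_id, frames in camera_buffers.items():
        shot[camera_id] = min(frames, key=lambda f: abs(f[0] - requested_ns))[1]
    return shot
```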

In accordance with another aspect of the invention, a process for merging together multiple frames of video from multiple cameras, not necessarily residing in the same camera housing, and forming them into a larger composite video frame.

In accordance with another aspect of the invention, a process for correcting the captured images to ensure consistent exposure and color.

In accordance with another aspect of the invention, a process for correcting the optical distortion and vignetting effects of the captured images.

In accordance with another aspect of the invention, a process to incorporate camera motion information, video feature tracking, and image motion heuristics to determine the correct position and orientation of the sub-image to use for the current video frame from the oversized high-resolution raw video data stream.
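
One simple way to combine these cues (a sketch under assumed names and weights, not the disclosed algorithm) is a complementary blend of the IMU-derived target position and the position reported by visual feature tracking, which smooths sensor noise while following genuine motion:

```python
def fuse_target_estimate(imu_xy: tuple[float, float],
                         tracked_xy: tuple[float, float],
                         alpha: float = 0.7) -> tuple[float, float]:
    """Blend the IMU-predicted subject position with the visually tracked one.

    alpha close to 1.0 trusts the feature tracker; lower values lean on the
    IMU targeting vector when tracking is noisy or momentarily lost.
    """
    return (alpha * tracked_xy[0] + (1.0 - alpha) * imu_xy[0],
            alpha * tracked_xy[1] + (1.0 - alpha) * imu_xy[1])
```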

In accordance with another aspect of the invention, a process for tracking the motion of selected targets in the video data stream.

In accordance with another aspect of the invention, a process for generating multiple, specific views linked to the motion of a selected target or change in orientation of the camera.

In accordance with another aspect of the invention, a method to distribute the generated video to broadcasting services and local event viewers.

BRIEF DESCRIPTION OF THE DRAWINGS

The following illustrations may help to clarify the description of the invention.

FIG. 1 - Figure 1 depicts an overall example of possible configurations of the system using the various embodiments of the present invention.

FIG. 2 - Figure 2 depicts an example of one possible configuration of a camera (200) with two lenses, each with 3CMOS module and Wi-Fi 6 communications.

FIG. 3 - Figure 3 depicts an example of one possible configuration of a camera (202) with four lenses, each with 3CMOS module and Wi-Fi 6 communications.

FIG. 4 - Figure 4 depicts an example of one possible configuration of a camera (204) with a single lens with a 3CMOS module that communicates using a wired Ethernet connection.

FIG. 5 - Figure 5 depicts an example of one possible configuration of a dual-lens camera (206), each with a cellphone camera module video sensor and Wi-Fi 6 radio communications and capable of being mounted around a golf flagstick.

FIG. 6a - Figure 6a depicts an example of one possible configuration of a camera (208) with a single lens with a 3CMOS module and Wi-Fi 6 communications which is capable of being mounted to the brim of a cap.

FIG. 6b - Figure 6b depicts an example of one possible configuration of a camera (208) mounted on a baseball cap.

FIG. 7 - Figure 7 depicts an example of one possible configuration of a single-lens camera (210) with a cellphone camera module video sensor and Wi-Fi 6 radio communications, which is capable of being mounted on an athlete's body.

FIG. 8 - Figure 8 depicts an example of one possible configuration of a single-lens camera (212) with a cellphone camera module video sensor with wired Ethernet communications, which is capable of being embedded in the safety railing of an MMA fighting ring.

FIG. 9 - Figure 9 depicts an example of one possible configuration of a four-lens camera (214) using four cellphone camera module video sensors and Wi-Fi 6 radio communications, mounted on a helmet.

FIG. 10 - Figure 10 depicts an example of a Video Mesh Network node (100) that provides an access point into the Wi-Fi 6 wireless network.

FIG.11 - Figure 11 depicts an image processing workstation (400) that turns the multiple video streams into output feeds for live broadcasting.

FIG. 12 - Figure 12 depicts an example of a beam-splitting prism module (252) that separates the red, green, and blue bands of light into separate paths and routes these bands of light to individual monochrome CMOS video sensors.

FIG. 13 - Figure 13 depicts a perspective view of an example of the beam-splitting prism (252).

FIG. 14 - Figure 14 depicts a block diagram of camera (200).

FIG. 15 - Figure 15 depicts a block diagram of camera (202).

FIG. 16 - Figure 16 depicts a block diagram of camera (204).

FIG. 17 - Figure 17 depicts a block diagram of camera (206).

FIG. 18 - Figure 18 depicts a block diagram of camera (208).

FIG. 19 - Figure 19 depicts a block diagram of camera (210).

FIG. 20 - Figure 20 depicts a block diagram of camera (212).

FIG. 21 - Figure 21 depicts a block diagram of camera (214).

FIG. 22 - Figure 22 depicts a block diagram of a Video Mesh Network routing node and network access point (100).

FIG. 23 - Figure 23 depicts a block diagram of the CNN-based image processing workstation (400).

FIG. 24 - Figure 24 depicts one possible configuration of a Video Mesh Network.

FIG. 25 - Figure 25 depicts the relative positions and size of encoder frames.

FIG. 26 - Figure 26 depicts how multiple Wi-Fi 6 clients simultaneously use an access point.

FIG. 27 - Figure 27 depicts how Wi-Fi 6 radio spectrum is allotted and selected in the Video Mesh Network Protocol (VMNP).

FIG. 28 - Figure 28 depicts how Wi-Fi 6 Resource Units are allotted and selected in the Video Mesh Network Protocol (VMNP).

FIG. 29 - Figure 29 depicts the various data elements of the Video Mesh Network Protocol (VMNP).

FIG.30 - Figure 30 depicts how image color purity is affected when cameras use a 3CMOS image sensor or a Bayer pattern image sensor.

FIG. 31 - Figure 31 depicts how image resolution is affected when cameras use a 3CMOS image sensor or a Bayer pattern image sensor.

FIG. 32 - Figure 32 depicts the field of view (FOV) coverage from a 360-degree surround camera system and a reference camera used for training.

FIG. 33 - Figure 33 depicts the training mechanism for training a CNN to stitch images from a surround camera system.

FIG. 34a - Figure 34a depicts camera (212) embedded in a safety rail.

FIG. 34b - Figure 34b depicts a cross-section of camera (212) embedded in the safety rail.

FIG. 35 - Figure 35 depicts the field of view coverage for a surround camera system embedded in the safety rails and supports of an MMA fighting ring.

FIG. 36 - Figure 36 depicts the training mechanism for training a CNN to stitch images from a surround camera system where the cameras have parallel and angled alignment with each other, such as when used for an MMA fighting ring.

FIG. 37 - Figure 37 depicts various training images for training a CNN to stitch images from cameras with different orientations relative to each other.

FIG. 38 - Figure 38 depicts one possible image processing workflow through multiple CNNs.

TABLE. 1 - Table 1 provides the selection code and video transmission bitrates for different levels of video transmission quality used in the Video Mesh Network Protocol (VMNP).

DETAILED DESCRIPTION

Before beginning a detailed description of the subject invention, mention of the following is in order. When appropriate, like reference materials and characters are used to designate identical, corresponding, or similar components in differing figure drawings. The figure drawings associated with this disclosure typically are not drawn with dimensional accuracy to scale, i.e., such drawings have been drafted with a focus on clarity of viewing and understanding rather than dimensional accuracy.

In the interest of clarity, not all of the routine features of the implementations described herein are shown and described. It will, of course, be appreciated that in the development of any such actual implementation, numerous implementation-specific decisions must be made in order to achieve the developer's specific goals, such as compliance with application- and business-related constraints, and that these specific goals will vary from one implementation to another and from one developer to another.

Moreover, it will be appreciated that such a development effort might be complex and time-consuming, but would nevertheless be a routine undertaking of engineering for those of ordinary skill in the art having the benefit of this disclosure. Applicant's invention is not limited to particular embodiments described, as such may, of course, vary. It is also to be understood that the terminology used herein is for the purpose of describing particular embodiments only, and is not intended to be limiting since the scope of the present invention will be limited only by the appended claims.

Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. Although any materials similar or equivalent to those described herein can also be used in the practice or testing of the present invention, the preferred materials are now described.

All publications and patents cited in this specification are herein incorporated by reference as if each individual publication or patent were specifically and individually indicated to be incorporated by reference and are incorporated herein by reference to disclose and describe the materials in connection with which the publications are cited. The citation of any publication is for its disclosure prior to the filing date and should not be construed as an admission that the present invention is not entitled to antedate such publication by virtue of prior invention. Further, the dates of publication provided may be different from the actual publication dates which may need to be independently confirmed.

As used herein and in the appended claims, the singular forms "a," "an," and "the" include plural referents unless the context clearly dictates otherwise. It is further noted that the claims may be drafted to exclude any optional element. As such, this statement is intended to serve as antecedent basis for use of such exclusive terminology as "solely," "only," and the like in connection with the recitation of claim elements, or use of a "negative" limitation.

As will be apparent to those of skill in the art upon reading this disclosure, each of the individual embodiments described and illustrated herein has discrete components and features which may be readily separated from or combined with the features of any of the other several embodiments without departing from the scope or spirit of the present invention.

Referring now to the drawings, FIG. 1 depicts several possible embodiments of the present invention in use in a variety of applications.

In one preferred embodiment, camera (200) is utilized to provide 360-degree spherical video coverage from a racecar on a track. Camera (200) uses two 3CMOS camera modules (252) with fisheye lenses to capture a 360-degree spherical image.

Referring now to FIG. 14, the RGB outputs from each 3CMOS camera module (252) feed into the related FPGA (254) which combines the separate color data together into a new RGB image data stream. The audio input from the associated microphone (250) is added to the data stream that is sent to the video encoder (230) which takes the audio and video streams from both 3CMOS camera modules and turns them into two separate series of h.265 encoded audio/video frames.

The CPU (220) connects to peripherals SD Card (224), inertial measurement unit (IMU) (226), GPS (227), h.265 video encoder (230) and the Wi-Fi 6 radio module (232). The IMU (226) uses its internal X, Y, Z accelerometers and Yaw, Pitch, and Roll gyroscope data to calculate the motion of the camera relative to a set reference point. The result is a targeting vector that points to where the reference point is relative to the camera's current position and is used for stabilizing the camera motion in the video and to assist in keeping the video centered on the subject of interest. This targeting vector is read by CPU (220) and passed to the AI-360 image processing system (400) by inserting it into the data of the two encoded video streams. GPS (227) tracks the camera's position and reports this information to the CPU (220) for inclusion in the video stream data along with the targeting vector. The GPS (227) also performs the function of an accurate time source for the real time clock (RTC) inside the CPU (220). This RTC is synchronized with the RTC in all the other cameras using the network time protocol (NTP). The RTC is used to synchronize the triggering of the video sensor shutter so that all cameras capture images at the exact same instant in time, and for producing accurate synchronized timestamps for every video frame capture.
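
A minimal sketch of how a shared, NTP-disciplined clock can be used to fire every shutter at the same instant (the frame period and function names are assumptions for illustration): each camera simply waits for the next frame boundary of the common clock rather than free-running.

```python
import time

FRAME_PERIOD_NS = 16_666_667  # assumed ~60 fps frame interval

def next_shutter_time_ns(now_ns: int) -> int:
    """Next frame boundary on the shared clock; identical on every camera
    whose RTC is synchronized via NTP, so all shutters fire together."""
    return ((now_ns // FRAME_PERIOD_NS) + 1) * FRAME_PERIOD_NS

def wait_for_shutter() -> int:
    """Sleep until the next common frame boundary and return its timestamp."""
    target_ns = next_shutter_time_ns(time.time_ns())
    while time.time_ns() < target_ns:
        pass  # a real camera would arm a hardware timer instead of spinning
    return target_ns  # also used as the frame's synchronized timestamp
```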

The CPU (220) reads the frame data generated by the encoder (230) and stores it in temporary buffer structures in RAM (222), for transmitting via the Wi-Fi 6 module (232) when it is its turn to send the data. The video frames accumulate in the video buffer and when the structure has reached one minute of video in the buffer, the one-minute block of video is written to a timestamped file on the SD Card (224) and a new buffer structure is created for the next minute of video frames.

When the RAM (222) reaches a predetermined limit of remaining free space, the oldest one-minute buffer is deleted from the RAM (222). Similarly, when the SD Card (224) reaches a predetermined limit of remaining free space, the oldest one-minute file is deleted from the SD Card (224) to make space for more buffered video frames. These buffers of video frames are used to provide the data should retransmission be required, or if the video is being recalled from a later period in time. First, the CPU would try to retrieve the requested video information from the RAM (222), and if not present there, from the SD Card (224).
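
A compact sketch of this two-tier buffering scheme (the structure and limits are illustrative, not the disclosed implementation): one-minute blocks accumulate in RAM, are also written to the SD card, and the oldest block in each tier is evicted when that tier runs low on space.

```python
from collections import OrderedDict

class TwoTierVideoBuffer:
    """Illustrative two-tier frame buffer: recent minutes in RAM, older on SD card."""

    def __init__(self, max_ram_blocks: int, max_sd_blocks: int):
        self.ram: OrderedDict[int, list[bytes]] = OrderedDict()  # minute -> frames
        self.sd: OrderedDict[int, list[bytes]] = OrderedDict()
        self.max_ram_blocks = max_ram_blocks
        self.max_sd_blocks = max_sd_blocks

    def add_minute(self, minute: int, frames: list[bytes]) -> None:
        self.ram[minute] = frames
        self.sd[minute] = frames              # written to a timestamped file
        if len(self.ram) > self.max_ram_blocks:
            self.ram.popitem(last=False)      # evict oldest minute from RAM
        if len(self.sd) > self.max_sd_blocks:
            self.sd.popitem(last=False)       # evict oldest minute from SD card

    def retrieve(self, minute: int) -> list[bytes] | None:
        # Try RAM first, then fall back to the SD card, as the text describes.
        return self.ram.get(minute) or self.sd.get(minute)
```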

The user interface panel (228) contains an LCD display for relaying information to the user, software-defined buttons for selecting options presented on the display, and an indicator LED which shows the current state of the camera. The states include green to indicate the camera is powered on, yellow to indicate the camera is in preview mode and sending lower resolution video, and red to indicate that the camera is live and delivering high-resolution video. A blinking red LED indicates a fault with the camera.
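
The indicator states map naturally onto a small lookup (a sketch; the enum names are not from the disclosure):

```python
from enum import Enum

class CameraState(Enum):
    POWERED_ON = "green"        # camera is powered on
    PREVIEW = "yellow"          # preview mode, lower-resolution video
    LIVE = "red"                # live, delivering high-resolution video
    FAULT = "blinking red"      # a fault has been detected
```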

The power supply (236) uses the energy stored in battery (240) to power the camera. The battery is charged by the power supply (236) from energy transferred by the wireless charging coil (238) which is embedded in the quick disconnect plate at the base of the camera body.

Camera (200) connects to and communicates with the Video Mesh Network using standard Wi-Fi 6 protocols and transmits its video frames using the Video Mesh Network Protocol (180) to the nearest assigned Video Mesh Network node (100). The Video Mesh Network nodes then route the video frame data through the Video Mesh Network comprised of multiple Video Mesh Network nodes (100).

The Video Mesh Network nodes (100) are loaded with routing and radio spectrum configuration information via the Video Mesh Network Protocol (180), which directs each node on where to send the incoming packets and on which resource units (RU) the information is to be transmitted or received during this transmission opportunity (TXOP) time slice.
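
Purely to illustrate the kind of configuration each node receives (the field names are hypothetical), a routing entry would pair a traffic source with its next hop and the resource units and TXOP slice reserved for that traffic:

```python
from dataclasses import dataclass

@dataclass
class MeshRouteEntry:
    """Hypothetical per-node routing/spectrum entry loaded via the VMNP."""
    source_camera_id: int        # whose traffic this entry applies to
    next_hop_node_id: int        # which mesh node (or wired port) to forward to
    resource_units: list[int]    # Wi-Fi 6 RUs reserved for this traffic
    txop_slice: int              # which transmission-opportunity slot to use
    exit_to_ethernet: bool       # True if this node hands the traffic to the wired network

# Example: one node's table maps camera 7's frames to node 3 on RUs 4-5, slot 2.
routing_table = [MeshRouteEntry(7, 3, [4, 5], 2, False)]
```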

The video frames are moved through the Video Mesh Network nodes (100) until they reach a root node where they exit the Video Mesh Network and proceed to their final destination address through a wired Ethernet connection (108) to a router (120) which relays the video frame to the AI-360 image processing workstation (400). The AI-360 image processing system (400) collects the individual frames from multiple cameras and assembles them back into their individual video streams. The frames of these streams are processed through a series of convolutional neural network (CNN) processors (600) where the images undergo any needed quality adjustments and the spatially related frames are merged into a larger composite virtual video frame from which the desired view to be broadcast is selected. Once image processing is completed, the output from the AI-360 image processing workstation (400) is sent to multiple destinations through Ethernet cable (108) via router (120), where it is distributed to local onsite TV broadcast equipment for over the air broadcasts, streamed to local event viewers (500) through the Video Mesh Network nodes (100) using standard Wi-Fi protocols for maximum compatibility with older mobile devices, and transferred via the Internet (140) to a video transcoding and streaming service provider (106) for distribution to remote viewers through the Internet (140) to their remote viewing devices (130).

In another preferred embodiment, camera (202) is utilized to provide 360-degree hemispherical video coverage of a surfer. Camera (202) uses four 3CMOS camera modules (252) to capture a 360-degree hemispherical image around the camera.

Referring now to FIG. 15, the RGB outputs from each 3CMOS camera module (252) feed into the related FPGA (254) which combines the separate color data together into a new RGB image data stream. The audio input from the associated microphone (250) is added to the data stream that is sent to the video encoder (230) which takes the audio and video streams from the four 3CMOS camera modules and turns them into four separate series of h.265 encoded audio/video frames.

The CPU (220) connects to peripherals SD Card (224), inertial measurement unit (IMU) (226), GPS (227), h.265 video encoder (230) and the Wi-Fi 6 radio module (232). The IMU (226) uses its internal X, Y, Z accelerometers and Yaw, Pitch, and Roll gyroscope data to calculate the motion of the camera relative to a set reference point. The result is a targeting vector that points to where the reference point is relative to the camera's current position and is used for stabilizing the camera motion in the video and to assist in keeping the video centered on the subject of interest. This targeting vector is read by CPU (220) and passed to the AI-360 image processing system (400) by inserting it into the data of the encoded video streams. GPS (227) tracks the camera's position and reports this information to the CPU (220) for inclusion in the video stream data along with the targeting vector.

The GPS (227) also performs the function of an accurate time source for the real time clock (RTC) inside the CPU (220). This RTC is synchronized with the RTC in all the other cameras using the network time protocol (NTP). The RTC is used to synchronize the triggering of the video sensor shutter so that all cameras capture images at the exact same instant in time, and for producing accurate synchronized timestamps for every video frame capture.

The CPU (220) reads the frame data generated by the encoder (230) and stores it in temporary buffer structures in RAM (222), for transmitting via the Wi-Fi 6 module (232) when it is its turn to send the data. The video frames accumulate in the video buffer and when the structure has reached one minute of video in the buffer, the one-minute block of video is written to a timestamped file on the SD Card (224) and a new buffer structure is created for the next minute of video frames.

When the RAM (222) reaches a predetermined limit of remaining free space, the oldest one-minute buffer is deleted from the RAM (222). Similarly, when the SD Card (224) reaches a predetermined limit of remaining free space, the oldest one-minute file is deleted from the SD Card (224) to make space for more buffered video frames. These buffers of video frames are used to provide the data should retransmission be required, or if the video is being recalled from a later period in time. First, the CPU would try to retrieve the requested video information from the RAM (222), and if not present there, from the SD Card (224).

The user interface panel (228) contains an LCD display for relaying information to the user, software-defined buttons for selecting options presented on the display, and an indicator LED which shows the current state of the camera. The states include green to indicate the camera is powered on, yellow to indicate the camera is in preview mode and sending lower resolution video, and red to indicate that the camera is live and delivering high-resolution video. A blinking red LED indicates a fault with the camera.

The power supply (236) uses the energy stored in battery (240) to power the camera. The battery is charged by the power supply (236) from energy transferred by the wireless charging coil (238) which is embedded in the quick disconnect plate at the base of the camera body.

Camera (202) connects to and communicates with the Video Mesh Network using standard Wi-Fi 6 protocols and transmits its video frames to a floating Video Mesh Network comprised of one or more nodes (101a) or (102a) using the Video Mesh Network Protocol (180). The floating Video Mesh Network nodes (101a) are attached to any manner of water-based floating craft, including JetSkis (101b), small watercraft, or floating platforms. The floating Video Mesh Network nodes (102a) are attached to any manner of air-based floating craft, including miniature blimps (102b), drones, or floating lighter-than-air platforms. These floating nodes capture and relay video information from camera (202) where line of sight transmission from the camera (202) to the shore Video Mesh Network node (100) is blocked by water or other obstructions.

The video information is transferred through the floating nodes (101a) or (102a) until it is delivered to a Video Mesh Network node on land (100) and delivered to the production truck (401).

The AI-360 image processing system (400) collects the individual frames from multiple cameras and assembles them back into their individual video streams.

The frames of these streams are processed through the convolutional neural network (CNN) processors (600) where the images undergo any needed quality adjustments and spatially related frames are merged into a larger composite frame from which the desired view to be broadcast is selected.

Once processing is completed, the output from the AI-360 image processing workstation (400) is sent to multiple destinations. One live video feed is streamed to local event viewers (500) using standard Wi-Fi protocols for maximum compatibility with older mobile devices. Another live feed is sent to multiple destinations through the satellite uplink (103) where it is relayed by satellite (104) back to a video transcoding and streaming service provider (106) for distribution to remote viewers through the Internet (140) to their remote viewing devices (130) and to broadcasting equipment for over the air broadcasts.

In another preferred embodiment, a 360-degree surround capture system is created using multiple cameras (204) arranged in a circle around the target area. The surround capture system takes video from multiple fixed positions around a target area, in this instance a golf tee block, and sends them to the AI-360 image processing workstation (400) where they are then fused together to create a large virtual scene. This scene can have a virtual camera moved around the circle surrounding the target subject and lets the viewers see the subject from multiple camera angles as if a real camera was being moved around the scene.

Referring now to FIG. 16, the RGB outputs from the 3CMOS camera module (252) feed into the related FPGA (254) which combines the separate color data together into a new RGB image data stream. The audio input from the associated microphone (250) is added to the data stream that is sent to CPU (220) which takes the audio and video streams and turns them into a series of h.265 encoded audio/video frames.

The CPU (220) connects to peripherals SD Card (224), inertial measurement unit (IMU) (226), GPS (227), and the Ethernet module (234). The IMU (226) uses its internal X, Y, Z accelerometers and Yaw, Pitch, and Roll gyroscope data to calculate the motion of the camera relative to a set reference point. The result is a targeting vector that points to where the reference point is relative to the camera's current position and is used for stabilizing the camera motion in the video and to assist in keeping the video centered on the subject of interest. This targeting vector is read by CPU (220) and passed to the AI-360 image processing system (400) by inserting it into the data of the encoded video stream. GPS (227) tracks the camera's position and reports this information to the CPU (220) for inclusion in the video stream data along with the targeting vector.

The GPS (227) also performs the function of an accurate time source for the real time clock (RTC) inside the CPU (220). This RTC is synchronized with the RTC in all the other cameras using the network time protocol (NTP). The RTC is used to synchronize the triggering of the video sensor shutter so that all cameras capture images at the exact same instant in time, and for producing accurate synchronized timestamps for every video frame captured.

The CPU (220) takes the generated frame data and stores it in temporary buffer structures in RAM (222) for transmitting via the Ethernet module (234). The video frames accumulate in the video buffer and when the structure has reached one minute of video in length, the one-minute block of video is written to a timestamped file on the SD Card (224) and a new buffer structure is created for the next minute of video frames.

When the RAM (222) reaches a predetermined limit of remaining free space, the oldest one-minute buffer is deleted from the RAM(222). Similarly, when the SD Card (224) reaches a predetermined limit of remaining free space the oldest one-minute file is deleted from the SD-Card (224) to make space for more buffered video frames. These buffers of video frames are used to provide the data should retransmission be required, or if the video is being recalled from a later period in time. First, the CPU would try to retrieve the requested video segment from the RAM (222), and if not present there, from the SD Card (224).

The user interface panel (229) contains an indicator LED which shows the current state of the camera. The states include green to indicate the camera is powered on, yellow to indicate the camera is in preview mode and sending lower resolution video, and red to indicate that the camera is live and delivering high-resolution video. A blinking red LED indicates a fault with the camera.

The power supply (242) uses Power Over Ethernet (PoE) from the Ethernet module (234) to power the camera. The surround cameras (204) connect to a wired Ethernet network using Ethernet cables (108) and communicate with the AI-360 image processing workstation (400) using the Video Mesh Network Protocol (180). The video data is routed to the AI-360 image processing workstation (400) via router (120). The AI-360 image processing system (400) collects the individual frames from the multiple cameras of the surround system and assembles them back into their individual video streams.

The frames of these streams are processed through the convolutional neural network (CNN) processors (600) where the images undergo any needed quality adjustments and spatially related frames are merged into a larger composite frame from which the desired view to be broadcast is selected.

This virtual frame forms a circle around the target subject, and the operator can select a view of the target from a virtual camera that can be moved in a circle around that target, providing the viewers with different views of the scene.
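
A simple way to picture this virtual orbit (a sketch with assumed dimensions, not the disclosed CNN-based method) is to treat the fused composite as a cylindrical panorama and select the output view by the desired angle around the target:

```python
import numpy as np

def virtual_camera_view(panorama: np.ndarray, angle_deg: float,
                        view_width: int = 1920) -> np.ndarray:
    """Extract the view at a given angle from a 360-degree composite panorama.

    panorama : fused composite frame covering 0-360 degrees around the target
    angle_deg: where the virtual camera sits on the circle surrounding the target
    """
    pano_h, pano_w = panorama.shape[:2]
    center = int((angle_deg % 360.0) / 360.0 * pano_w)
    cols = [(center - view_width // 2 + i) % pano_w for i in range(view_width)]
    return panorama[:, cols]  # wraps seamlessly across the 0/360 boundary
```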

Once processing is completed, the output from the AI-360 image processing workstation (400) is sent to multiple destinations through Ethernet cable (108) to router (120), where it is distributed to local onsite TV broadcast equipment for over the air broadcasts, streamed to local event viewers (500) through the Video Mesh Network nodes (100) using standard Wi-Fi protocols for maximum compatibility with older mobile devices, and transferred via the Internet (140) to a video transcoding and streaming service provider (106) for distribution to remote viewers through the Internet (140) to their remote viewing devices (130).

In another preferred embodiment, camera (206) is utilized to provide 360-degree spherical video coverage from a golf flagstick. This camera uses a pair of fisheye lenses coupled with a micro-sized high-resolution cellphone camera module (256), to capture a 360-degree spherical image surrounding itself. The camera (206) also captures its own motion using the inertial measurement unit (IMU) (226) which generates a vector to a user-designated target point in the scene. As the camera (206) moves, the IMU calculates a vector that tells the AI-360 image processing system (400) where in the virtual scene the virtual camera is to be pointed, thereby stabilizing the camera movement as well as the video image and removing the unwanted camera motion.
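
As a minimal sketch of how such a targeting vector translates into a virtual camera direction (the names and angle conventions are assumptions), the flagstick's measured orientation is simply subtracted from the bearing to the designated target so the rendered view stays on the subject as the stick sways:

```python
import math

def targeting_vector(target_world: tuple[float, float, float],
                     camera_yaw_deg: float,
                     camera_pitch_deg: float) -> tuple[float, float]:
    """Return the pan/tilt (degrees) the virtual camera needs to stay on the target.

    target_world      : direction to the user-designated reference point,
                        expressed as a unit vector in world coordinates
    camera_yaw_deg,
    camera_pitch_deg  : the physical camera's current orientation from the IMU
    """
    x, y, z = target_world
    world_yaw = math.degrees(math.atan2(y, x))
    world_pitch = math.degrees(math.asin(z))
    # Pointing the virtual camera means compensating for the physical motion.
    return world_yaw - camera_yaw_deg, world_pitch - camera_pitch_deg
```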

Referring now to FIG. 17, the video encoder (230) combines the video output from the cellphone camera modules (256) with the audio from the associated microphone (250) and turns them into a series of h.265 encoded audio/video frames.

The CPU (220) connects to peripherals SD Card (224), inertial measurement unit (IMU) (226), GPS (227), h.265 video encoder (230) and the Wi-Fi 6 radio module (232). The IMU (226) uses its internal X, Y, Z accelerometers and Yaw, Pitch, and Roll gyroscope data to calculate the motion of the camera relative to a set reference point.

The result is a targeting vector that points to where the reference point is relative to the camera's current position and is used for stabilizing the camera motion in the video and to assist in keeping the video centered on the subject of interest.

This targeting vector is read by CPU (220) and passed to the AI-360 image processing system (400) by inserting it into the data of the two encoded video streams. GPS (227) tracks the camera's position and reports this information to the CPU (220) for inclusion in the video stream data along with the targeting vector.

The GPS (227) also performs the function of an accurate time source for the real time clock (RTC) inside the CPU (220). This RTC is synchronized with the RTC in all the other cameras using the network time protocol (NTP). The RTC is used to synchronize the triggering of the video sensor shutter so that all cameras capture images at the exact same instant in time, and for producing accurate synchronized timestamps for every video frame capture.

The CPU (220) reads the frame data generated by the encoder (230) and stores it in temporary buffer structures in RAM (222), for transmitting via the Wi-Fi 6 module (232) when it is its turn to send the data.

The video frames accumulate in the video buffer and when the structure has reached one minute of video in the buffer, the one-minute block of video is written to a timestamped file on the SD Card (224) and a new buffer structure is created for the next minute of video frames.

When the RAM (222) reaches a predetermined limit of remaining free space, the oldest one-minute buffer is deleted from the RAM(222). Similarly, when the SD Card (224) reaches a predetermined limit of remaining free space the oldest one-minute file is deleted from the SD-Card (224) to make space for more buffered video frames.

These buffers of video frames are used to provide the data should retransmission be required, or if the video is being recalled from a later period in time. First, the CPU would try to retrieve the requested video information from the RAM (222), and if not present there, from the SD Card (224).

The user interface panel (229) contains an indicator LED which shows the current state of the camera. The states include green to indicate the camera is powered on, yellow to indicate the camera is in preview mode and sending lower resolution video, and red to indicate that the camera is live and delivering high-resolution video. A blinking red LED indicates a fault with the camera. The power supply (246) uses stored energy in the field-replaceable battery (249) to power the camera.

Camera (206) connects to and communicates with the Video Mesh Network using standard Wi-Fi 6 protocols and transmits the video frames using the Video Mesh Network Protocol (180) to the nearest assigned Video Mesh Network node (100).

The Video Mesh Network nodes then route the video frame through the Video Mesh Network comprised of multiple Video Mesh Network nodes (100) or by rerouting the video data to a wired Ethernet connection.

Such routing of the video data from the wireless network to a wired network can happen for multiple reasons, including to bypass a transmission obstacle (109) or to send the data a long distance via a fiber optic link (110), eliminating the need for the data to pass through multiple nodes of the wireless network. In the case of bypassing a wireless transmission obstacle, the video data can be reintroduced into the wireless network by connecting the wired Ethernet cable (108) to an available Video Mesh Network node (100).

The Video Mesh Network nodes (100) are loaded with routing and radio spectrum configuration information via the Video Mesh Network Protocol (180), which directs each node on where to send the incoming packets and on which resource units (RU) the information is to be transmitted or received during this transmission opportunity (TXOP) time slice, and which traffic should get rerouted to a wired Ethernet connection at that node.

The video frames are moved through the various networks until they reach their final destination address through a wired Ethernet connection (108) to a router (120) which relays the video frame to the AI-360 image processing workstation (400). The AI-360 image processing system (400) collects the individual frames from multiple cameras and assembles them back into their individual video streams.

The frames of these streams are processed through the convolutional neural network (CNN) processors (600) where the images undergo any needed quality adjustments and spatially related frames are merged into a larger composite frame from which the desired view to be broadcast is selected.

Once processing is completed, the output from the AI-360 image processing workstation (400) is sent to multiple destinations through Ethernet cable (108) to router (120), where it is distributed to local onsite TV broadcast equipment for over the air broadcasts, streamed to local event viewers (500) through the Video Mesh Network nodes (100) using standard Wi-Fi protocols for maximum compatibility with older mobile devices, and transferred via the Internet (140) to a video transcoding and streaming service provider (106) for distribution to remote viewers through the Internet (140) to their remote viewing devices (130).

In another preferred embodiment, camera (208) is utilized to provide a first-person point of view camera; in this case, mounted to a baseball cap and capturing video during a baseball game. The camera (208) houses a 3CMOS camera module (252) and captures its own motion using the inertial measurement unit (IMU) (226) which generates a vector to a user-designated target point in the scene.

As the camera (208) moves, the IMU (226) calculates a vector that tells the AI-360 image processing system (400) where in the virtual scene the virtual camera is to be pointed, thereby stabilizing the camera motion and the video image and removing the unwanted camera motion as the wearer engages in the sporting activity.

Referring now to FIG. 18, the RGB outputs from the 3CMOS camera module (252) feed into the related FPGA (254) which combines the separate color data together into a new RGB image data stream. The audio input from the associated microphone (250) is added to the data stream that is sent to the CPU (220) which takes the audio and video streams from the 3CMOS camera module and turns them into a series of h.265 encoded audio/video frames.

The CPU (220) connects to peripherals SD Card (224), inertial measurement unit (IMU) (226), and the Wi-Fi 6 radio module (232). The IMU (226) uses its internal X, Y, Z accelerometers and Yaw, Pitch, and Roll gyroscope data to calculate the motion of the camera relative to a set reference point.

The result is a targeting vector that points to where the reference point is relative to the camera's current position and is used for stabilizing the camera motion in the video and to assist in keeping the video centered on the subject of interest. This targeting vector is read by CPU (220) and passed to the AI-360 image processing system (400) by inserting it into the data of the encoded video stream.

The real-time clock (RTC) inside the CPU (220) is synchronized with the RTC in all the other cameras using the network time protocol (NTP). The RTC is used to synchronize the triggering of the video sensor shutter so that all cameras capture images at the exact same instant in time, and for producing accurate synchronized timestamps for every video frame capture.

The CPU (220) takes the encoded video frame data and stores it in temporary buffer structures in RAM (222), for transmitting via the Wi-Fi 6 module (232) when it is its turn to send the data. The video frames accumulate in the video buffer and when the structure has reached one minute of video in the buffer, the one-minute block of video is written to a timestamped file on the SD Card (224) and a new buffer structure is created for the next minute of video frames. When the RAM (222) reaches a predetermined limit of remaining free space, the oldest one-minute buffer is deleted from the RAM (222). Similarly, when the SD Card (224) reaches a predetermined limit of remaining free space, the oldest one-minute file is deleted from the SD Card (224) to make space for more buffered video frames.

These buffers of video frames are used to provide the data should retransmission be required, or if the video is being recalled at a later point in time. First, the CPU would try to retrieve the requested video information from the RAM (222), and if not present there, from the SD Card (224).
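
The following Python sketch illustrates the buffering and retrieval behavior described above, with the free-space thresholds simplified to fixed counts of one-minute buffers; the class and method names, and the file-per-minute layout, are assumptions for illustration rather than details from the specification.

    import os
    import time
    from collections import OrderedDict

    class FrameBuffer:
        # Minimal sketch of the one-minute RAM/SD buffering described above.
        def __init__(self, sd_path, max_ram_minutes=5, max_sd_minutes=60):
            self.sd_path = sd_path
            self.max_ram_minutes = max_ram_minutes
            self.max_sd_minutes = max_sd_minutes
            self.ram = OrderedDict()          # minute index -> list of (ts, frame)
            os.makedirs(sd_path, exist_ok=True)

        def add_frame(self, frame_bytes, ts=None):
            ts = time.time() if ts is None else ts
            minute = int(ts // 60)
            if minute not in self.ram:
                self._rotate()
            self.ram.setdefault(minute, []).append((ts, frame_bytes))

        def _rotate(self):
            # Persist the most recent completed minute to the SD card and
            # discard the oldest entries when RAM or SD space runs low.
            if self.ram:
                last_minute, frames = next(reversed(self.ram.items()))
                with open(os.path.join(self.sd_path, f"{last_minute}.bin"), "wb") as f:
                    for _, data in frames:
                        f.write(data)
            while len(self.ram) >= self.max_ram_minutes:
                self.ram.popitem(last=False)                  # drop oldest RAM minute
            files = sorted(os.listdir(self.sd_path))
            while len(files) >= self.max_sd_minutes:
                os.remove(os.path.join(self.sd_path, files.pop(0)))

        def get(self, ts):
            # Retrieval order follows the text: RAM first, then the SD card.
            # Returns buffered (timestamp, frame) pairs from RAM, or the raw
            # minute file read back from the SD card.
            minute = int(ts // 60)
            if minute in self.ram:
                return self.ram[minute]
            path = os.path.join(self.sd_path, f"{minute}.bin")
            return open(path, "rb").read() if os.path.exists(path) else None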

The user interface panel (229) contains an indicator LED which shows the current state of the camera. The states include green to indicate the camera is powered on, yellow to indicate the camera is in preview mode and sending lower resolution video, and red to indicate that the camera is live and delivering high-resolution video. A blinking red LED indicates a fault with the camera. The power supply (243) uses stored energy in battery (249) to power the camera. The battery is charged by the power supply (243) from power supplied by the USB (233).

Camera (208) connects to and communicates with the Video Mesh Network using standard Wi-Fi 6 protocols and transmits its video frames using the Video Mesh Network Protocol (180) to the nearest assigned Video Mesh Network node (100); these nodes are located around the baseball diamond perimeter. The Video Mesh Network nodes then route the video frames through the Video Mesh Network comprised of multiple Video Mesh Network nodes (100).

The Video Mesh Network nodes (100) are loaded with routing and radio spectrum configuration information via the Video Mesh Network Protocol (180), which directs each node on where to send the incoming packets and on which resource units (RU) the information is to be transmitted or received during this transmission opportunity (TXOP) time slice, and which traffic should get rerouted to a wired Ethernet connection at that node.

The video frames are moved through the various networks until they reach their final destination address through a root node with a wired Ethernet connection (108) to a router (120) which relays the video frame to the AI-360 image processing workstation (400). The AI-360 image processing system (400) collects the individual frames from multiple cameras and assembles them back into their individual video streams.

The frames of these streams are processed through the convolutional neural network (CNN) processors (600) where the images undergo any needed quality adjustments and spatially related frames are merged into a larger composite frame from which the desired view to be broadcast is selected.

Once processing is completed, the output from the AI-360 image processing workstation (400) is sent to multiple destinations through Ethernet cable (108) to router (120), where it is distributed to local onsite TV broadcast equipment for over the air broadcasts, streamed to local event viewers (500) through the Video Mesh Network nodes (100) using standard Wi-Fi protocols for maximum compatibility with older mobile devices, and transferred via the Internet (140) to a video transcoding and streaming service provider (106) for distribution to remote viewers through the Internet (140) to their remote viewing devices (130).

In another preferred embodiment, camera (210) is utilized to provide a first-person point of view camera; in this case, mounted to a basketball player’s jersey and capturing video during a basketball game.

The camera (210) houses a micro-sized high-resolution cellphone camera module (256). The camera (210) also captures its own motion using the inertial measurement unit (IMU) (226) which generates a vector to a user-designated target point in the scene.

As the camera (210) moves, the IMU (226) calculates a vector that tells the AI-360 image processing system (400) where in the virtual scene the virtual camera is to be pointed, thereby stabilizing the camera motion and the video image and removing the unwanted camera motion as the wearer engages in the sporting activity.

Referring now to FIG. 19, the CPU (220) combines the video output from the cellphone camera module (256) with the audio from the associated microphone (250) and turns them into a series of h.265 encoded audio/video frames.

The CPU (220) connects to peripherals SD Card (224), inertial measurement unit (IMU) (226), and the Wi-Fi 6 radio module (232). The IMU (226) uses its internal X, Y, Z accelerometers and Yaw, Pitch, and Roll gyroscope data to calculate the motion of the camera relative to a set reference point.

The result is a targeting vector that points to where the reference point is relative to the camera's current position and is used for stabilizing the camera motion in the video and to assist in keeping the video centered on the subject of interest. This targeting vector is read by CPU (220) and passed to the AI-360 image processing system (400) by inserting it into the data of the encoded video stream.

The real-time clock (RTC) inside the CPU (220) is synchronized with the RTC in all the other cameras using the network time protocol (NTP). The RTC is used to synchronize the triggering of the video sensor shutter so that all cameras capture images at the exact same instant in time, and for producing accurate synchronized timestamps for every video frame capture.

The CPU (220) takes the encoded video frame data and stores it in temporary buffer structures in RAM (222), for transmitting via the Wi-Fi 6 module (232) when it is its turn to send the data. The video frames accumulate in the video buffer and when the structure has reached one minute of video in the buffer, the one-minute block of video is written to a timestamped file on the SD Card (224) and a new buffer structure is created for the next minute of video frames.

When the RAM (222) reaches a predetermined limit of remaining free space, the oldest one-minute buffer is deleted from the RAM (222). Similarly, when the SD Card (224) reaches a predetermined limit of remaining free space, the oldest one-minute file is deleted from the SD Card (224) to make space for more buffered video frames.

These buffers of video frames are used to provide the data should retransmission be required, or if the video is being recalled at a later point in time. First, the CPU would try to retrieve the requested video information from the RAM (222), and if not present there, from the SD Card (224).

The user interface panel (229) contains an indicator LED which shows the current state of the camera. The states include green to indicate the camera is powered on, yellow to indicate the camera is in preview mode and sending lower resolution video, and red to indicate that the camera is live and delivering high-resolution video.

A blinking red LED indicates a fault with the camera. The power supply (243) uses stored energy in battery (249) to power the camera. The battery is charged by the power supply (243) from power supplied by the USB (233).

Camera (210) connects to and communicates with the Video Mesh Network using standard Wi-Fi 6 protocols and transmits its video frames using the Video Mesh Network Protocol (180) to the nearest assigned Video Mesh Network node (100); these nodes are suspended above the basketball court.

The Video Mesh Network nodes then route the video frames through wired Ethernet cables (108) to a local router (120). The Video Mesh Network nodes (100) are loaded with routing and radio spectrum configuration information via the Video Mesh Network Protocol (180), which directs each node on which resource units (RU) the information is to be transmitted or received during this transmission opportunity (TXOP) time slice, and which traffic should get rerouted to a wired Ethernet connection at that node.

The router (120) relays the video frames to the AI-360 image processing workstation (400). The AI-360 image processing system (400) collects the individual frames from multiple cameras and assembles them back into their individual video streams.

The frames of these streams are processed through the convolutional neural network (CNN) processors (600) where the images undergo any needed quality adjustments and spatially related frames are merged into a larger composite frame from which the desired view to be broadcast is selected. Once processing is completed, the output from the AI-360 image processing workstation (400) is sent to multiple destinations through Ethernet cable (108) to router (120), where it is distributed to local onsite TV broadcast equipment for over the air broadcasts, streamed to local event viewers (500) through the Video Mesh Network nodes (100) using standard Wi-Fi protocols for maximum compatibility with older mobile devices, and transferred via the Internet (140) to a video transcoding and streaming service provider (106) for distribution to remote viewers through the Internet (140) to their remote viewing devices (130).

In another preferred embodiment, a 360-degree surround capture system is created using multiple cameras (212) arranged around a mixed martial arts ring and embedded into the top safety rail as depicted in FIG. 34a and in cross-section FIG. 34b, and the corner support posts in a similar manner. The field of view of the MMA ring surround capture system is depicted in FIG. 35.

The surround capture system takes video from multiple fixed positions around the target area, which in this instance is a pair of fighters, and sends the images to the AI-360 image processing workstation (400) where they are then fused together to create a large virtual scene. This scene can have a virtual camera moved around the circle surrounding the target subjects and lets the viewers see the action from multiple camera angles as if a real camera were being moved around in the scene.

Referring now to FIG. 20, the CPU (220) combines the video output from the high-resolution camera module (256) with the audio from the associated microphone (250) and turns them into a series of h.265 encoded audio/video frames. The CPU (220) connects to peripherals inertial measurement unit (IMU) (226) and the Ethernet module (234). The IMU (226) uses its internal X, Y, Z accelerometers and Yaw, Pitch, and Roll gyroscope data to calculate the motion of the camera relative to a set reference point.

The result is a targeting vector that points to where the reference point is relative to the camera's current position and is used for stabilizing the camera motion in the video and to assist in keeping the video centered on the subject of interest.

This targeting vector is read by CPU (220) and passed to the AI-360 image processing system (400) by inserting it into the data of the encoded video stream.

The real-time clock (RTC) inside the CPU (220) is synchronized with the RTC in all the other cameras using the network time protocol (NTP). The RTC is used to synchronize the triggering of the video sensor shutter so that all cameras capture images at the exact same instant in time, and for producing accurate synchronized timestamps for every video frame capture.

The CPU (220) takes the encoded video frame data and stores it in temporary buffer structures in RAM (222), for transmitting via the Ethernet module (234) when it is its turn to send the data.

When the RAM (222) reaches a predetermined limit of remaining free space, the oldest one-minute buffer is deleted from the RAM (222). These buffers of video frames are used to provide the data should retransmission be required, or if the video is being recalled at a later point in time.

The power supply (242) uses Power Over Ethernet (PoE) from the Ethernet module (234) to power the camera. The surround cameras (212) connect to a wired Ethernet network using Ethernet cables (108) and communicate with the AI-360 image processing workstation (400) using the Video Mesh Network Protocol (180), with their traffic routed to the AI-360 image processing workstation (400) via router (120). The AI-360 image processing system (400) collects the individual frames from the multiple cameras of the surround system and assembles them into their individual video streams.

The frames of these streams are processed through the convolutional neural network (CNN) processors (600) where the images undergo any needed quality adjustments and spatially related frames are merged into a larger composite frame from which the desired view to be broadcast is selected.

This virtual frame encircles the MMA ring and the system operator can select the view of the fighters from a virtual camera that can be moved in a circle around the ring.

Once processing is completed, the output from the AI-360 image processing workstation (400) is sent to multiple destinations through Ethernet cable (108) to router (120), where it is distributed to local onsite TV broadcast equipment for over the air broadcasts, streamed to local event viewers (500) through the Video Mesh Network nodes (100) using standard Wi-Fi protocols for maximum compatibility with older mobile devices, and transferred via the Internet (140) to a video transcoding and streaming service provider (106) for distribution to remote viewers through the Internet (140) to their remote viewing devices (130).

These micro-sized high-resolution cameras embedded into the surrounding ring provide unparalleled points of view of the action without obstructions. The IMU output permits the AI-360 image processing workstation (400) to effectively remove the camera motion as participants bump into the safety rails, producing a steady video of the action.

In another preferred embodiment, camera (214) is utilized to provide 360-degree hemispherical first-person video coverage, mounted on a helmet. Camera (214) uses four micro-sized cellphone camera modules (256) to capture a 360-degree hemispherical image around the camera.

The camera (214) also captures its own motion using the inertial measurement unit (IMU) (226) which generates a vector to a user-designated target point in the scene. As the camera (214) moves, the IMU calculates a vector that tells the AI-360 image processing system (400) where in the virtual scene the virtual camera is to be pointed, thereby stabilizing the camera movement as well as the video image and removing the unwanted camera motion.

Referring now to FIG. 21, the video encoder (230) combines the video output from the cellphone camera modules (256) with the audio from the associated microphone (250) and turns them into a series of h.265 encoded audio/video frames.

The CPU (220) connects to peripherals SD Card (224), inertial measurement unit (IMU) (226), GPS (227), h.265 video encoder (230) and the Wi-Fi 6 radio module (232). The IMU (226) uses its internal X, Y, Z accelerometers and Yaw, Pitch, and Roll gyroscope data to calculate the motion of the camera relative to a set reference point.

The result is a targeting vector that points to where the reference point is relative to the camera's current position and is used for stabilizing the camera motion in the video and to assist in keeping the video centered on the subject of interest. This targeting vector is read by CPU (220) and passed to the AI-360 image processing system (400) by inserting it into the data of the two encoded video streams. GPS (227) tracks the camera's position and reports this information to the CPU (220) for inclusion in the video stream data along with the targeting vector.

The GPS (227) also performs the function of an accurate time source for the real time clock (RTC) inside the CPU (220). This RTC is synchronized with the RTC in all the other cameras using the network time protocol (NTP). The RTC is used to synchronize the triggering of the video sensor shutter so that all cameras capture images at the exact same instant in time, and for producing accurate synchronized timestamps for every video frame capture.

The CPU (220) reads the frame data generated by the encoder (230) and stores it in temporary buffer structures in RAM (222), for transmitting via the Wi-Fi 6 module (232) when it is its turn to send the data.

The video frames accumulate in the video buffer and when the structure has reached one minute of video in the buffer, the one-minute block of video is written to a timestamped file on the SD Card (224) and a new buffer structure is created for the next minute of video frames.

When the RAM (222) reaches a predetermined limit of remaining free space, the oldest one-minute buffer is deleted from the RAM (222). Similarly, when the SD Card (224) reaches a predetermined limit of remaining free space, the oldest one-minute file is deleted from the SD Card (224) to make space for more buffered video frames.

These buffers of video frames are used to provide the data should retransmission be required, or if the video is being recalled at a later point in time. First, the CPU would try to retrieve the requested video information from the RAM (222), and if not present there, from the SD Card (224).

The user interface panel (229) contains an indicator LED which shows the current state of the camera. The states include green to indicate the camera is powered on, yellow to indicate the camera is in preview mode and sending lower resolution video, and red to indicate that the camera is live and delivering high-resolution video. A blinking red LED indicates a fault with the camera.

The power supply (236) uses the energy stored in battery (240) to power the camera. The battery is charged by the power supply (236) from energy transferred by the wireless charging coil (238) which is embedded in the quick disconnect plate at the base of the camera body.

Camera (214) connects to and communicates with the Video Mesh Network using standard Wi-Fi 6 protocols and transmits its video frames using the Video Mesh Network Protocol (180) to the nearest assigned Video Mesh Network node (100); these nodes are suspended above the hockey rink.

The Video Mesh Network nodes then route the video frames through wired Ethernet cables (108) to a local router (120). The Video Mesh Network nodes (100) are loaded with routing and radio spectrum configuration information via the Video Mesh Network Protocol (180), which directs each node on which resource units (RU) the information is to be transmitted or received during this transmission opportunity (TXOP) time slice, and which traffic should get rerouted to a wired Ethernet connection at that node. The router (120) relays the video frames to the AI-360 image processing workstation (400). The AI-360 image processing system (400) collects the individual frames from multiple cameras and assembles them back into their individual video streams.

The frames of these streams are processed through the convolutional neural network (CNN) processors (600) where the images undergo any needed quality adjustments and spatially related frames are merged into a larger composite frame from which the desired view to be broadcast is selected.

Once processing is completed, the output from the AI-360 image processing workstation (400) is sent to multiple destinations through Ethernet cable (108) to router (120), where it is distributed to local onsite TV broadcast equipment for over the air broadcasts, streamed to local event viewers (500) through the Video Mesh Network nodes (100) using standard Wi-Fi protocols for maximum compatibility with older mobile devices, and transferred via the Internet (140) to a video transcoding and streaming service provider (106) for distribution to remote viewers through the Internet (140) to their remote viewing devices (130).

In another preferred embodiment, a method is provided for creating a global time-synchronized shutter mechanism in which the shutter of every active camera on the system takes a picture at the exact same moment in time.

The purpose of such highly synchronized shutters is that multiple cameras can be involved in any video frame that is captured. If the cameras all take their photo at different times, then moving subjects can be in different locations in different images and cause undesirable movement artifacts in the composite stitched-together images. Total synchronization of all camera shutters ensures that all cameras perform their image capture of the same scene at the same time.

In addition to improving the quality of stitched images, these time-synchronized frames can be used to create a Time Shot.

This is a collection of images from all active cameras on the system taken at a specific time or range of times for video sequences.

With this, an operator can create a global snapshot of everything that happened at that moment in time. The real-time clocks (RTC) in the cameras are all synchronized using the Network Time Protocol (NTP). This means that all the cameras have the same time on their RTC and all video frames have timestamps using this synchronized time.

The video sensors in the cameras are triggered to capture images by the CPU (220). The timing of these triggers is controlled by the RTC in the CPU (220) and by frame timing information from the AI-360 image processing system (400) and delivered using the Video Mesh Network Protocol (180).

All video frame timestamps relate to the same moment in time on all operating cameras, even ones not on the same system. In this way, multiple cameras on multiple sites can all have their Time Shot images associated with each other. Systems operating on sites worldwide can do a Time Shot and freeze everything that was happening around the world at that moment in time.
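
A Time Shot query could then be as simple as the sketch below, which gathers, from each camera's buffers, the frame whose synchronized timestamp matches the requested instant; the camera object interface (frames(), camera_id) is hypothetical.

    def time_shot(cameras, shot_time, tolerance=0.001):
        # Collect the frame each camera captured at the globally synchronized
        # instant `shot_time`.  Because all shutters fire on the shared NTP
        # clock, one frame per camera should match within the tolerance.
        snapshot = {}
        for cam in cameras:
            for ts, frame in cam.frames():       # assumed iterator of (timestamp, frame)
                if abs(ts - shot_time) <= tolerance:
                    snapshot[cam.camera_id] = frame
                    break
        return snapshot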

The Video Mesh Network Protocol (VMNP) resides inside the data payload of standard Ethernet protocols, so VMNP packets can slip seamlessly into and out of standard Internet traffic. Streams from different locations can therefore be connected together and incorporated in local broadcasts, as well as participate in global Time Shot moments. Since all the video frames are timestamped with this NTP-based time, they can be organized and synchronized by moments in time.

The cameras buffer one-minute segments of video frames in the onboard RAM (222). Along with the buffered video frames is a structure that keeps track of the byte offset to the beginning of each frame. When one minute of video has accumulated, that minute of video is stored on the SD Card (224) and the frame offset information is written to an index file for that video buffer data file.

When remaining available storage reaches a threshold for either the RAM (222) or the SD Card (224), the oldest one-minute buffer is removed to make space for a new buffer. The purpose of these stored buffers is to provide the ability to retrieve video segments from the camera.

This can happen when the camera is offline for a moment due to a transmission interruption, or to replace a corrupted frame, or in response to a Time Shot request. The video segments are requested using the Video Mesh Network Protocol (VMNP) (180) Request Data element ID 0x0C.

The GPS position and targeting vector form part of the video data stream, delivered using the Video Mesh Network Protocol Data Stream element ID 0x04.

In another preferred embodiment, a method is provided for taking a native resolution photograph from the action cameras. The Video Mesh Network Protocol (VMNP) supports the requesting of video data from a camera. Part of that request is a stream ID which represents the camera lens number from 1 to n. A stream ID of 0 indicates that a photograph is requested and the current video frame data is sent uncompressed to the AI-360 image processing workstation (400) for processing of the photograph data.

In another preferred embodiment, a method is provided for separating the visible spectrum into red, green, and blue bands for processing as individual monochrome images. In Pat. No. US 3,659,918 to Tan and Einhoven, a trichroic prism assembly is described that separates the full-color image into red, green and blue spectrum bands.

While its use is quite standard in the industry, this design relies on the principle of total internal reflection in the first prism to simplify the optics, but this requires that the prisms be separated by an air gap in order to function, making precise alignment of the prisms more difficult.

This geometry also has the unfortunate side effect of creating an asymmetrical device, which isn't a problem in larger cameras, but becomes problematic when creating highly miniaturized cameras. Having to accommodate the elongated side causes an unnecessary increase in the camera housing size.

Referring now to FIG. 12 and FIG. 13, a symmetrical trichromatic beam splitting module (252) is presented. Full-spectrum light leaving the lens enters the prism (252) and strikes the blue reflective dichroic filter (256) at a 45-degree angle and the image-forming blue rays from 440 nm to 480 nm are directed 90 degrees from the original path. The blue rays then strike the silvered mirror (258) at a 45-degree angle and are reflected 90 degrees toward the blue receiving monochrome CMOS sensor (268).

The green wavelengths from 480 nm to 580 nm and the red wavelengths from 580 nm to 680 nm pass straight through the blue reflective dichroic filter (256) and strike the red reflective dichroic filter (260) at a 45-degree angle, and the red image-forming rays are reflected 90 degrees from their original path and strike silvered reflecting mirror (262) at a 45-degree angle where they are reflected 90 degrees to fall on the red receiving monochrome CMOS sensor (264).

The green wavelengths pass through the red reflecting dichroic filter (260) and travel straight to the green receiving monochrome CMOS sensor (266). The optical paths followed by the red, green, and blue bands of the spectrum are all of equal length and cause no difference in the arrival of the focused rays at the sensors.

The removal of the air gap permits the entire assembly to be made of one solid block which ensures precise alignment of the optical surfaces and strengthens the assembly.

There are two main reasons for performing this color separation rather than just using a single Bayer pattern CMOS RGB sensor.

Referring now to FIG. 30, an examination of the plots of the spectral curves shows a significant difference in the amount of overlap in the red, green, and blue wavelengths between the 3CMOS and the Bayer pattern sensors. This is because the color separation filters are totally different in design.

The color separating prism uses thin-film optical coatings known as dichroic filters. This type of filter has very sharp cutoffs and can accurately control the bands of wavelengths that are passed.

In contrast, the color filters used in the Bayer pattern sensor chips are by necessity simple plastic film filters with poor cutoff characteristics. The overlap of the color bands causes a loss of purity in color. This is why professional cameras all use 3CMOS optical assemblies.

Referring now to FIG. 31, another shortcoming of Bayer pattern sensors is illustrated. The monochrome sensors used in the 3CMOS module have the same number of pixels that exist in the Bayer pattern sensor. However, in the monochrome sensor, every available pixel captures useful detail information. The three separate monochrome images are then combined together in the FPGA (254) to create a full resolution RGB image.
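
Because each monochrome sensor already delivers a full-resolution plane, the recombination step reduces to stacking the three planes, as in the following sketch of the operation the FPGA (254) performs in hardware (NumPy is used here purely for illustration).

    import numpy as np

    def combine_3cmos(red_plane, green_plane, blue_plane):
        # Each monochrome sensor contributes a full-resolution plane, so no
        # demosaicing or interpolation is needed; the planes are simply
        # stacked into one RGB image.
        return np.stack([red_plane, green_plane, blue_plane], axis=-1)

    # Example with three 1080p monochrome captures (dtype uint8):
    # rgb = combine_3cmos(r, g, b)   # rgb.shape == (1080, 1920, 3)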

In contrast, the Bayer pattern sensor needs to capture all three colors with the same number of sensor pixels. To perform this, the sensor uses a pattern of red, green, and blue capture pixels, the number of which corresponds to the sensitivities of the sensor to those wavelengths, and then interpolates the colors for the locations where it doesn’t capture the two other colors. This reduces the detail of the produced images.

In another preferred embodiment, a Video Mesh Network (800) comprised of multiple Video Mesh Nodes (100), is utilized to provide the transportation and control of the video data coming from the various cameras utilized in the system.

One such possible configuration of the Video Mesh Network (800) is illustrated in FIG. 24. Here, the cameras (200) wirelessly communicate with the Video Mesh Network nodes (100) where the data is routed from node to node using the standard Open Shortest Path First (OSPF) protocol.

The frames travel through the network until exiting at either of the root nodes (100a) or (100b). From there the data travels through the Ethernet cables (108) to the router (120) and on to the video processing workstation (400) which generates the stream to be broadcast.
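
The sketch below shows the general idea of lowest-cost path selection that link-state protocols such as OSPF provide, applied to a small hypothetical mesh; it is not the routing implementation used by the nodes, only an illustration of how a frame's hops toward a root node could be computed.

    import heapq

    def shortest_path(links, source, dest):
        # Dijkstra over a link-cost map: `links` maps node -> {neighbor: cost}.
        # Returns the ordered list of hops from source to dest.
        dist = {source: 0.0}
        prev = {}
        heap = [(0.0, source)]
        while heap:
            d, node = heapq.heappop(heap)
            if node == dest:
                break
            if d > dist.get(node, float("inf")):
                continue
            for nbr, cost in links.get(node, {}).items():
                nd = d + cost
                if nd < dist.get(nbr, float("inf")):
                    dist[nbr], prev[nbr] = nd, node
                    heapq.heappush(heap, (nd, nbr))
        path = [dest]
        while path[-1] != source:
            path.append(prev[path[-1]])
        return list(reversed(path))

    # mesh = {"node_a": {"node_b": 1, "root": 4}, "node_b": {"root": 1}, "root": {}}
    # shortest_path(mesh, "node_a", "root")  ->  ["node_a", "node_b", "root"]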

The output stream then travels back to router (120) along the Ethernet cable (108) and from there is sent out to any destination on the network or over the Internet to viewers.

The Video Mesh Network (800) is based on Wi-Fi 6 technology which provides the base platform for delivering multiple simultaneous streams of high-speed data. Unlike previous generations of Wi-Fi, this 6th generation was designed specifically to make possible high-capacity wireless networks capable of transferring large volumes of video data.

The theoretical maximum capacity of a Wi-Fi 6 network is around 10 Gbps, which is many times the capacity of previous generations of Wi-Fi. Despite this native capacity, some additional steps are needed to make this work with multiple cameras all sending high-bitrate data to a video processing server.

Unlike previous wireless strategies, Wi-Fi 6 uses lower power for shorter hops from device to device. One benefit of this is less cross-traffic interference between nodes in the network. With longer range transmissions, there are more opportunities for other cells to interfere with each other. In Wi-Fi 6, there is more opportunity to have multiple cells transmitting simultaneously.

Additionally, instead of the radio transmissions being hindered by multipath reflections that normally degrade the signal, Wi-Fi 6 makes use of multipath effects to transmit more than one signal at the same time.

A sounding frame is first transmitted and captured by the destination node. Analysis of the arriving radio information determines how the multiple paths are affecting this transmission and that information is used to create a data transmission that sends data for different clients over the multiple paths to the receiving station, thus increasing the amount of data that can be sent at once.

Previous generations of Wi-Fi also use the entire channel bandwidth for every transmission, even when all the bandwidth isn’t required. Referring to FIG. 26, Wi-Fi 6 subdivides radio channels into smaller resource units that can be assigned to different clients, and multiple clients can send and receive simultaneously during the same transmission opportunity (TXOP) slice, further increasing the volume of data that can be moved through the network.

Referring now to FIG. 28, to do this, Wi-Fi 6 uses a mechanism known as Multi-User Orthogonal Frequency Division Multiple Access (MU-OFDMA) to subdivide the radio spectrum into smaller frequency allocations, called resource units (RUs), which permit the network node to synchronize communications (uplink and downlink) with multiple individual clients assigned to specific RUs. This simultaneous transmission cuts down on excessive overhead at the medium access control (MAC) sublayer, as well as medium contention overhead.

Using RUs, the network node can allocate larger portions of the radio spectrum to a single user or partition the same spectrum to serve multiple clients simultaneously. This leads to better radio spectrum use and increases the efficiency of the network.
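
A simplified scheduling sketch follows, assuming nine 26-tone resource units per 20 MHz channel (one of the RU layouts Wi-Fi 6 permits) and treating each TXOP as a simple batch of clients; real OFDMA scheduling also weighs queue depth, link quality, and RU sizes.

    def allocate_rus(clients, rus_per_channel=9):
        # Assign each client its own resource unit within a TXOP so the
        # clients transmit simultaneously instead of contending in turn.
        # Returns one {RU index: client} map per TXOP.
        schedule = []
        for i in range(0, len(clients), rus_per_channel):
            txop_clients = clients[i:i + rus_per_channel]
            schedule.append({ru: cam for ru, cam in enumerate(txop_clients)})
        return schedule

    # allocate_rus([f"cam{i}" for i in range(12)])
    # -> first TXOP serves cam0..cam8 on RUs 0-8, second TXOP serves cam9..cam11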

The use of RUs alone isn’t enough to manage the high volume of traffic in a video capture network and the Wi-Fi 6 radio module can use some additional guidance on how to divide up the spectrum to more efficiently transfer all the video data. This is accomplished using the Video Mesh Network Protocol (VMNP) (180) and the Group element.

Referring now to FIG. 27, the VMNP group element lets the system dedicate portions of the radio spectrum to a data group that is used for a specific purpose. This locks a portion of the radio spectrum so that it is always used by this group and isn't part of the general spectrum that the Wi-Fi 6 divides up on a TXOP basis to handle the random traffic that comes through the network. The groups let you dedicate data pipes to specific tasks, such as a data backbone to transmit the aggregated video traffic from multiple clients accumulating from various network nodes into a wider bandwidth path that is always available. The radio spectrum to be designated to the group is specified by turning on specific bits in the VMNP group element. Bits 0 - 36 control the radio spectrum allocation. Bits 37 - 53 control the RU allocation at the 20 MHz channel level as illustrated in FIG. 28.
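
The sketch below shows one hypothetical way the group element's allocation bits could be assembled, using only the bit ranges stated above; the actual wire format of the element is defined in the figures and is not reproduced here.

    def build_group_bitmap(spectrum_bits, ru_bits):
        # Hypothetical packing of the group element's allocation field:
        # bits 0-36 select radio spectrum, bits 37-53 select RUs within a
        # 20 MHz channel, per the description above.
        bitmap = 0
        for b in spectrum_bits:
            if not 0 <= b <= 36:
                raise ValueError("spectrum bits occupy positions 0-36")
            bitmap |= 1 << b
        for b in ru_bits:
            if not 37 <= b <= 53:
                raise ValueError("RU bits occupy positions 37-53")
            bitmap |= 1 << b
        return bitmap

    # Dedicate two spectrum slices and three RUs to a backbone group:
    # mask = build_group_bitmap(spectrum_bits=[4, 5], ru_bits=[37, 38, 39])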

The division of radio spectrum, the subdivision of spectrum into resource units, and the control of spectrum and RUs into groups for data pipe creation provides the needed control to improve the efficiency of the data transmission through the network.

What is still missing is removing some of the randomness of the data load and the collisions from multiple clients trying to send large data payloads through the network during the same TXOP.

This is managed by the VMNP and the interleaving of video data transmissions. Referring now to FIG. 25, when video frames are encoded for transmission, they generate a series of compressed frames of different types and sizes. There are Intra Frames which occur at regular intervals, such as once every 60 frames for a video produced at 60 fps, and there are Inter Frames between these Intra Frames, which contain highly compressed data that only encodes what has changed from previous frames.

The Intra Frames don’t utilize any predictive compression and as such are considerably larger in size than the Inter Frames. These Intra Frames, or I-frames as they are also known, need to be managed to smooth out the data flowing through the network. This is accomplished by assigning to each camera which frame in the one-second cycle it should start its I-frame on using the VMNP Video Config element ED 0x03 setting field Slice # to the desired timeslice in the one-second cycle.

For instance, if there were ten cameras on the system shooting at 60 fps, then each camera would start its I-frame 6 frames apart from the other cameras. This video interleaving is part of the VMNP and prevents too many clients from trying to send large payloads over the network at random and in competition with each other. The smaller Inter Frames are much easier to manage once the larger I-frames have been organized into a predictable traffic pattern on the network.
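
The slice assignment can be expressed in a few lines, as sketched below; the function name is illustrative, and the round-robin spacing simply reproduces the ten-camera, 60 fps example from the preceding paragraph.

    def assign_iframe_slices(camera_ids, fps=60):
        # Spread each camera's I-frame start evenly across the one-second
        # cycle.  With ten cameras at 60 fps the offsets come out 6 frames
        # apart (0, 6, 12, ...), matching the example in the text.
        spacing = fps / len(camera_ids)
        return {cam: int(round(i * spacing)) % fps
                for i, cam in enumerate(camera_ids)}

    # assign_iframe_slices([f"cam{i}" for i in range(10)])
    # -> {"cam0": 0, "cam1": 6, "cam2": 12, ..., "cam9": 54}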

For the larger data groups that are transmitting bulk data from node to node, such as I-frame traffic, Wi-Fi 6 has another speed enhancement feature. This is Multi-User Multi-Input Multi-Output (MU-MIMO), which is the ability to send multiple data streams on multiple antennas, on the same frequency and at the same time. This is effective for large data pipes and sending large blocks of data from one network node to another. The Video Mesh Network uses OFDMA to handle data transfer from all the clients to the Video Mesh Network Nodes, and then MIMO for transmission of large blocks of data between the network nodes as it is routed through the network.

The Wi-Fi 6 radios make use of beamforming when using multiple antennas with MIMO. This is where the signals to the antennas are phase-controlled in such a way that the output of the antenna array can be focused in a particular direction toward a target receiver. This increases the range of the transmission and minimizes the interference with other radio units that are talking on other portions of the network at the same time.

Referring now to FIG. 10, the Video Mesh Network Node (100) has three antenna groups: a 5.8 GHz eight-antenna array (172), which permits eight different beamformed transmissions at the same time on the same frequency and is referred to as 8x8:8 MIMO; a 2.4 GHz four-antenna array (174), which permits four different beamformed transmissions at the same time on the same frequency and is referred to as 4x4:4 MIMO; and a GPS receiver antenna (168).

Referring now to FIG. 22, the Video Mesh Network Node (100), also has an Ethernet port (176) which permits the transfer of data to and from a wired Ethernet connection. In this way, the network traffic can be shifted to a standard Ethernet network as needed, either to get around obstructions to the Wi-Fi transmissions or to shift the traffic to a wider area physical network.

The physical network traffic enters or leaves on Ethernet module (176) and is transferred by CPU (162) to RAM (164) and the Wi-Fi 6 module (170) sends and receives wireless network traffic in exchange with the CPU (162). The CPU (162) handles all the Wi-Fi 6 protocols as well as the OSPF routing protocol. It also processes requests from the Video Mesh Network Protocol (180) for statistics, data group control, and routing preferences for groups.

The Wi-Fi 6 radios adjust their power to minimize interference and maximize throughput. Local clients talking to a network node use OFDMA and low power to maximize the number of simultaneous users on the node. The radios then switch to MIMO and beamforming at higher power to send the data from network node to network node for maximum data transfer rates.

The combination of power control, spectrum subdivision, and simultaneous directional data transfers on the same frequencies enable the network to transfer significant amounts of information at the same time all over the wireless mesh network. In addition to the physical capabilities of the Wi-Fi 6 radio units, the organization provided by the interleaved video transmission and organization of the video transmission paths using the Video Mesh Network Protocol (VMNP) (180) optimizes the network for maximum capacity and efficiency.

To help determine how to organize the data, the VMNP (180) collects statistics from the Video Mesh Network Nodes (100) and the AI-360 image processing system uses this information along with specifics about camera resolutions and bitrates to determine the I-frame timing and group assignments to be used by the cameras and network.

The CPU (162) communicates with GPS (168) for both location information and for time synchronization. The Video Mesh Network Nodes (100) act as a clock source for the Network Time Protocol (NTP) which is used to synchronize the RTC and shutters in cameras that don’t have their own GPS clock source.

In another preferred embodiment, the Video Mesh Network Protocol (VMNP) (180) is used to manage the data flowing through the Video Mesh Network (800).

The VMNP follows the standard layout of an OSI layer and consists of a Protocol Data Unit (PDU) and a Service Data Unit (SDU).

The VMNP PDU encapsulates the SDU, consisting of a variable payload length of 42 to 1946 Octets. This payload data is filled with one or more VMNP elements that identify different types of actions and their associated data formats. The last element in the payload area is always VMNP element ID 0x00, or the END element.

Referring now to FIG. 29, the VMNP (180) fits into the standard OSI networking model in the following manner. The Physical layer PDU payload encapsulates the MAC layer PDU. The MAC SDU payload encapsulates the VMNP PDU. The VMNP SDU encapsulates the various VMNP elements in the SDU payload.

The VMNP (180) controls all the features of the Video Mesh Network and of the cameras connected to it, as well as configuring the resources of the Video Mesh Network Nodes (100). The VMNP (180) consists of twelve unique elements, each with a corresponding one-octet ID field.

Still referring to FIG. 29, the various VMNP (180) elements perform the following functions.

0x00 signifies the end of a collection of VMNP (180) elements and causes the cessation of parsing the payload data any further for additional elements.

0x01 (Camera Config) sets the specified camera configuration parameters.

0x02 (Camera Mode) sets the current mode of operation of the camera.

Modes include:

0x00 - sleep, where the camera is in a low-power standby mode but still listens on the Wi-Fi for instructions.

0x01 - powered up, but not sending video.

0x02 - sending video in low-resolution preview mode.

0x03 - sending live video in high-resolution mode.

0x03 (Video Config) sets the video operation including video encoder settings, resolutions, and bit rates.

0x04 (Data Stream) Carries a block of data from a video or photograph.

0x05 (Camera Config) returns the current camera configuration.

0x06 (Camera Mode) Returns the current camera mode of operation.

0x07 (Video Config) Returns the current video encoder configuration.

0x08 (Assign Group) Assigns this camera to one of the data pipe groups.

0x09 (Group Config) Configures the network node to use specific spectrum and resource units for all traffic assigned to this group ID.

0x0A (Req Stats) Requests statistics from a camera.

0x0B (Statistics) Requests statistics from a network node.

0x0C (Req Data) Requests data from a camera. Stream ID 0 indicates to take a photograph in native resolution and return the data. A stream ID equal to a lens number plus a time range retrieves video for that lens from that time range.

The individual camera resolutions and bitrates are adjustable via the VMNP for various image sizes and quality using the Video Config element ID 0x03. Each camera has a high-resolution code, field HR CODE for live video and a low-resolution code, field LR CODE for preview video. Table 1 lists the various resolution codes and their associated resolutions, bitrates and encoding levels. The highest possible image quality is produced with settings for what is known as transparent encoded video. This is a video with little or no compression artifacts and produces large amounts of data which is customarily used for a master recording that is suitable for editing without degrading the video quality.
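
For illustration, the sketch below assembles a hypothetical VMNP payload from (ID, data) pairs and terminates it with the END element; the per-element length octet, the padding to the 42-octet minimum, and the example configuration bytes are assumptions, since the exact field layouts are given in the figures and Table 1 rather than here.

    import struct

    END, CAMERA_MODE, VIDEO_CONFIG = 0x00, 0x02, 0x03   # element IDs from the text

    def build_vmnp_payload(elements):
        # Hypothetical serialization: each element as a one-octet ID followed
        # by a one-octet length and its data (so data must be < 256 octets
        # here), terminated by the END element.
        payload = b""
        for element_id, data in elements:
            payload += struct.pack("BB", element_id, len(data)) + data
        payload += struct.pack("B", END)
        if len(payload) < 42:
            payload = payload.ljust(42, b"\x00")   # respect the stated minimum SDU size
        return payload

    # Put a camera into live high-resolution mode (mode 0x03) and set a
    # hypothetical pair of HR/LR resolution codes:
    # pkt = build_vmnp_payload([(CAMERA_MODE, bytes([0x03])),
    #                           (VIDEO_CONFIG, bytes([0x05, 0x02]))])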

The VMNP (180) provides a compact structure to embed a variety of data into standard Ethernet protocols for easy transport over any network, and the interleaved video and Wi-Fi 6 spectrum controls provide the needed organizational additions to make large-scale video capture in a distributed wireless environment possible.

In another preferred embodiment, the images are processed using the AI-360 image processing workstation (400). The AI-360 processor uses a Convolutional Neural Network (CNN) based approach to standard image processing techniques.

Referring now to FIG. 23, the Convolutional Neural Networks (CNN) (600) run on an array of CNN hardware accelerators (601). These CNNs are a type of artificial neural network that is used for image recognition and processing and employ deep learning to create mappings of input images to output images that have some form of image processing performed on them.

In the case of stitching images together, the output of the CNN is a homography matrix that transforms the images to their correct location and shape in the output mosaic of images. For optical distortion correction, the CNN maps a distorted input image to an undistorted output image. Similarly, CNNs are used to align an image with the horizon, turn a low-resolution image into a higher resolution image and correct color and exposure among other treatments.
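
As a small illustration of the stitching output, the homography produced for a frame can be applied with a standard perspective warp, as sketched below using OpenCV; the helper name and the example translation-only homography are assumptions for the example.

    import cv2
    import numpy as np

    def place_in_mosaic(image, homography, mosaic_size):
        # The stitching output for a frame is a 3x3 homography; applying it
        # warps the source frame into its position in the composite scene.
        # mosaic_size is (width, height) of the output canvas.
        return cv2.warpPerspective(image, homography, mosaic_size)

    # Example: shift a frame 200 px right and 50 px down inside a 4K mosaic.
    # H = np.array([[1, 0, 200], [0, 1, 50], [0, 0, 1]], dtype=np.float32)
    # canvas = place_in_mosaic(frame, H, (3840, 2160))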

The CPU (401) handles exchanging images with the CNNs (600) as well as directing the entire chain of processing of the inbound data streams into processed video output streams. The CPU (401) receives video frame data packets from all the cameras via Ethernet module (405). This data is organized and sorted back into the individual data streams for processing. Before the images can be processed further, the encoded frames are decoded by the video decoder (406) and stored in RAM (403). Once all the frames for the current video slice are ready, they are sent into the image processor (600).

The CNNs can be arranged in many different ways and have different processing abilities. Referring now to FIG. 38, one possible workflow arrangement is as follows: the current frames are input into the image processing engine (901) and a determination is made if this is the first frame of a new video sequence or a continuation of an existing sequence (902).

For the first frame of a new sequence, it is necessary to determine how many cameras are involved in generating this sequence (903). It is also necessary to know which of the lenses are spatially related to each other (904), either at fixed known distances, like a camera with multiple lenses, or related but capable of moving relative to each other, such as with multiple independent moving cameras.

Once all the images and their relationships to each other are known, they are projected into a virtual scene, with their pixels projected to the locations where they exist in physical space relative to each other (905).

Camera motion, alignment and perspective distortions are taken into account and the alignment of the images to a common horizon is performed (906). From here, how to adjust (warp) the adjacent images so that they align with each other is determined (907) and the images are transformed into their new shapes to fit into the virtual scene together (908).

During the warping process, images can become distorted to make them fit together; the next CNN takes the distorted images and corrects them to minimize these distortions and produce a higher quality image (909).

With the images stitched together into a larger virtual scene, the next CNN corrects imbalances of color or exposure in different areas of the virtual scene to make a more unified composited image (910). With a large virtual image to work with, it is determined if there are any objects being tracked (912). If there are, they are separated into the foreground (tracked objects) and background (the rest of the image) to facilitate tracking (913). The objects are then tracked through multiple frames of video (914).

The targeting information for any tracked objects is available along with any manual target points selected by the operator, and the CNN now composes final image frames out of the larger virtual scene using standard image composition rules (915).

The output frames are read by the CPU (401) where they are sent to the video card (404) for display on the workstation screen. The CPU also sends the frames to the video encoder (407) where they are compressed and returned to the CPU for distribution or forwarded to TV broadcasting equipment via the 3G SDI interface (408).

If this was the last frame of the video (917), the process is completed (918); otherwise it continues with the input of the next frame of video (901). These subsequent frames already have knowledge of camera relationships and use a CNN to predict where all the images and their features will exist in the new virtual scene (918).

The ability of the AI-360 image processor to generate virtual camera views and track objects creates the ability to produce Video Threads. These are sequences of video created by putting together views from multiple cameras as they track a subject moving around them. A group of hockey players wearing cameras (214) would produce a collection of 360-degree videos of a scene from all sorts of different camera angles. The AI-360 image processor can track a subject of interest, a hockey player, as that player moves between the other players on the ice. The 360-degree camera feeds from the players permit the AI-360 image processor to create virtual views, following image composition guidelines, and build video sequences that follow the player's movements as though there were a camera on the ice following the action. This process can be directed by an operator to produce a replay highlight video segment.

All the CNNs need to be trained to perform their functions. This is referred to as deep learning and essentially consists of presenting the neural networks with input images and reference output images. The CNN systematically applies various solution attempts to try to turn the input solution into the output solution. Over time, a collection of solutions presents itself for various input conditions and the images can be processed quite rapidly. What can take multiple seconds to process using algorithms and CPU time, a CNN can accomplish in milliseconds.
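
A minimal supervised training step of this kind is sketched below in PyTorch, pairing an input image with its reference output image; the tiny two-layer network, the L1 loss, and the learning rate are placeholders, not the architecture or training regime of the AI-360 system.

    import torch
    import torch.nn as nn

    # Deliberately small stand-in network: the specification only states that
    # input images are mapped to reference output images.
    model = nn.Sequential(
        nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
        nn.Conv2d(16, 3, 3, padding=1),
    )
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
    loss_fn = nn.L1Loss()

    def train_step(input_image, reference_image):
        # One supervised step: compare the network's attempt against the
        # reference output image and adjust the weights.
        optimizer.zero_grad()
        prediction = model(input_image)
        loss = loss_fn(prediction, reference_image)
        loss.backward()
        optimizer.step()
        return loss.item()

    # x = torch.rand(1, 3, 256, 256); y = torch.rand(1, 3, 256, 256)
    # for _ in range(100): train_step(x, y)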

Some image processing tasks can use simpler training setups where a less than optimal input image is compared to an optimal output image and the data set created for the training is straightforward to produce.

Other situations such as stitching images together from cameras with different orientations relative to each other or stitching images for a point of view between two cameras with different points of view require more complicated training.

One such challenge is a surround camera system where a circle of cameras captures a target area from multiple angles, and the AI-360 image processor (600) has to create a seamless stitched view from any angle. Referring now to FIG. 33, a training mechanism for a surround camera system that is suitable for generating the large training data sets needed is illustrated.

Platform (950) has a pair of manikin targets (951) and (952) posed on it. The platform (950) rotates at one revolution per minute. There are two fixed cameras (960) and (970) which are located at the normal positions of cameras at one evenly spaced segment of the circle surrounding the subject area. A reference camera (962) is moved in one-degree steps from position (960) to (970) with each step happening upon the completion of each full revolution of the target platform.

The fixed cameras and the reference camera are capturing video at 60 fps while the platform (950) is rotating and the reference camera (962) is stepping from fixed position (960) to fixed position (970).

The sweep captures all the relative one-degree positions between the two fixed camera locations, and since the camera locations are evenly spaced around the circle all such segments have the same relationship to each other. This way, the reference camera (962) only needs to capture the relative positions for one slice of the circle.

With the 60 fps capture rate and the reference camera (962) only moving one step per complete revolution of the platform (950), the one sweep of camera (962) will capture all possible angles of the targets from all possible positions. This produces a large sample data set of reference images to train the stitching attempts from various virtual camera positions between the start and endpoints.
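
The size of the resulting data set can be estimated with the short sketch below; the assumption of 91 one-degree positions per segment is illustrative, since the actual segment width depends on how many cameras ring the subject area.

    def sweep_frame_count(positions=91, rpm=1, fps=60):
        # With the platform turning at 1 rpm, each one-degree reference
        # position is held for one full 60-second revolution, so at 60 fps
        # every position yields 3600 frames of rotating-target reference
        # imagery; the whole sweep produces positions * 3600 frames.
        seconds_per_revolution = 60 / rpm
        frames_per_position = int(seconds_per_revolution * fps)
        return positions * frames_per_position

    # sweep_frame_count()  ->  91 * 3600 = 327600 reference frames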

Key to this training mechanism is the two manikins (951, 952) rotating in the scene while the reference images are captured. Because their position relative to each other changes as they rotate, the reference camera (962) captures many images with different parts of each manikin blocking the other, as would be the case with objects in a real scene.

A typical failing of camera systems that look to stitch images together is the parallax error created between the two cameras with objects that are close to the cameras. Distant areas of the scene stitch together with little distortion, but objects that are close to the cameras produce widely differing images that are problematic to stitch together.

Some camera systems just try to blend the images together at the seams, which produces ghosts and other visual artifacts in the image. Others will select one image or the other and use all the content from the selected image, producing a resulting image with varying degrees of success at stitching the images.

The AI-360 solution uses CNNs which are quite adept at identifying which pixels are closer to the camera and which are farther away. This is accomplished by analyzing the relative pixel motion between the two images. Armed with the knowledge of which pixels are closer to the camera, and a reference image of what the scene actually looks like from the desired virtual position, the CNN can learn how to incorporate the portions of each image to produce a stitched image of how the scene would look at that virtual location.

A variation of this problem is presented by the surround system utilized in the MMA fighting ring system. Here there is a combination of cameras pointing at the center of the surround coverage circle, and cameras oriented at 90 degrees to the safety rails they are embedded in.

Referring now to FIG. 36, it is seen that some stitching operations are on adjacent parallel cameras (982, 984, 986), and some are between cameras at an angle to each other (980, 982) or (986, 988).

Using a mechanism similar to the one used in FIG. 33, and moving the reference camera (990) between the fixed position cameras (980) and (988), the reference camera (990) will capture reference images from all the positions that include both parallel cameras and cameras that are aligned at angles with each other. This data set trains the CNN on how to handle creating a virtual image from various positions around the ring and handling the mixture of aligned and angled cameras.

The AI-360 image processor can also stitch together images from cameras that have no fixed relationships or constraints on their orientation. FIG. 37 depicts a number of sample images taken by cameras with different combinations of alignment relative to each other, and a reference image of how the scene appears from a real camera at that position. Using mechanisms similar to FIG. 33 and FIG. 36, but with the fixed position cameras positioned with various angles relative to each other, a data set is produced to train the CNN to generate stitching solutions to a wider range of relative camera positions and angles.