LIU CHENG-YI (US)
SONY CORP AMERICA (US)
US20190378423A1 | 2019-12-12
EP3614341A1 | 2020-02-26
US11095870B1 | 2021-08-17
US20190369613A1 | 2019-12-05
US202016917013A | 2020-06-30
US202016917671A | 2020-06-30
USPP62782862P
USPP63003097P
CLAIMS

What is claimed is:

1. A method comprising: obtaining a 3D model of a subject generated using a multi-view capturing system; capturing motion of the subject while the subject is moving with a plurality of drones; estimating pose parameters of the subject using the captured motion from the plurality of drones; and applying the pose parameters to animate the 3D model.

2. The method of claim 1 further comprising positioning the subject within views of the plurality of drones by implementing 3D positioning directly.

3. The method of claim 1 further comprising positioning the subject within views of the plurality of drones by implementing 3D positioning indirectly from 2D.

4. The method of claim 1 further comprising using prediction to predict a future location of the subject to determine where to position the plurality of drones.

5. The method of claim 1 further comprising collecting the captured motion from the plurality of drones at a ground control station.

6. The method of claim 5 wherein the ground control station receives videos, positions, and timestamps from the plurality of drones, and sends any controlling or correction commands to the plurality of drones.

7. The method of claim 1 further comprising controlling drone formation of the plurality of drones with a ground control station.

8. The method of claim 1 further comprising controlling drone formation of the plurality of drones with a tracking drone of the plurality of drones.

9. The method of claim 1 wherein each camera of each drone of the plurality of drones is configured to broadcast absolute positions to all other cameras.

10. An apparatus comprising: a non-transitory memory for storing an application, the application for: obtaining a 3D model of a subject; receiving captured motion of the subject while the subject is moving from a plurality of drones; estimating pose parameters of the subject using the captured motion from the plurality of drones; and applying the pose parameters to animate the 3D model; and a processor coupled to the memory, the processor configured for processing the application.

11. The apparatus of claim 10 wherein the application is further configured for positioning the subject within views of the plurality of drones by implementing 3D positioning directly.

12. The apparatus of claim 10 wherein the application is further configured for positioning the subject within views of the plurality of drones by implementing 3D positioning indirectly from 2D.

13. The apparatus of claim 10 wherein the application is further configured for using prediction to predict a future location of the subject to determine where to position the plurality of drones.

14. The apparatus of claim 10 wherein the apparatus receives videos, positions, and timestamps from the plurality of drones, and sends any controlling or correction commands to the plurality of drones.

15. The apparatus of claim 10 wherein the application is further configured for controlling drone formation of the plurality of drones.

16. A system comprising: a plurality of drones configured for capturing motion of a subject while the subject is moving; and a ground control station configured for: obtaining a 3D model of the subject generated using a multi-view capturing system; estimating pose parameters of the subject using the captured motion from the plurality of drones; and applying the pose parameters to animate the 3D model.

17. The system of claim 16 wherein each drone of the plurality of drones is equipped with at least one RGB camera device, wherein a camera's orientation is controllable by a gimbal attached to each drone of the plurality of drones.

18. The system of claim 16 wherein the plurality of drones are configured for positioning the subject within views of the plurality of drones by implementing 3D positioning directly.

19. The system of claim 16 wherein the plurality of drones are configured for positioning the subject within views of the plurality of drones by implementing 3D positioning indirectly from 2D.

20. The system of claim 16 wherein the plurality of drones are configured for using prediction to predict a future location of the subject to determine where to position the plurality of drones.

21. The system of claim 16 wherein the ground control station is configured for collecting the visual data from the plurality of drones.

22. The system of claim 16 wherein the ground control station is configured to receive videos, positions, and timestamps from the plurality of drones, and send any controlling or correction commands to the plurality of drones.

23. The system of claim 16 wherein the ground control station is configured to control drone formation of the plurality of drones.

24. The system of claim 16 wherein a tracking drone of the plurality of drones is configured for controlling drone formation of the plurality of drones.

25. The system of claim 16 wherein each camera of each drone of the plurality of drones is configured to broadcast absolute positions to all other cameras.
A side camera is positioned to avoid exact opposite (180°) viewing directions on the X-Y plane. Figure 7 illustrates exemplary diagrams of multiple drones positioned at varying angles according to some embodiments. Depending on the number of drones in use, the angles between the drones are modified. In some embodiments, the swarm formation performs translation in the global coordinates. The second camera is fixed to (v_x, v_y, δ), where δ is the predefined height difference between the camera and the subject's head such that the image captured by the camera is able to cover the complete body of the subject. In some embodiments, head guidance is used to specify the second camera. Any face pose estimation method is able to be used to identify the orientation of the face in 3D. The methods are able to be 2D face detection with eye and nose positioning on each drone camera, where the eye and nose positions are then triangulated in 3D by the multiple drones. The orientation of the face in 3D is able to be estimated from the face feature points. The methods are also able to use direct 3D face pose estimation from each single 2D face image by a 2D-to-3D face pose inference model, and then the pose is optimized among all of the drones' pose estimates in the global coordinates. After the face pose is estimated, the unit vector d of face orientation in the global coordinates is defined. The pose-capturing formation described herein involves minimal communication between the drones or the GCS, because the broadcast information is sufficient. 3D tracking with subject position prediction is able to be done on the top drone. The top drone is able to serve as a specific tracking drone since its camera has the aerial view (e.g., by flying 10 meters above the subject), which is more accurate for tracking the subject on the X-Y (ground) plane than the side drone cameras. The top drone is also able to keep the subject accurately tracked in GPS-denied environments.
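The triangulation-based face orientation estimate described above can be sketched as follows. This is a minimal illustration, assuming the eye and nose positions have already been triangulated into global 3D coordinates; the function name and the choice of spanning vectors are illustrative, not from the specification.

```python
import numpy as np

def face_orientation(left_eye, right_eye, nose):
    """Estimate the unit face-orientation vector d from triangulated
    3D feature points in global coordinates (Z up).

    The vector across the eyes and the vector from the eye midpoint
    toward the nose span the face plane; their cross product points
    out of the face.
    """
    left_eye, right_eye, nose = map(np.asarray, (left_eye, right_eye, nose))
    across = right_eye - left_eye          # subject's left eye to right eye
    mid = (left_eye + right_eye) / 2.0
    down = nose - mid                      # eye midpoint toward the nose
    normal = np.cross(across, down)        # points out of the face
    return normal / np.linalg.norm(normal)
```

For a subject facing the +X axis with eyes at height 1.7 m and the nose slightly below, the returned vector is approximately (1, 0, 0).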
In some embodiments, for simplicity, it is assumed the ground plane is parallel to the global coordinates' X-Y plane. If the ground is slanted, all of the side cameras' X-Y positions are able to be tilted by the slant angle about the Z-axis for flight safety. The drone formation emphasizes the effective viewpoints for 3D body part triangulation rather than the coverage of a subject's volume. Each drone is fine-tuned. As described in U.S. Patent Application Serial No. 16/917,671, filed June 30, 2020, titled, "METHOD OF MULTI-DRONE CAMERA CONTROL," fine tuning the drone (or more specifically, the camera on the drone) is implemented. At time t, the location of the camera is p_cam(t),
and the true location of the subject is p_target(t). A 3D vector in the global coordinates from the camera to the target is v(t) = p_target(t) − p_cam(t). The control policy of the set of gimbal parameters, Gim, at time t is to align the camera's viewing direction with v(t). Figure 8 illustrates a diagram of camera fine tuning according to some embodiments. By the control policy, the camera is moved by the gimbal to always keep the target in the image center (or at least attempt to). The implementation of the gimbal control policy is mechanics-dependent for each gimbal. Figure 9 illustrates a diagram of implementing a gimbal control system according to some embodiments. The gimbal control system centers the target in the image. Subject pose parameter estimation is implemented, which uses offline subject 3D modeling. Offline subject 3D modeling is used to obtain the 3D model of the subject and other individual traits such as the biomechanical constraints. Subject pose parameter estimation involves determining 2D part positions and then 3D subject pose estimation. 2D part position determination is implemented on the drones and/or on the GCS. 3D subject pose estimation is performed on the GCS. 2D part positions are able to be used to perform or assist in performing 3D subject pose estimation. The part positions (e.g., joints, eyes, hands) in each camera image are determined. If the task is performed on the drones, then each drone sends the 2D part positions and the latest timestamps it received from all other drones back to the GCS. If the task is performed on the GCS, each drone sends each frame and the timestamps received from all other drones at this frame back to the GCS. In either implementation, the camera positions are sent back to the GCS. 3D subject pose estimation includes optimization with the following data: skeleton lengths, camera positions, 2D part positions in each camera image, and a 2D/3D spatio-temporal smoothness constraint. For subject pose parameter estimation, offline subject model building is implemented.
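The gimbal control policy that keeps the target at the image center can be illustrated with a short sketch. This is a simplified pan/tilt model under the global coordinate convention used above, not the mechanics-dependent implementation; the function name and angle conventions are assumptions.

```python
import math

def gimbal_angles(cam_pos, target_pos):
    """Pan/tilt (radians) aiming the camera's optical axis along the
    camera-to-target vector v(t) = p_target(t) - p_cam(t).

    Pan is measured about the Z axis from +X; tilt is the elevation
    above the X-Y plane.  A real gimbal controller would map these
    angles to its own actuator commands.
    """
    vx = target_pos[0] - cam_pos[0]
    vy = target_pos[1] - cam_pos[1]
    vz = target_pos[2] - cam_pos[2]
    pan = math.atan2(vy, vx)
    tilt = math.atan2(vz, math.hypot(vx, vy))
    return pan, tilt
```

For the top drone hovering 10 m above a subject 10 m away on the ground, this yields a zero pan and a downward tilt of 45 degrees.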
To achieve higher accuracy, the subject's body parameters are measured before the capturing. The body parameters could include the length of the skeleton between joints and the biokinetic extrema of each joint the subject is able to perform. In general, 20-40 major body joints are used to control an avatar, with more measurements being better. The surface model of the subject is also able to be modeled from the subject's texture and shape. Given the surface model, it is possible to calculate or generate detailed expressions or actions for the avatar. For subject pose parameter estimation, an exemplary implementation is described in U.S. Patent Application Ser. No. 62/782,862, titled, "PHOTO-VIDEO BASED SPATIAL-TEMPORAL VOLUMETRIC CAPTURE SYSTEM FOR DYNAMIC 4D HUMAN FACE AND BODY DIGITIZATION," which is hereby incorporated by reference in its entirety for all purposes. The exemplary implementation describes a system to build a human skeleton and surface model and then capture the spatio-temporal changes. Subject pose parameter estimation in each 2D image is able to be performed using CNN-based, multi-subject methods. The estimation is able to be combined with prediction models for better tracking. If the computations are done on the GCS, the 2D image from each camera is sent to the GCS. The pose parameters are able to be for sparse parts or dense parts, compatible with an avatar model. For sparse parts, one is able to use methods such as OpenPose or Mask R-CNN to position each major joint of each subject in an image in real-time. If the computations are done on a drone, then the 2D joint positions are sent to the GCS. For dense parts, an example is DensePose, which positions the elastic part surface in an image. If the computation is performed on a drone, all of the output images or part parameters are sent to the GCS. In DensePose, these are the patch, U, and V images.
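The offline-measured subject model parameters (skeleton lengths between joints and per-joint biokinetic extrema) can be organized in a simple container. This is a hypothetical sketch; the field names, joint names, and default limits are illustrative and not taken from the specification.

```python
from dataclasses import dataclass, field

@dataclass
class SubjectModel:
    """Offline-measured subject model parameter set M (illustrative).

    skeleton_lengths: bone length in meters, keyed by (joint_a, joint_b).
    joint_limits: per-joint biokinetic extrema as (min_deg, max_deg).
    """
    skeleton_lengths: dict = field(default_factory=dict)
    joint_limits: dict = field(default_factory=dict)

    def within_limits(self, joint, angle_deg):
        """Check a candidate joint angle against the measured extrema;
        unmeasured joints fall back to an unconstrained range."""
        lo, hi = self.joint_limits.get(joint, (-180.0, 180.0))
        return lo <= angle_deg <= hi
```

A constraint check like `within_limits` is the kind of per-joint test the biokinetic energy term of the pose optimization would penalize.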
For subject pose parameter estimation, the input includes: the subject model parameter set M (e.g., the lengths of different body parts of the subject, the subject surface models, or the individual parts' biokinetic motion limitations), the current time t, the total number of cameras N, the intrinsic parameters of camera c as K_c, camera c's pose in the global coordinates, the 2D part position p_j^c(t_c) of the subject in camera c's image coordinates with timestamp t_c, where j stands for the part index, and the history of p_j^c for all t_c < t. The output of subject pose parameter estimation includes a 3D subject's pose parameter set A(t) at time t; for a sparse part-controlled avatar, an exemplar A(t) is the 6 degrees of freedom 3D position and rotation of each joint, and for a dense part-controlled avatar, an exemplar A(t) is the 3D vertex positions of the body surface mesh model. In some instances, the view of the subject is obstructed or occluded, and a 3D pose is not able to be generated from the acquired data. A time-constrained 3D pose estimation using multiple drones is used for such conditions. The estimation sets a subject's 3D pose to be estimated at the GCS at a period q with tolerance ∈, and the minimum number of cameras for triangulation is m, where m ≥ 2. At an estimation time t, the inlier camera set C is initialized to the empty set. For each drone camera c, if the latest received p_j^c has t − t_c < ∈, then C = C ∪ c. After all cameras are checked, if |C| < m, extrapolation is performed for the 3D parameter set at time t as A*(t), and A*(t) is output for avatar control. Otherwise, if |C| ≥ m, for each drone camera c ∈ C, if t_c < t, extrapolation is performed for each 2D parameter at time t, which forms the estimated 2D position of any part j.
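The time-constrained inlier camera selection can be sketched as follows, assuming each camera reports the timestamp t_c of its latest received 2D measurement. Function and variable names are illustrative; the fallback to extrapolation is left to the caller, as in the description above.

```python
def select_inlier_cameras(t, latest_ts, tol, m):
    """Time-constrained camera selection for 3D triangulation.

    t:         current estimation time.
    latest_ts: mapping of camera id -> latest received timestamp t_c.
    tol:       staleness tolerance (the ∈ of the description).
    m:         minimum number of cameras for triangulation (m >= 2).

    Returns the inlier camera ids and whether triangulation is
    possible; when it is not, the caller extrapolates A*(t) from the
    pose history instead.
    """
    inliers = [c for c, t_c in latest_ts.items() if t - t_c <= tol]
    return inliers, len(inliers) >= m
```

With three cameras and a 0.5 s tolerance, a camera whose last measurement is 2 s old is dropped, and triangulation proceeds with the remaining two.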
2D to 3D estimation uses optimization. The objective is to find the optimal A*(t) minimizing a loss function L. More specifically, L is defined as: L(A(t), M, C, K, P(t)) = w_geo E_geo(A(t), C, K, P(t)) + w_kin E_kin(A(t), M) + w_sm E_sm(A(t), A_his), where E_geo, E_kin, and E_sm are the energy functions of the 3D to 2D part reprojection error, the deviations according to the biokinetic statistics, and the 3D temporal smoothness of the part trajectories. w_geo, w_kin, and w_sm are the corresponding weights of the energy terms. A_his is the history of A earlier than t. For data output, A*(t) is output for avatar control. A*(t) is added to A_his, and the oldest A(t) is removed from A_his if it is not used by the future extrapolation. A*(t) and t are broadcast to all drones so each drone is able to use this information in the prediction model for 2D part positioning. A*(t) is able to be solved by optimization methods, which are usually iterative, such as gradient descent, Gauss-Newton, or a variant of quasi-Newton methods such as L-BFGS. A*(t) may also be solved by an end-to-end DNN-based method by training a regression model with the output head containing A*(t). At inference, the computation is able to be done in one cycle without iterations. Examples of implementations are able to be found in U.S. Patent Application Ser. No. 63/003,097, titled, "ML-BASED NATURAL HUMAN FACE/BODY ANIMATION USING VOLUMETRIC CAPTURE SYSTEM + MESH TRACKING," which is hereby incorporated by reference in its entirety for all purposes. Figure 10 illustrates a block diagram of an exemplary computing device configured to implement the drone-based 3D motion reconstruction method according to some embodiments. The computing device 1000 is able to be used to acquire, store, compute, process, communicate and/or display information such as images and videos including 3D content. The computing device 1000 is able to implement any of the encoding/decoding aspects.
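Of the iterative solvers mentioned, gradient descent is the simplest to illustrate. The sketch below minimizes a generic loss callable standing in for w_geo E_geo + w_kin E_kin + w_sm E_sm; numerical central differences replace analytic gradients, and all names are illustrative. Production solvers would use Gauss-Newton or L-BFGS as noted.

```python
import numpy as np

def optimize_pose(loss, a0, lr=1e-2, iters=200, eps=1e-6):
    """Minimize a scalar pose loss L(A) by numerical gradient descent.

    loss: callable mapping a parameter vector A to a scalar.
    a0:   initial pose parameter vector (e.g., a previous A*(t)).
    Returns the optimized parameter vector A*(t).
    """
    a = np.asarray(a0, dtype=float).copy()
    for _ in range(iters):
        grad = np.zeros_like(a)
        for i in range(a.size):
            step = np.zeros_like(a)
            step[i] = eps
            # central-difference estimate of dL/da_i
            grad[i] = (loss(a + step) - loss(a - step)) / (2 * eps)
        a -= lr * grad
    return a
```

Warm-starting from the previous frame's A*(t), as the history A_his permits, typically cuts the iteration count substantially.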
In general, a hardware structure suitable for implementing the computing device 1000 includes a network interface 1002, a memory 1004, a processor 1006, I/O device(s) 1008, a bus 1010 and a storage device 1012. The choice of processor is not critical as long as a suitable processor with sufficient speed is chosen. A GPU is also able to be included. The memory 1004 is able to be any conventional computer memory known in the art. The storage device 1012 is able to include a hard drive, CDROM, CDRW, DVD, DVDRW, High Definition disc/drive, ultra-HD drive, flash memory card or any other storage device. The computing device 1000 is able to include one or more network interfaces 1002. An example of a network interface includes a network card connected to an Ethernet or other type of LAN. The I/O device(s) 1008 are able to include one or more of the following: keyboard, mouse, monitor, screen, printer, modem, touchscreen, button interface and other devices. Drone-based 3D motion reconstruction application(s) 1030 used to implement the drone-based 3D motion reconstruction method are likely to be stored in the storage device 1012 and memory 1004 and processed as applications are typically processed. More or fewer components shown in Figure 10 are able to be included in the computing device 1000. In some embodiments, drone-based 3D motion reconstruction hardware 1020 is included. Although the computing device 1000 in Figure 10 includes applications 1030 and hardware 1020 for the drone-based 3D motion reconstruction implementation, the drone-based 3D motion reconstruction method is able to be implemented on a computing device in hardware, firmware, software or any combination thereof. For example, in some embodiments, the drone-based 3D motion reconstruction applications 1030 are programmed in a memory and executed using a processor. 
In another example, in some embodiments, the drone-based 3D motion reconstruction hardware 1020 is programmed hardware logic including gates specifically designed to implement the drone-based 3D motion reconstruction method. In some embodiments, the drone-based 3D motion reconstruction application(s) 1030 include several applications and/or modules. In some embodiments, modules include one or more sub-modules as well. In some embodiments, fewer or additional modules are able to be included. Examples of suitable computing devices include a personal computer, a laptop computer, a computer workstation, a server, a mainframe computer, a handheld computer, a personal digital assistant, a cellular/mobile telephone, a smart appliance, a gaming console, a digital camera, a digital camcorder, a camera phone, a smart phone, a portable music player, a tablet computer, a mobile device, a video player, a video disc writer/player (e.g., DVD writer/player, high definition disc writer/player, ultra high definition disc writer/player), a television, a home entertainment system, an augmented reality device, a virtual reality device, smart jewelry (e.g., smart watch), a vehicle (e.g., a self-driving vehicle), a drone, or any other suitable computing device. Figure 11 illustrates a diagram of a system configured to implement the drone-based 3D motion reconstruction method according to some embodiments. In some embodiments, the system includes a set of drones 1100 and a GCS 1102. Each of the drones 1100 includes a camera 1104. The drones 1100 are able to include other features/components including additional cameras, sensors, gimbals, gyroscopes, and/or any other components. As described herein, the drones 1100 are configured to track, predict and be positioned with respect to a subject and each other so as to capture content (e.g., images/video) of the subject from many different angles.
In some embodiments, at least one of the drones 1100 is positioned directly above (or above with an offset) the subject, and this drone is a control drone with additional features such as additional software features which enable the drone to communicate and process commands and information to the other drones. The GCS 1102 is configured to communicate with the drones 1100 and process content received from the drones 1100. The drones 1100 are configured to communicate with each other as well. As described herein, the drones 1100 and/or the GCS are able to perform the steps of the drone-based 3D motion reconstruction described herein. In some embodiments, instead of utilizing drones, another set of mobile camera devices is used. To utilize the drone-based 3D motion reconstruction method, multiple drones acquire images and videos of a subject from a variety of angles. The multiple drones are configured to establish a position and track the subject. The drone-based 3D motion reconstruction method is able to be implemented with user assistance or automatically without user involvement (e.g., by utilizing artificial intelligence). In operation, the drone-based 3D motion reconstruction method and system are able to track and follow the motion of subjects. The method and system remove the restrictions of past motion capture systems including the space to capture, the location to capture and human-only subjects. For example, the restriction of a dedicated VR shooting place, "the hot seat," is removed, so the actor/target is able to perform agile or long-distance activities. The resultant pose parameters are used to manipulate an existing surface/volumetric VR actor model. The method is able to be directly integrated into an existing VR/AR production chain. The method and system do not require the tremendous building efforts that are required by past motion capture systems.
Furthermore, serious site investigation and planning, robust camera installation, complicated wiring, and area marking for capturing are not required. The method and system are able to be utilized with studio-level VR/AR for movie or TV production, and for remote training of professional athletes or dancers. The method and system are also able to be utilized for video conferencing, virtual YouTube, gaming, and motion replay (spatiotemporal album).

SOME EMBODIMENTS OF METHOD OF 3D RECONSTRUCTION OF DYNAMIC OBJECTS BY MOBILE CAMERAS

1. A method comprising: obtaining a 3D model of a subject generated using a multi-view capturing system; capturing motion of the subject while the subject is moving with a plurality of drones; estimating pose parameters of the subject using the captured motion from the plurality of drones; and applying the pose parameters to animate the 3D model.

2. The method of clause 1 further comprising positioning the subject within views of the plurality of drones by implementing 3D positioning directly.

3. The method of clause 1 further comprising positioning the subject within views of the plurality of drones by implementing 3D positioning indirectly from 2D.

4. The method of clause 1 further comprising using prediction to predict a future location of the subject to determine where to position the plurality of drones.

5. The method of clause 1 further comprising collecting the captured motion from the plurality of drones at a ground control station.

6. The method of clause 5 wherein the ground control station receives videos, positions, and timestamps from the plurality of drones, and sends any controlling or correction commands to the plurality of drones.

7. The method of clause 1 further comprising controlling drone formation of the plurality of drones with a ground control station.

8. The method of clause 1 further comprising controlling drone formation of the plurality of drones with a tracking drone of the plurality of drones.

9. The method of clause 1 wherein each camera of each drone of the plurality of drones is configured to broadcast absolute positions to all other cameras.

10. An apparatus comprising: a non-transitory memory for storing an application, the application for: obtaining a 3D model of a subject; receiving captured motion of the subject while the subject is moving from a plurality of drones; estimating pose parameters of the subject using the captured motion from the plurality of drones; and applying the pose parameters to animate the 3D model; and a processor coupled to the memory, the processor configured for processing the application.

11. The apparatus of clause 10 wherein the application is further configured for positioning the subject within views of the plurality of drones by implementing 3D positioning directly.

12. The apparatus of clause 10 wherein the application is further configured for positioning the subject within views of the plurality of drones by implementing 3D positioning indirectly from 2D.

13. The apparatus of clause 10 wherein the application is further configured for using prediction to predict a future location of the subject to determine where to position the plurality of drones.

14. The apparatus of clause 10 wherein the apparatus receives videos, positions, and timestamps from the plurality of drones, and sends any controlling or correction commands to the plurality of drones.

15. The apparatus of clause 10 wherein the application is further configured for controlling drone formation of the plurality of drones.

16. A system comprising: a plurality of drones configured for capturing motion of a subject while the subject is moving; and a ground control station configured for: obtaining a 3D model of the subject generated using a multi-view capturing system; estimating pose parameters of the subject using the captured motion from the plurality of drones; and applying the pose parameters to animate the 3D model.

17. The system of clause 16 wherein each drone of the plurality of drones is equipped with at least one RGB camera device, wherein a camera's orientation is controllable by a gimbal attached to each drone of the plurality of drones.

18. The system of clause 16 wherein the plurality of drones are configured for positioning the subject within views of the plurality of drones by implementing 3D positioning directly.

19. The system of clause 16 wherein the plurality of drones are configured for positioning the subject within views of the plurality of drones by implementing 3D positioning indirectly from 2D.

20. The system of clause 16 wherein the plurality of drones are configured for using prediction to predict a future location of the subject to determine where to position the plurality of drones.

21. The system of clause 16 wherein the ground control station is configured for collecting the visual data from the plurality of drones.

22. The system of clause 16 wherein the ground control station is configured to receive videos, positions, and timestamps from the plurality of drones, and send any controlling or correction commands to the plurality of drones.

23. The system of clause 16 wherein the ground control station is configured to control drone formation of the plurality of drones.

24. The system of clause 16 wherein a tracking drone of the plurality of drones is configured for controlling drone formation of the plurality of drones.

25. The system of clause 16 wherein each camera of each drone of the plurality of drones is configured to broadcast absolute positions to all other cameras.

The present invention has been described in terms of specific embodiments incorporating details to facilitate the understanding of principles of construction and operation of the invention. Such reference herein to specific embodiments and details thereof is not intended to limit the scope of the claims appended hereto.
It will be readily apparent to one skilled in the art that other various modifications may be made in the embodiment chosen for illustration without departing from the spirit and scope of the invention as defined by the claims.