RAO KUNAL (US)
YANG YI (US)
DEBNATH BIPLOB (US)
DROLIA UTSAV (US)
CHAKRADHAR SRIMAT (US)
REDKAR AMIT (US)
RAJENDRAN RAVI KAILASAM (US)
US20200027442A1 | 2020-01-23 |
CHAITANYA DEVAGUPTAPU; NINAD AKOLEKAR; MANUJ M SHARMA; VINEETH N BALASUBRAMANIAN: "Borrow from Anywhere: Pseudo Multi-modal Object Detection in Thermal Imagery", arXiv.org, 15 July 2020 (2020-07-15), XP081703015, DOI: 10.1109/CVPRW.2019.00135
KUMAR ARAN CS; BHANDARKAR SUCHENDRA M.; PRASAD MUKTA: "Monocular Depth Prediction Using Generative Adversarial Networks", 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), 18 June 2018 (2018-06-18), pages 413 - 4138, XP033475665, DOI: 10.1109/CVPRW.2018.00068
JIREH JAM; CONNAH KENDRICK; VINCENT DROUARD; KEVIN WALKER; GEE-SERN HSU; MOI HOON YAP: "Symmetric Skip Connection Wasserstein GAN for High-Resolution Facial Image Inpainting", arXiv.org, 12 September 2020 (2020-09-12), XP081761300
WHAT IS CLAIMED IS:

1. A method for real-time cross-spectral object association and depth estimation, the method comprising: synthesizing (1010), by a cross-spectral generative adversarial network (CS-GAN), visual images from different data streams obtained from a plurality of different types of sensors; applying (1020) a feature-preserving loss function resulting in real-time pairing of corresponding cross-spectral objects; and applying (1030) dual bottleneck residual layers with skip connections to accelerate real-time inference and to accelerate convergence during model training.

2. The method of claim 1, wherein object detection is performed in at least one data stream of the different data streams to detect first objects.

3. The method of claim 2, wherein an adaptive spatial search is performed in at least one data stream of the different data streams to form several candidate bounding box proposals as second objects.

4. The method of claim 3, wherein the first objects are fed to a first feature extractor and the second objects are fed to the CS-GAN for data transformation, and then to a second feature extractor.

5. The method of claim 1, wherein the CS-GAN includes bottleneck cascaded residual layers along with custom perceptual loss and feature loss functions.

6. The method of claim 1, wherein the CS-GAN includes a first network and a second network, the first network being a thermal-to-visual synthesis network and the second network being a visual-to-thermal synthesis network.

7. The method of claim 6, wherein the first network includes a generator and a discriminator, the generator synthesizing visual images from corresponding thermal patches, and the discriminator distinguishing between real and generated visual images.

8.
The method of claim 7, wherein a cyclical loss, an adversarial loss, a perceptual loss, and a feature loss are employed to optimize the generator, and wherein the feature loss estimates a Euclidean norm between feature point coordinates of the real and generated visual images and minimizes an error as training progresses.

9. The method of claim 1, wherein a depth and offset estimator is provided to estimate distance and offset of objects in a scene relative to a sensor of the plurality of sensors by an object specific depth perception network.

10. A non-transitory computer-readable storage medium comprising a computer-readable program for real-time cross-spectral object association and depth estimation, wherein the computer-readable program when executed on a computer causes the computer to perform the steps of: synthesizing (1010), by a cross-spectral generative adversarial network (CS-GAN), visual images from different data streams obtained from a plurality of different types of sensors; applying (1020) a feature-preserving loss function resulting in real-time pairing of corresponding cross-spectral objects; and applying (1030) dual bottleneck residual layers with skip connections to accelerate real-time inference and to accelerate convergence during model training.

11. The non-transitory computer-readable storage medium of claim 10, wherein object detection is performed in at least one data stream of the different data streams to detect first objects.

12. The non-transitory computer-readable storage medium of claim 11, wherein an adaptive spatial search is performed in at least one data stream of the different data streams to form several candidate bounding box proposals as second objects.

13. The non-transitory computer-readable storage medium of claim 12, wherein the first objects are fed to a first feature extractor and the second objects are fed to the CS-GAN for data transformation, and then to a second feature extractor.

14.
The non-transitory computer-readable storage medium of claim 10, wherein the CS-GAN includes bottleneck cascaded residual layers along with custom perceptual loss and feature loss functions.

15. The non-transitory computer-readable storage medium of claim 10, wherein the CS-GAN includes a first network and a second network, the first network being a thermal-to-visual synthesis network and the second network being a visual-to-thermal synthesis network.

16. The non-transitory computer-readable storage medium of claim 15, wherein the first network includes a generator and a discriminator, the generator synthesizing visual images from corresponding thermal patches, and the discriminator distinguishing between real and generated visual images.

17. The non-transitory computer-readable storage medium of claim 16, wherein a cyclical loss, an adversarial loss, a perceptual loss, and a feature loss are employed to optimize the generator, and wherein the feature loss estimates a Euclidean norm between feature point coordinates of the real and generated visual images and minimizes an error as training progresses.

18. The non-transitory computer-readable storage medium of claim 10, wherein a depth and offset estimator is provided to estimate distance and offset of objects in a scene relative to a sensor of the plurality of sensors by an object specific depth perception network.

19.
A system for real-time cross-spectral object association and depth estimation, the system comprising: a memory; and one or more processors in communication with the memory configured to: synthesize (1010), by a cross-spectral generative adversarial network (CS-GAN), visual images from different data streams obtained from a plurality of different types of sensors; apply (1020) a feature-preserving loss function resulting in real-time pairing of corresponding cross-spectral objects; and apply (1030) dual bottleneck residual layers with skip connections to accelerate real-time inference and to accelerate convergence during model training.

20. The system of claim 19, wherein the CS-GAN includes a first network and a second network, the first network being a thermal-to-visual synthesis network and the second network being a visual-to-thermal synthesis network, the first network including a generator and a discriminator, the generator synthesizing visual images from corresponding thermal patches, and the discriminator distinguishing between real and generated visual images, and wherein a cyclical loss, an adversarial loss, a perceptual loss, and a feature loss are employed to optimize the generator.
[00054] Regarding the dual bottleneck residual block, vanishing gradients are a common issue in deeper networks. Gradients become smaller and smaller as they are backpropagated to earlier layers due to a chain multiplication of partial derivatives. Skip connections using residual blocks provide an alternate path for gradients by skipping layers, which helps model convergence. The intuition behind skipping layers is that it is easier to optimize a residual mapping than to optimize the original, unreferenced mapping. Skip connections enable information captured in initial layers (where the features correspond to lower semantic information) to be utilized in deeper layers. Without skip connections, low-level information would become abstract as it travels deeper into the network. Utilizing bottleneck blocks instead of basic residual blocks is beneficial, as it reduces the number of channels for convolutions. This significantly improves the computation time for a forward pass. It also reduces the search space for the optimizer, which improves training. [00055] The exemplary methods introduce the use of a dual bottleneck residual block (Dual-BRB 400), shown in FIG. 4, which includes four convolutional layers arranged in three stages. The first stage is a 1x1 convolution that squeezes the number of channels by a factor of 4. This decrease reduces the number of channels processed by the middle stage, a 3x3 residual block. The final stage is a 1x1 convolution that expands the number of channels by a factor of 4, back to the number of input channels. The exemplary methods have two skip connections in the Dual-BRB 400. The inner skip connection works as an identity for the middle residual stage, while the outer skip connection is an identity for the complete Dual-BRB. The outer skip connection serves to provide identity mapping, similar to the one in the basic residual block.
[00056] Denoting the block input as x, the 1x1 squeeze convolution as A(·), the 3x3 residual stage as B(·), and the 1x1 expand convolution as C(·), blocks in Dual-BRB 400 are represented as follows: [00057] a = A(x), b = B(a) + a. [00058] The output from Dual-BRB 400 is: [00059] y = C(B(A(x)) + A(x)) + x. [00060] The 3x3 convolution B(·), which is added on top of the basic bottleneck, adds robustness during initial epochs but does not converge properly in later epochs while training. [00061] The inner skip connection across B(·) helps in learning the residual across it, while helping model robustness and convergence. The intuition for the inner skip connection is to create an alternative path for backpropagation of gradients during the later epochs of training, which helps with convergence and provides stability during training. [00062] The final equation for y includes a combination of B(·) and A(·). Having this alternative path for the backpropagation of gradients makes it possible to effectively eliminate the function B(·) for a particular block, if needed, instead of eliminating the complete block. [00063] Also, y includes a combination of C(·) and x, providing another alternative path for the backpropagation of gradients across the complete block. This modification in the transformer block helps achieve real-time inferencing and the quality and accuracy of generated images. [00064] Regarding inferencing, inferencing block 500 is highlighted in FIG. 5. The thermal object image tiles 502, which are obtained from the adaptive spatial search, are fed to the generator (504), which in turn transforms them to the visible spectral domain. These transformed visible spectral images 506 retain the structural information of the thermal images so that feature points can be extracted. [00065] Regarding the depth and offset estimator 600, as shown in FIG. 6, to estimate the distance and offset of objects in a scene relative to a sensor, the exemplary methods introduce an object-specific depth perception network. [00066] For each incoming frame Y from the visual camera, objects of interest are identified using 2D object detectors.
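The Dual-BRB computation y = C(B(A(x)) + A(x)) + x can be sketched as a PyTorch module. This is an illustrative reading of the block, not the filed implementation: the exact channel counts, normalization placement, and the use of two 3x3 convolutions inside the pre-activation residual stage are assumptions where the text is not explicit.

```python
import torch
import torch.nn as nn

class DualBRB(nn.Module):
    """Dual bottleneck residual block: y = C(B(A(x)) + A(x)) + x."""

    def __init__(self, channels: int, squeeze: int = 4):
        super().__init__()
        mid = channels // squeeze
        # A: 1x1 convolution squeezing channels by the squeeze factor (4)
        self.A = nn.Sequential(
            nn.Conv2d(channels, mid, 1), nn.InstanceNorm2d(mid), nn.ReLU())
        # B: full pre-activation residual stage of 3x3 convolutions
        self.B = nn.Sequential(
            nn.InstanceNorm2d(mid), nn.ReLU(), nn.Conv2d(mid, mid, 3, padding=1),
            nn.InstanceNorm2d(mid), nn.ReLU(), nn.Conv2d(mid, mid, 3, padding=1))
        # C: 1x1 convolution expanding channels back to the input width
        self.C = nn.Sequential(
            nn.Conv2d(mid, channels, 1), nn.InstanceNorm2d(channels))

    def forward(self, x):
        a = self.A(x)
        b = self.B(a) + a   # inner skip: identity across the residual stage B
        return self.C(b) + x  # outer skip: identity across the complete block
```

Both skips are plain additions, so the block preserves the input tensor shape, which is what allows nine such blocks to be stacked in the transformer stage.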
Performance of 2D object detectors is suitable for real-time inferencing, even on embedded systems, with a high degree of accuracy. Since the visual and thermal sensors are separated by a baseline distance without being coplanar, the images are not mutually aligned with respect to each other. Once the bounding boxes of objects are identified in the visual domain, adaptive spatial search 114 (FIG. 1) is performed to identify object proposals in the thermal domain, with the proposal areas being a function of sensor displacement, sensor fields of view, zoom levels, sensor resolutions, and relative orientation. [00067] Let visual image Y include n objects of interest, where n is the number of detected objects. [00068] Each visual bounding box specifies the pixel coordinates (x, y) together with a width and a height of the bounding box. [00069] Let the thermal image be X; the associated thermal bounding box proposals are obtained by applying a transformation function Φ to the visual bounding boxes. [00070] Φ estimates the bounding box in the thermal image and is determined using multiple parameters. The bounding box area of an object is directly proportional to the focal length of a camera when the distance between the camera and the object is unchanged; that is, increasing the focal length brings objects closer by narrowing the extent of the field of view. The adaptive search also depends on the baseline b (the distance separating the cameras), which determines the offset, angle of view, and image resolution. In the exemplary methods, the image resolutions of both cameras are the same and the fields of view intersect by more than 95%, so the function Φ is heuristically calculated using the ratio of the focal lengths of the cameras and the offset. [00071] Let the respective parameter pairs represent the resolution and focal length of the visual and thermal imaging sensors. [00072] Given these parameters, the heuristic bounding box is estimated by scaling the visual bounding box by the ratio of the focal lengths and applying a shift, [00073] where the shift is the horizontal offset between the sensors.
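The heuristic mapping Φ described above can be sketched as a small function that scales a visual-domain box by the focal-length ratio and shifts it by the horizontal baseline offset. The exact scaling rule and the parameter names are illustrative assumptions; the filing only states that Φ is computed heuristically from the focal-length ratio and the offset.

```python
def thermal_proposal(bbox, f_visual, f_thermal, offset_x):
    """Map a visual bounding box (x, y, w, h) to a thermal-domain proposal.

    f_visual, f_thermal: focal lengths of the two cameras (same resolution
    assumed, as in the exemplary methods). offset_x: horizontal offset in
    pixels induced by the baseline between the sensors.
    """
    x, y, w, h = bbox
    s = f_thermal / f_visual  # focal-length ratio scales apparent object size
    return (x * s + offset_x, y * s, w * s, h * s)
```

For example, with equal focal lengths the proposal is simply the visual box shifted horizontally: `thermal_proposal((100.0, 50.0, 40.0, 80.0), 8.0, 8.0, -12.0)` yields `(88.0, 50.0, 40.0, 80.0)`.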
[00074] Using the thermal object proposals, the visual object proposals are expanded, so that each visual cropped proposal and its corresponding thermal cropped proposal have the same size. [00075] Next, landmark detection is performed on the visual proposal and a feature vector is extracted. [00076] Since landmark detection cannot be performed directly on the thermal proposal x, it is converted to the visual domain using the previously described CS-GAN 116. Landmark detection is then performed on the synthesized visual image and a feature vector is extracted. [00077] Let z be an object feature disparity vector. z includes a Euclidean distance between each of k pairs of feature points and an angle between each of the k pairs of feature points, e.g., [00078] z = (d1, . . ., dk, θ1, . . ., θk), where di and θi are the distance and the angle between the i-th pair of corresponding feature points. [00079] The exemplary embodiments regress the distance (d) from the sensors and the offset (o) of the thermal images from the visual camera by training a multiple-variable linear regression using the 2k explanatory variables. The exemplary methods train the regressor by minimizing the residual sum of squares. Let the coefficients of the distance-estimator model be α and the offset-estimator coefficients be β; the distance and offset are then estimated as: [00080] d = α · z + εd and o = β · z + εo, where εd and εo are the distance and offset residuals. [00081] In an exemplary network architecture 700, as shown in FIG. 7, the generator network includes an encoder 710, a transformer 720, and a decoder block 730. The encoder network 710 includes a 7x7 convolution layer, followed by down-sampling using two 3x3 convolution layers (with stride 2). The transformer network 720 includes nine dual bottleneck residual blocks (Dual-BRBs). Each Dual-BRB includes a 1x1 convolution, a residual block, followed by a 1x1 convolution again, to squeeze and then expand the number of channels to reduce computation. The exemplary methods use a full pre-activation residual block of 3x3 convolutions. [00082] A skip connection is added from the input of the Dual-BRB to the output of the block, in addition to a skip connection across the residual block. The Dual-BRBs reduce inference time by a factor of 3.5 compared to basic residual block implementations, without degrading image quality.
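The disparity-vector construction and the least-squares regression in [00077]-[00080] can be sketched in NumPy. The distance/angle feature layout, the intercept term, and the function names are assumptions for illustration; the filing only specifies 2k explanatory variables fitted by minimizing the residual sum of squares.

```python
import numpy as np

def disparity_vector(pv, pt):
    """Build the 2k-element disparity vector z from k matched feature points.

    pv, pt: (k, 2) arrays of feature-point coordinates in the visual
    proposal and the CS-GAN-synthesized (thermal-origin) proposal.
    Returns k Euclidean distances followed by k angles between pairs.
    """
    d = np.linalg.norm(pv - pt, axis=1)                          # k distances
    ang = np.arctan2(pt[:, 1] - pv[:, 1], pt[:, 0] - pv[:, 0])   # k angles
    return np.concatenate([d, ang])

def fit_estimator(Z, targets):
    """Ordinary least squares over the 2k explanatory variables
    (minimizes the residual sum of squares); returns coefficients
    with a trailing intercept term."""
    Z1 = np.hstack([Z, np.ones((len(Z), 1))])
    coef, *_ = np.linalg.lstsq(Z1, targets, rcond=None)
    return coef

def predict(coef, z):
    """Estimate distance (or offset) for one disparity vector."""
    return float(np.append(z, 1.0) @ coef)
```

The same `fit_estimator` is trained twice, once against measured distances and once against measured offsets, giving the two coefficient vectors of [00079].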
The decoder network 730 includes two up-sampling layers of 3x3 transpose convolution (T.CONV) and a 7x7 convolution layer with tanh activation. All the convolution layers are followed by instance normalization (IN). The discriminator networks classify patches in the original and generated images as real or fake. [00083] Training of the generator architecture can be performed by iteratively optimizing the combined adversarial, cyclical, perceptual, and feature losses.
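The encoder/transformer/decoder generator of [00081]-[00082] can be sketched in PyTorch as follows. The channel widths (64/128/256), padding choices, and the compact single-3x3 residual stage inside each Dual-BRB are illustrative assumptions; the layer types, counts, strides, instance normalization, and tanh output follow the text.

```python
import torch
import torch.nn as nn

class DualBRB(nn.Module):
    """Compact restatement of the dual bottleneck residual block:
    1x1 squeeze, 3x3 residual stage with inner skip, 1x1 expand, outer skip."""
    def __init__(self, ch, squeeze=4):
        super().__init__()
        mid = ch // squeeze
        self.A = nn.Conv2d(ch, mid, 1)
        self.B = nn.Sequential(nn.InstanceNorm2d(mid), nn.ReLU(),
                               nn.Conv2d(mid, mid, 3, padding=1))
        self.C = nn.Conv2d(mid, ch, 1)
    def forward(self, x):
        a = self.A(x)
        return self.C(self.B(a) + a) + x

def conv_in_relu(cin, cout, k, stride=1):
    # convolution followed by instance normalization, as in the text
    return nn.Sequential(nn.Conv2d(cin, cout, k, stride, k // 2),
                         nn.InstanceNorm2d(cout), nn.ReLU())

class Generator(nn.Module):
    def __init__(self, in_ch=3, base=64):
        super().__init__()
        # Encoder 710: 7x7 conv, then two stride-2 3x3 down-sampling convs
        self.encoder = nn.Sequential(
            conv_in_relu(in_ch, base, 7),
            conv_in_relu(base, base * 2, 3, stride=2),
            conv_in_relu(base * 2, base * 4, 3, stride=2))
        # Transformer 720: nine Dual-BRBs
        self.transformer = nn.Sequential(*[DualBRB(base * 4) for _ in range(9)])
        # Decoder 730: two 3x3 transpose convs, then 7x7 conv with tanh
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(base * 4, base * 2, 3, stride=2,
                               padding=1, output_padding=1),
            nn.InstanceNorm2d(base * 2), nn.ReLU(),
            nn.ConvTranspose2d(base * 2, base, 3, stride=2,
                               padding=1, output_padding=1),
            nn.InstanceNorm2d(base), nn.ReLU(),
            nn.Conv2d(base, in_ch, 7, padding=3),
            nn.Tanh())
    def forward(self, x):
        return self.decoder(self.transformer(self.encoder(x)))
```

Because the two stride-2 encoder convolutions are exactly undone by the two transpose convolutions, the generator maps an input image to an output of the same spatial size, with pixel values in [-1, 1] from the tanh.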
[00084] In conclusion, the exemplary methods present a cross-spectral object association and depth estimation technique for real-time cross-spectral applications. The cross-spectral generative adversarial network (CS-GAN) synthesizes visual images that have the key, representative object-level features required to uniquely associate objects across the visual and thermal spectra. Features of the CS-GAN include a feature-preserving loss function that results in high-quality pairing of corresponding cross-spectral objects, and dual bottleneck residual layers with skip connections (a new network enhancement) to not only accelerate real-time inference but also speed up convergence during model training. By using the feature-level correspondence from the CS-GAN, a novel real-time system is created to accurately fuse information in thermal and full-HD visual data streams. [00085] FIG. 8 is a block/flow diagram 800 of a practical application for real-time cross-spectral object association and depth estimation, in accordance with embodiments of the present invention. [00086] In one practical example, one or more sensors 802 detect objects, such as objects 804 and 806, and provide visual streams and thermal streams to the CS-GAN 116, which includes a feature-preserving loss function 850 and dual bottleneck residual layers with skip connections 860. The results 810 (e.g., target objects) can be provided or displayed on a user interface 812 handled by a user 814. [00087] FIG. 9 is an exemplary processing system for real-time cross-spectral object association and depth estimation, in accordance with embodiments of the present invention. [00088] The processing system includes at least one processor (CPU) 904 operatively coupled to other components via a system bus 902.
A GPU 905, a cache 906, a Read Only Memory (ROM) 908, a Random Access Memory (RAM) 910, an input/output (I/O) adapter 920, a network adapter 930, a user interface adapter 940, and a display adapter 950, are operatively coupled to the system bus 902. Additionally, the CS-GAN 116 can be employed by using a feature-preserving loss function 850 and dual bottleneck residual layers with skip connections 860. [00089] A storage device 922 is operatively coupled to system bus 902 by the I/O adapter 920. The storage device 922 can be any of a disk storage device (e.g., a magnetic or optical disk storage device), a solid-state magnetic device, and so forth. [00090] A transceiver 932 is operatively coupled to system bus 902 by network adapter 930. [00091] User input devices 942 are operatively coupled to system bus 902 by user interface adapter 940. The user input devices 942 can be any of a keyboard, a mouse, a keypad, an image capture device, a motion sensing device, a microphone, a device incorporating the functionality of at least two of the preceding devices, and so forth. Of course, other types of input devices can also be used, while maintaining the spirit of the present invention. The user input devices 942 can be the same type of user input device or different types of user input devices. The user input devices 942 are used to input and output information to and from the processing system. [00092] A display device 952 is operatively coupled to system bus 902 by display adapter 950. [00093] Of course, the processing system may also include other elements (not shown), as readily contemplated by one of skill in the art, as well as omit certain elements. For example, various other input devices and/or output devices can be included in the system, depending upon the particular implementation of the same, as readily understood by one of ordinary skill in the art. For example, various types of wireless and/or wired input and/or output devices can be used. 
Moreover, additional processors, controllers, memories, and so forth, in various configurations can also be utilized as readily appreciated by one of ordinary skill in the art. These and other variations of the processing system are readily contemplated by one of ordinary skill in the art given the teachings of the present invention provided herein. [00094] FIG.10 is a block/flow diagram of an exemplary method for real-time cross-spectral object association and depth estimation, in accordance with embodiments of the present invention. [00095] At block 1010, synthesizing, by a cross-spectral generative adversarial network (CS- GAN), visual images from different data streams obtained from a plurality of different types of sensors. [00096] At block 1020, applying a feature-preserving loss function resulting in real-time pairing of corresponding cross-spectral objects. [00097] At block 1030, applying dual bottleneck residual layers with skip connections to accelerate real-time inference and to accelerate convergence during model training. [00098] As used herein, the terms “data,” “content,” “information” and similar terms can be used interchangeably to refer to data capable of being captured, transmitted, received, displayed and/or stored in accordance with various example embodiments. Thus, use of any such terms should not be taken to limit the spirit and scope of the disclosure. Further, where a computing device is described herein to receive data from another computing device, the data can be received directly from the another computing device or can be received indirectly via one or more intermediary computing devices, such as, for example, one or more servers, relays, routers, network access points, base stations, and/or the like. 
Similarly, where a computing device is described herein to send data to another computing device, the data can be sent directly to the another computing device or can be sent indirectly via one or more intermediary computing devices, such as, for example, one or more servers, relays, routers, network access points, base stations, and/or the like. [00099] As will be appreciated by one skilled in the art, aspects of the present invention may be embodied as a system, method or computer program product. Accordingly, aspects of the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment (including firmware, resident software, micro-code, etc.) or an embodiment combining software and hardware aspects that may all generally be referred to herein as a “circuit,” “module,” “calculator,” “device,” or “system.” Furthermore, aspects of the present invention may take the form of a computer program product embodied in one or more computer readable medium(s) having computer readable program code embodied thereon. [000100] Any combination of one or more computer readable medium(s) may be utilized. The computer readable medium may be a computer readable signal medium or a computer readable storage medium. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples (a non-exhaustive list) of the computer readable storage medium would include the following: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read- only memory (EPROM or Flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical data storage device, a magnetic data storage device, or any suitable combination of the foregoing. 
In the context of this document, a computer readable storage medium may be any tangible medium that can include, or store a program for use by or in connection with an instruction execution system, apparatus, or device. [000101] A computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated signal may take any of a variety of forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. [000102] Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing. [000103] Computer program code for carrying out operations for aspects of the present invention may be written in any combination of one or more programming languages, including an object oriented programming language such as Java, Smalltalk, C++ or the like and conventional procedural programming languages, such as the “C” programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer or entirely on the remote computer or server. In the latter scenario, the remote computer may be connected to the user's computer through any type of network, including a local area network (LAN) or a wide area network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet Service Provider). 
[000104] Aspects of the present invention are described below with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the present invention. It will be understood that each block of the flowchart illustrations and/or block diagrams, and combinations of blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks or modules. [000105] These computer program instructions may also be stored in a computer readable medium that can direct a computer, other programmable data processing apparatus, or other devices to function in a particular manner, such that the instructions stored in the computer readable medium produce an article of manufacture including instructions which implement the function/act specified in the flowchart and/or block diagram block or blocks or modules. [000106] The computer program instructions may also be loaded onto a computer, other programmable data processing apparatus, or other devices to cause a series of operational steps to be performed on the computer, other programmable apparatus or other devices to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide processes for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks or modules. 
[000107] It is to be appreciated that the term “processor” as used herein is intended to include any processing device, such as, for example, one that includes a CPU (central processing unit) and/or other processing circuitry. It is also to be understood that the term “processor” may refer to more than one processing device and that various elements associated with a processing device may be shared by other processing devices. [000108] The term “memory” as used herein is intended to include memory associated with a processor or CPU, such as, for example, RAM, ROM, a fixed memory device (e.g., hard drive), a removable memory device (e.g., diskette), flash memory, etc. Such memory may be considered a computer readable storage medium. [000109] In addition, the phrase “input/output devices” or “I/O devices” as used herein is intended to include, for example, one or more input devices (e.g., keyboard, mouse, scanner, etc.) for entering data to the processing unit, and/or one or more output devices (e.g., speaker, display, printer, etc.) for presenting results associated with the processing unit. [000110] The foregoing is to be understood as being in every respect illustrative and exemplary, but not restrictive, and the scope of the invention disclosed herein is not to be determined from the Detailed Description, but rather from the claims as interpreted according to the full breadth permitted by the patent laws. It is to be understood that the embodiments shown and described herein are only illustrative of the principles of the present invention and that those skilled in the art may implement various modifications without departing from the scope and spirit of the invention. Those skilled in the art could implement various other feature combinations without departing from the scope and spirit of the invention. 
Having thus described aspects of the invention, with the details and particularity required by the patent laws, what is claimed and desired protected by Letters Patent is set forth in the appended claims.