


Title:
SYSTEM AND METHOD FOR ACHIEVING FAST AND RELIABLE TIME-TO-CONTACT ESTIMATION USING VISION AND RANGE SENSOR DATA FOR AUTONOMOUS NAVIGATION
Document Type and Number:
WIPO Patent Application WO/2017/139516
Kind Code:
A1
Abstract:
Described is a robotic system for detecting obstacles reliably with their ranges by a combination of two-dimensional and three-dimensional sensing. In operation, the system receives an image from a monocular video and range depth data from a range sensor of a scene proximate a mobile platform. The image is segmented into multiple object regions of interest, and time-to-contact (TTC) values are calculated by estimating motion field and operating on image intensities. A two-dimensional (2D) TTC map is then generated by estimating average TTC values over the multiple object regions of interest. A three-dimensional (3D) TTC map is then generated by fusing the range depth data with the image. Finally, a range-fused TTC map is generated by averaging the 2D TTC map and the 3D TTC map.

Inventors:
MONTERROZA FREDY (US)
KIM KYUNGNAM (US)
KHOSLA DEEPAK (US)
Application Number:
PCT/US2017/017275
Publication Date:
August 17, 2017
Filing Date:
February 09, 2017
Assignee:
HRL LAB LLC (US)
International Classes:
G05D1/00; G05D1/02; G06T7/11; G06T7/70
Foreign References:
US20100305857A12010-12-02
US20040239670A12004-12-02
Other References:
BERTHOLD K.P. HORN ET AL.: "Time to Contact Relative to a Planar Surface", 2007 IEEE INTELLIGENT VEHICLES SYMPOSIUM, 13 August 2007 (2007-08-13), XP031126923
LILI HUANG ET AL.: "Tightly-coupled LIDAR and Computer Vision Integration for Vehicle Detection", 2009 IEEE INTELLIGENT VEHICLES SYMPOSIUM, 14 July 2009 (2009-07-14), XP031489909
GUILLEM ALENYA ET AL.: "A comparison of three methods for measure of Time to Contact", IEEE/RSJ INTERNATIONAL CONFERENCE ON INTELLIGENT ROBOTS AND SYSTEMS 2009, 15 December 2009 (2009-12-15), XP031489909
See also references of EP 3414641A4
Attorney, Agent or Firm:
TOPE-MCKAY, Cary, R. (US)
Claims:
CLAIMS

1. A system for estimating a time-to-contact with an object using vision and range sensor data for autonomous navigation, comprising:

one or more processors and a memory, the memory having executable instructions encoded thereon, such that upon execution of the instructions, the one or more processors perform operations of:

segmenting an image from a monocular video into multiple object regions of interest, the image being of a scene proximate a mobile platform;

calculating time-to-contact (TTC) values by estimating motion field and operating on image intensities;

generating a two-dimensional (2D) TTC map by estimating average TTC values over the multiple object regions of interest;

fusing range depth data from a range sensor with the image to generate a three-dimensional (3D) TTC map; and

generating a range-fused TTC map by averaging the 2D TTC map and the 3D TTC map.

2. The system as set forth in Claim 1, further comprising operations of: detecting an object in the range-fused TTC map; and

generating a command to cause a mobile platform to move to avoid contact with the object.

3. The system as set forth in Claim 1, wherein in generating the 3D TTC map, a range data reading is associated with each pixel in the image.

4. The system as set forth in Claim 1, wherein in generating the range-fused TTC map, the range-fused TTC map is generated with range-seeded propagation of disparity and integrated with salient features in foreground objects to generate precise obstacle boundaries and TTC for objects in the scene proximate the mobile platform.

5. The system as set forth in Claim 1, wherein in segmenting an image from the monocular video into multiple object regions of interest, a foreground detector is used to segment objects within a foreground of the image.

6. The system as set forth in Claim 1, wherein a spiking neural network is used to calculate TTC values by estimating motion field and operating on image intensities.

7. The system as set forth in Claim 1, wherein in generating the 3D TTC map, a range data reading is associated with each pixel in the image; wherein in generating the range-fused TTC map, the range-fused TTC map is generated with range-seeded propagation of disparity and integrated with salient features in foreground objects to generate precise obstacle boundaries and TTC for objects in the scene proximate the mobile platform;

wherein in segmenting an image from the monocular video into multiple object regions of interest, a foreground detector is used to segment objects within a foreground of the image; and

wherein a spiking neural network is used to calculate TTC values by estimating motion field and operating on image intensities.

8. A computer program product for estimating a time-to-contact with an object using vision and range sensor data for autonomous navigation, the computer program product comprising: a non-transitory computer-readable medium having executable instructions encoded thereon, such that upon execution of the

instructions by one or more processors, the one or more processors perform operations of:

segmenting an image from a monocular video into multiple object regions of interest, the image being of a scene proximate a mobile platform;

calculating time-to-contact (TTC) values by estimating motion field and operating on image intensities;

generating a two-dimensional (2D) TTC map by estimating average TTC values over the multiple object regions of interest;

fusing range depth data from a range sensor with the image to generate a three-dimensional (3D) TTC map; and

generating a range-fused TTC map by averaging the 2D TTC map and the 3D TTC map.

9. The computer program product as set forth in Claim 8, further

comprising instructions for causing the one or more processors to perform operations of:

detecting an object in the range-fused TTC map; and

generating a command to cause a mobile platform to move to avoid contact with the object.

10. The computer program product as set forth in Claim 8, wherein in generating the 3D TTC map, a range data reading is associated with each pixel in the image.

11. The computer program product as set forth in Claim 8, wherein in generating the range-fused TTC map, the range-fused TTC map is generated with range-seeded propagation of disparity and integrated with salient features in foreground objects to generate precise obstacle boundaries and TTC for objects in the scene proximate the mobile platform.

12. The computer program product as set forth in Claim 8, wherein in

segmenting an image from the monocular video into multiple object regions of interest, a foreground detector is used to segment objects within a foreground of the image.

13. The computer program product as set forth in Claim 8, wherein a spiking neural network is used to calculate TTC values by estimating motion field and operating on image intensities.

14. The computer program product as set forth in Claim 8, wherein in

generating the 3D TTC map, a range data reading is associated with each pixel in the image;

wherein in generating the range-fused TTC map, the range-fused TTC map is generated with range-seeded propagation of disparity and integrated with salient features in foreground objects to generate precise obstacle boundaries and TTC for objects in the scene proximate the mobile platform;

wherein in segmenting an image from the monocular video into multiple object regions of interest, a foreground detector is used to segment objects within a foreground of the image; and

wherein a spiking neural network is used to calculate TTC values by estimating motion field and operating on image intensities.

15. A computer implemented method for estimating a time-to-contact with an object using vision and range sensor data for autonomous navigation, the method comprising an act of:

causing one or more processors to execute instructions encoded on a non-transitory computer-readable medium, such that upon execution, the one or more processors perform operations of:

segmenting an image from a monocular video into multiple object regions of interest, the image being of a scene proximate a mobile platform;

calculating time-to-contact (TTC) values by estimating motion field and operating on image intensities;

generating a two-dimensional (2D) TTC map by estimating average TTC values over the multiple object regions of interest;

fusing range depth data from a range sensor with the image to generate a three-dimensional (3D) TTC map; and

generating a range-fused TTC map by averaging the 2D TTC map and the 3D TTC map.

16. The method as set forth in Claim 15, further comprising operations of:

detecting an object in the range-fused TTC map; and generating a command to cause a mobile platform to move to avoid contact with the object.

17. The method as set forth in Claim 15, wherein in generating the 3D TTC map, a range data reading is associated with each pixel in the image.

18. The method as set forth in Claim 15, wherein in generating the range-fused TTC map, the range-fused TTC map is generated with range-seeded propagation of disparity and integrated with salient features in foreground objects to generate precise obstacle boundaries and TTC for objects in the scene proximate the mobile platform.

19. The method as set forth in Claim 15, wherein in segmenting an image from the monocular video into multiple object regions of interest, a foreground detector is used to segment objects within a foreground of the image.

20. The method as set forth in Claim 15, wherein a spiking neural network is used to calculate TTC values by estimating motion field and operating on image intensities.

21. The method as set forth in Claim 15, wherein in generating the 3D TTC map, a range data reading is associated with each pixel in the image; wherein in generating the range-fused TTC map, the range-fused TTC map is generated with range-seeded propagation of disparity and integrated with salient features in foreground objects to generate precise obstacle boundaries and TTC for objects in the scene proximate the mobile platform;

wherein in segmenting an image from the monocular video into multiple object regions of interest, a foreground detector is used to segment objects within a foreground of the image; and

wherein a spiking neural network is used to calculate TTC values by estimating motion field and operating on image intensities.

Description:
[0001] SYSTEM AND METHOD FOR ACHIEVING FAST AND RELIABLE TIME-TO-CONTACT ESTIMATION USING VISION AND RANGE SENSOR DATA FOR AUTONOMOUS NAVIGATION

[0002] CROSS-REFERENCE TO RELATED APPLICATIONS

[0003] The present application is a Continuation-in-Part application of U.S. Serial No. 15/271,025, filed on September 20, 2016, which is a non-provisional application of U.S. Provisional Application No. 62/221,523, filed on September 1, 2015, both of which are hereby incorporated herein by reference.

[0004] The present application is ALSO a Continuation-in-Part application of U.S.

Serial No. 14/795,884, filed July 09, 2015, which is a Continuation-in-Part application of U.S. Serial No. 14/680,057, filed 4/6/2015, both of which are hereby incorporated herein by reference.

[0005] The present application is ALSO a non-provisional patent application of U.S. Provisional Application No. 62/293,649, filed on February 10, 2016, the entirety of which is hereby incorporated herein by reference.

[0006] STATEMENT REGARDING FEDERALLY SPONSORED RESEARCH OR

DEVELOPMENT

[0007] This invention was made with Government support under Contract

No. HR0011-09-C-000 awarded by DARPA. The government has certain rights in the invention.

[0008] BACKGROUND OF INVENTION

[0009] (1) Field of Invention

[00010] The present invention relates to a system for detecting obstacles reliably with their ranges by a combination of two-dimensional and three-dimensional sensing and, more specifically, to such a system used to generate an accurate time-to-contact map for purposes of autonomous navigation.

[00011] (2) Description of Related Art

[00012] Obstacle detection and avoidance is a crucial task that is required to realize autonomous robots and/or navigation. Some systems utilize range sensors, such as LIDAR or RADAR sensors (see the List of Incorporated Literature

References, Literature Reference No. 1), that have the ability to provide accurate estimation of looming obstacle collisions. Others have attempted to use smaller sensors such as monocular cameras to detect and avoid looming obstacles (see Literature Reference Nos. 2, 3, 4 and 5). Monocular cameras achieve the low SWaP requirements for autonomous systems; however, one of the main challenges with using monocular cameras is that each camera frame by itself inherently cannot provide depth data from the scene. Thus, depth information and subsequent camera frames are typically used to give an estimation of the depth of the scene.

[00013] However, it is challenging to detect obstacles and estimate time-to-contact or time-to-collision (TTC) values reliably and rapidly from passive vision (optical flow, stereo, or structure from motion) due to inconsistent feature tracking, texture-less environments, limited working ranges, and/or the intensive computation required. Active range sensing can provide absolute and error-less distances to (both far and near) obstacles; however, these types of sensors (i.e., two-dimensional (2D) laser scanners, three-dimensional (3D) light detection and ranging (LIDAR) or red/green/blue/depth (RGB-D) cameras) are usually heavy/bulky, output sparse point clouds, operate at low frame-rates, or are limited to reliably working indoors.

[00014] There are many techniques developed for obstacle detection and TTC estimation for autonomous navigation (and also generally for computer vision and robotics applications). For example, most monocular/optical flow based approaches require expensive computations and can produce an unacceptable amount of false detections while providing relative TTC only. Stereo-based depth estimation is limited to the working range (usually shorter look-ahead) constrained by the baseline length, and performs very poorly in texture-less environments and on homogeneous surfaces. Structure from motion requires at least several frames taken at different viewpoints. Depth estimation by passive sensing (i.e., using cameras) inherently involves errors propagated from the uncertainty in the pixel domain (mismatching, lack of features). On the other hand, active sensing by a laser scanner or a 3D LIDAR sensor can provide absolute and more accurate TTC or depth measurements than 2D, but these types of sensing mostly require high SWaP (i.e., size, weight, and power) and produce sparse point clouds. Optimal fusion using 2D and 3D sensors has not been well exploited for high speed navigation.

[00015] Existing TTC map (or depth map) estimation can be broken down by sensor modality. The most relevant for low SWaP constraints is the usage of a single passive sensor (monocular camera). Methods based on scale change (see

Literature Reference Nos. 5 and 6) are often very computationally expensive, as they rely on feature tracking and scale change detection via methods like template matching. These methods also provide only relative depth of objects, as they must rely on image segmentation to (for example) distinguish only foreground from background. The lack of absolute TTC and slow processing rate make them unsuitable for maneuvers where a quick reaction must be achievable.

[00016] Obtaining more accurate depth maps can be done by using learning methods (see Literature Reference Nos. 7 and 8). These methods operate at a lower image domain (pixels/filters on features) and can provide a relative depth map quickly, but do not generalize well to cluttered environments, as the learned templates for classifying the image may not cope well with unseen structure or objects.

[00017] One of the more popular methods for TTC estimation involves computation of optical flow (see Literature Reference Nos. 6, 8 and 9). However, estimating the optical flow relies on motion parallax. This method often requires tracking feature motion between frames (consuming computation time) and fails for obstacles found along the optical axis of the camera. Another popular method for building TTC maps is achieved by stereo (see Literature Reference Nos. 10 and 11). Both of these methods quickly compute accurate depth maps, but they are limited by the camera pair baseline with regard to look-ahead time, and object perception is limited by the texture and homogeneity of the surface. If one desires to build more accurate TTC depth maps using structure from motion (as is usually the case in stereo configurations), then one needs to use sufficient frames (30 or more) (see Literature Reference No. 2) to build a dense map where an object can be identified well. Alternatively, although real-time depth maps can be obtained as in Literature Reference No. 14 at the loss of point density, such a technique is not suitable for accurate representation of an object.

[00018] There are also methods which attempt to fuse multiple sources of

information (stereo and monocular cameras) (see Literature Reference No. 9) and (sonar, stereo, scanning laser range finder) (see Literature Reference No. 12). While the depth map accuracy improves significantly, the excessively high SWaP requirement to operate their system limits the mission duration as well as the maneuverability of the robot or autonomous platform. Other methods (see Literature Reference No. 13) provide a depth map wherein the robot can be accurately localized, but objects are sparsely represented, as structure from motion is the key method of registering auxiliary depth information.

[00019] While each of the state-of-the-art methods mentioned above works well in its own regard, they do not yet have the ability to achieve high-speed agile exploration and navigation in cluttered environments under low SWaP constraints. Thus, a continuing need exists for a system that provides for fast and successful obstacle detection and avoidance in autonomous navigation in a timely fashion for a variety of different tasks.

[00020] SUMMARY OF INVENTION

[00021] This disclosure provides a system for estimating a time-to-contact with an object using vision and range sensor data for autonomous navigation. In various embodiments, the system includes one or more processors and a memory. The memory has executable instructions encoded thereon, such that upon execution of the instructions, the one or more processors perform several operations, such as segmenting an image from a monocular video into multiple object regions of interest, the image being of a scene proximate a mobile platform; calculating time-to-contact (TTC) values by estimating motion field and operating on image intensities; generating a two-dimensional (2D) TTC map by estimating average TTC values over the multiple object regions of interest; generating a three-dimensional (3D) TTC map by fusing range depth data from a range sensor with the image; and generating a range-fused TTC map by averaging the 2D TTC map and the 3D TTC map.

[00022] In another aspect, the system performs operations of detecting an object in the range-fused TTC map, and generating a command (such as move left, etc.) to cause a mobile platform to move to avoid contact with the object.

[00023] Further, in generating the 3D TTC map, a range data reading is associated with each pixel in the image.

[00024] In another aspect, in generating the range-fused TTC map, the range-fused TTC map is generated with range-seeded propagation of disparity and integrated with salient features in foreground objects to generate precise obstacle

boundaries and TTC for objects in the scene proximate the mobile platform.

[00025] In yet another aspect, in segmenting an image from the monocular video into multiple object regions of interest, a foreground detector is used to segment objects within a foreground of the image.

[00026] Additionally, a spiking neural network is used to calculate time-to-contact (TTC) values by estimating motion field and operating on image intensities.

[00027] Finally, the present invention also includes a computer program product and a computer implemented method. The computer program product includes computer-readable instructions stored on a non-transitory computer-readable medium that are executable by a computer having one or more processors, such that upon execution of the instructions, the one or more processors perform the operations listed herein. Alternatively, the computer implemented method includes an act of causing a computer to execute such instructions and perform the resulting operations.

[00028] BRIEF DESCRIPTION OF THE DRAWINGS

[00029] The objects, features and advantages of the present invention will be

apparent from the following detailed descriptions of the various aspects of the invention in conjunction with reference to the following drawings, where:

[00030] FIG. 1 is a block diagram depicting the components of a system according to various embodiments of the present invention;

[00031] FIG. 2 is an illustration of a computer program product embodying an aspect of the present invention;

[00032] FIG. 3 is an illustration of time-to-contact (TTC) map estimation from

monocular vision;

[00033] FIG. 4A is an illustration of a spike-based collision detection architecture, where spike stage indicates collision warning;

[00034] FIG. 4B is an illustration of a first part of the spike-based collision detection architecture;

[00035] FIG. 4C is an illustration of a second part of the spike-based collision

detection architecture;

[00036] FIG. 5 is an illustration of two-dimensional (2D) TTC calculation on a

gridded image without segmentation;

[00037] FIG. 6 is an illustration of TTC calculation on a segmented image; and

[00038] FIG. 7 is an illustration of three-dimensional (3D) range information which can be seamlessly combined with vision in the framework of the system described herein for a long-range, dense TTC map.

[00039] DETAILED DESCRIPTION

[00040] The present invention relates to a system for detecting obstacles reliably with their ranges by a combination of two-dimensional and three-dimensional sensing and, more specifically, to such a system used to generate an accurate time-to-contact map for purposes of autonomous navigation. The following description is presented to enable one of ordinary skill in the art to make and use the invention and to incorporate it in the context of particular applications.

Various modifications, as well as a variety of uses in different applications will be readily apparent to those skilled in the art, and the general principles defined herein may be applied to a wide range of aspects. Thus, the present invention is not intended to be limited to the aspects presented, but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

[0001] In the following detailed description, numerous specific details are set forth in order to provide a more thorough understanding of the present invention. However, it will be apparent to one skilled in the art that the present invention may be practiced without necessarily being limited to these specific details. In other instances, well-known structures and devices are shown in block diagram form, rather than in detail, in order to avoid obscuring the present invention.

[0002] The reader's attention is directed to all papers and documents which are filed concurrently with this specification and which are open to public inspection with this specification, and the contents of all such papers and documents are incorporated herein by reference. All the features disclosed in this specification (including any accompanying claims, abstract, and drawings) may be replaced by alternative features serving the same, equivalent or similar purpose, unless expressly stated otherwise. Thus, unless expressly stated otherwise, each feature disclosed is one example only of a generic series of equivalent or similar features.

[0003] Furthermore, any element in a claim that does not explicitly state "means for" performing a specified function, or "step for" performing a specific function, is not to be interpreted as a "means" or "step" clause as specified in 35 U.S.C. Section 112, Paragraph 6. In particular, the use of "step of" or "act of" in the claims herein is not intended to invoke the provisions of 35 U.S.C. 112, Paragraph 6.

[0004] Before describing the invention in detail, first a list of cited references is provided. Next, a description of the various principal aspects of the present invention is provided. Subsequently, an introduction provides the reader with a general understanding of the present invention. Finally, specific details of various embodiments of the present invention are provided to give an

understanding of the specific aspects.

[0005] (1) List of Incorporated Literature References

[0006] The following references are cited throughout this application. For clarity and convenience, the references are listed herein as a central resource for the reader. The following references are hereby incorporated by reference as though fully set forth herein. The references are cited in the application by referring to the corresponding literature reference number, as follows:

1. M. S. Darms, P. E. Rybski, C. Baker and C. Urmson, "Obstacle detection and tracking for the urban challenge," in IEEE Transactions on

Intelligent Transportation Systems, 2009.

2. H. Alvarez, L. Paz, J. Sturm and D. Cremers, "Collision Avoidance for Quadrotors with a Monocular Camera," in International Symposium on Experimental Robotics, 2014.

3. G. De Croon, E. De Weerdt, C. De Wagter, B. Remes and R. Ruijsink, "The appearance variation cue for obstacle avoidance," in IEEE Transactions on Robotics, 2012.

4. J.-O. Lee, K.-H. Lee, S.-H. Park, S.-G. Im and J. Park, "Obstacle avoidance for small UAVs using monocular vision," in Aircraft Engineering and Aerospace Technology, 2011.

5. T. Mori and S. Scherer, "First results in detecting and avoiding frontal obstacles from a monocular camera for micro unmanned aerial vehicles," in IEEE International Conference on Robotics and Automation (ICRA), 2013.

6. Sagar and Visser, "Obstacle Avoidance by Combining Background Subtraction, Optical Flow and Proximity Estimation," International Micro Air Vehicle Conference and Competition 2014 (IMAV), Delft, The Netherlands, 2014.

7. Lenz and Saxena, "Low-Power Parallel Algorithms for Single Image based Obstacle Avoidance in Aerial Robots," International Conference on Intelligent Robots and Systems (IROS), 2012.

8. Roberts and Dellaert, "Direct Superpixel Labeling for Mobile Robot Navigation," International Conference on Intelligent Robots and Systems (IROS), 2014.

9. Sukhatme, "Combined Optical-Flow and Stereo-Based Navigation of Urban Canyons for UAV," International Conference on Intelligent Robots and Systems (IROS), 2005.

10. Roland Brockers, Yoshiaki Kuwata, Stephan Weiss, Lawrence Matthies, "Micro Air Vehicle Autonomous Obstacle Avoidance from Stereo-Vision," Unmanned Systems Technology, 2014.

11. Andrew J. Barry and Russ Tedrake, "Pushbroom stereo for high-speed navigation in cluttered environments," in 3rd Workshop on Robots in Clutter: Perception and Interaction in Clutter, Chicago, Illinois, September 2014.

12. Matthias Nieuwenhuisen, David Droeschel, Dirk Holz, Sven Behnke, "Omnidirectional Obstacle Perception and Collision Avoidance for Micro Aerial Vehicles," Robotics Science and Systems (RSS), 2013.

13. J. Zhang, M. Kaess, and S. Singh, "Real-Time Depth Enhanced Monocular Odometry," International Conference on Intelligent Robots and Systems (IROS), 2014.

14. Georg Klein and David Murray, "Parallel Tracking and Mapping for Small AR Workspaces," in Proc. International Symposium on Mixed and Augmented Reality, 2007.

15. B.K.P. Horn & B.G. Schunck, "Determining Optical Flow," Artificial Intelligence, Vol. 17, No. 1-3, August 1981, pp. 185-203.

16. Alexander L. Honda, Yang Chen, Deepak Khosla, "Robust static and moving object detection via multi-scale attentional mechanisms," Proceedings of SPIE Vol. 8744, 87440S (2013).

17. U.S. Patent Application Serial No. 14/795,884, filed July 09, 2015, entitled "System and Method for Real-Time Collision Detection."

(2) Principal Aspects

Various embodiments of the invention include three "principal" aspects. The first is a system used to generate an accurate time-to-contact map for autonomous navigation. The system is typically in the form of a computer system operating software or in the form of a "hard-coded" instruction set. This system may be incorporated into a wide variety of devices that provide different functionalities. The second principal aspect is a method, typically in the form of software, operated using a data processing system (computer). The third principal aspect is a computer program product. The computer program product generally represents computer-readable instructions stored on a non-transitory computer-readable medium such as an optical storage device, e.g., a compact disc (CD) or digital versatile disc (DVD), or a magnetic storage device such as a floppy disk or magnetic tape. Other, non-limiting, examples of computer-readable media include hard disks, read-only memory (ROM), and flash-type memories. These aspects will be described in more detail below.

[0009] A block diagram depicting an example of a system (i.e., computer system

100) of the present invention is provided in FIG. 1. The computer system 100 is configured to perform calculations, processes, operations, and/or functions associated with a program or algorithm. In one aspect, certain processes and steps discussed herein are realized as a series of instructions (e.g., software program) that reside within computer readable memory units and are executed by one or more processors of the computer system 100. When executed, the instructions cause the computer system 100 to perform specific actions and exhibit specific behavior, such as described herein.

[00010] The computer system 100 may include an address/data bus 102 that is

configured to communicate information. Additionally, one or more data processing units, such as a processor 104 (or processors), are coupled with the address/data bus 102. The processor 104 is configured to process information and instructions. In an aspect, the processor 104 is a microprocessor.

Alternatively, the processor 104 may be a different type of processor, such as a parallel processor, application-specific integrated circuit (ASIC), programmable logic array (PLA), complex programmable logic device (CPLD), or a field programmable gate array (FPGA).

[00011] The computer system 100 is configured to utilize one or more data storage units. The computer system 100 may include a volatile memory unit 106 (e.g., random access memory ("RAM"), static RAM, dynamic RAM, etc.) coupled with the address/data bus 102, wherein a volatile memory unit 106 is configured to store information and instructions for the processor 104. The computer system 100 further may include a non-volatile memory unit 108 (e.g., read-only memory ("ROM"), programmable ROM ("PROM"), erasable programmable ROM ("EPROM"), electrically erasable programmable ROM ("EEPROM"), flash memory, etc.) coupled with the address/data bus 102, wherein the non-volatile memory unit 108 is configured to store static information and instructions for the processor 104. Alternatively, the computer system 100 may execute instructions retrieved from an online data storage unit such as in "Cloud" computing. In an aspect, the computer system 100 also may include one or more interfaces, such as an interface 110, coupled with the address/data bus 102. The one or more interfaces are configured to enable the computer system 100 to interface with other electronic devices and computer systems. The communication interfaces implemented by the one or more interfaces may include wireline (e.g., serial cables, modems, network adaptors, etc.) and/or wireless (e.g., wireless modems, wireless network adaptors, etc.) communication technology.

[00012] In one aspect, the computer system 100 may include an input device 112 coupled with the address/data bus 102, wherein the input device 112 is configured to communicate information and command selections to the processor 100. In accordance with one aspect, the input device 112 is an

alphanumeric input device, such as a keyboard, that may include alphanumeric and/or function keys. Alternatively, the input device 112 may be an input device other than an alphanumeric input device. In an aspect, the computer system 100 may include a cursor control device 114 coupled with the address/data bus 102, wherein the cursor control device 114 is configured to communicate user input information and/or command selections to the processor 100. In an aspect, the cursor control device 114 is implemented using a device such as a mouse, a track-ball, a track-pad, an optical tracking device, or a touch screen. The foregoing notwithstanding, in an aspect, the cursor control device 114 is directed and/or activated via input from the input device 112, such as in response to the use of special keys and key sequence commands associated with the input device 112. In an alternative aspect, the cursor control device 114 is configured to be directed or guided by voice commands.

[00013] In an aspect, the computer system 100 further may include one or more optional computer usable data storage devices, such as a storage device 116, coupled with the address/data bus 102. The storage device 116 is configured to store information and/or computer executable instructions. In one aspect, the storage device 116 is a storage device such as a magnetic or optical disk drive

(e.g., hard disk drive ("HDD"), floppy diskette, compact disk read only memory ("CD-ROM"), digital versatile disk ("DVD")). Pursuant to one aspect, a display device 118 is coupled with the address/data bus 102, wherein the display device 118 is configured to display video and/or graphics. In an aspect, the display device 118 may include a cathode ray tube ("CRT"), liquid crystal display ("LCD"), field emission display ("FED"), plasma display, or any other display device suitable for displaying video and/or graphic images and alphanumeric characters recognizable to a user.

[00014] The computer system 100 presented herein is an example computing

environment in accordance with an aspect. However, the non-limiting example of the computer system 100 is not strictly limited to being a computer system. For example, an aspect provides that the computer system 100 represents a type of data processing analysis that may be used in accordance with various aspects described herein. Moreover, other computing systems may also be

implemented. Indeed, the spirit and scope of the present technology is not limited to any single data processing environment. Thus, in an aspect, one or more operations of various aspects of the present technology are controlled or implemented using computer-executable instructions, such as program modules, being executed by a computer. In one implementation, such program modules include routines, programs, objects, components and/or data structures that are configured to perform particular tasks or implement particular abstract data types. In addition, an aspect provides that one or more aspects of the present technology are implemented by utilizing one or more distributed computing environments, such as where tasks are performed by remote processing devices that are linked through a communications network, or such as where various program modules are located in both local and remote computer-storage media including memory-storage devices.

[00015] An illustrative diagram of a computer program product (i.e., storage device) embodying the present invention is depicted in FIG. 2. The computer program product is depicted as a floppy disk 200 or an optical disk 202 such as a CD or DVD. However, as mentioned previously, the computer program product generally represents computer-readable instructions stored on any compatible non-transitory computer-readable medium. The term "instructions" as used with respect to this invention generally indicates a set of operations to be performed on a computer, and may represent pieces of a whole program or individual, separable, software modules. Non-limiting examples of "instructions" include computer program code (source or object code) and "hard-coded" electronics

(i.e., computer operations coded into a computer chip). The "instruction" is stored on any non-transitory computer-readable medium, such as in the memory of a computer or on a floppy disk, a CD-ROM, and a flash drive. In either event, the instructions are encoded on a non-transitory computer-readable medium.

[00016] (3) Introduction

[00017] Obstacle detection and avoidance is a crucial task that is required to realize autonomous robots and/or navigation. In operation, depth information and subsequent camera frames are typically used to give an estimation of the depth of the scene. Given this depth (and the change in depth over time), an autonomous navigation system can estimate a time-to-contact or time-to-collision (TTC) value which can be used to detect and avoid looming obstacles. The system of this disclosure improves upon the prior art by fusing monocular obstacle detection and avoidance (such as those systems shown in Literature Reference No. 17) with a LIDAR system. Thus, this disclosure is directed to a robotic system for detecting obstacles reliably with their ranges via two-dimensional (2D) (e.g., image) and three-dimensional (3D) (e.g., LIDAR) sensing to generate an accurate dense time-to-contact (TTC) map that is used to avoid such obstacles.

[00018] By estimating the TTC, the system can determine if there is an imminent obstacle for autonomous navigation. Furthermore, by estimating the TTC in different regions of interest, the system is able to make a calculated choice for path planning. Unique aspects of the system include (1) vision-based looming foreground detection that combines neuromorphic (or optical flow-based) obstacle detection and foreground-background estimation for high-speed reactive navigation, and (2) fusion of long-range sparse 3D with dense

monocular camera to give a robust (and absolute) dense TTC map for obstacle detection. The neuromorphic and fusion solution of the system works in complex, cluttered environments, including small obstacles such as tree branches found outdoors or toppled objects indoors. The system also provides accurate detection in all ambient conditions, including structured/unstructured environments, using wide-angle detection to keep objects of interest in view while making aggressive maneuvers at high speed.

[00019] The system described herein greatly improves upon the state of the art for several reasons. For example, the bio-inspired and neuromorphic approach of TTC map estimation is efficient (by search space reduction), reliable (by foreground segmentation), and fast and low-power consuming (by neuromorphic implementation). TTC computation can be greatly accelerated by preprocessing the incoming video using a foreground/background separation algorithm. A purpose of foreground detection is to locate potential obstacle regions and pass them on to the TTC estimation stage, significantly reducing the amount of runtime processing of TTC by two to ten times (assuming targets occupy 10%-50% of the entire scene). It will also increase the robustness of TTC since TTC values in a region-of-interest (ROI) can be integrated only on foreground areas, thus reducing false alarms. This will also increase the robustness to scene noise via the elimination of background (e.g., sky, ground-plane). Further, when 3D range sensing is available or is needed to complete a specific mission (e.g., when lighting conditions change significantly, far-distanced obstacles need to be detected earlier, or there is extra room in the system payload), the vision-based algorithm of the system can be augmented with 3D range information by combining gridded monocular TTC estimation with 3D-based TTC estimation for increased accuracy. The gridded TTC estimation is also memory efficient by representing object information in the form of a TTC map instead of a discretized world.

[00020] As can be appreciated by those skilled in the art, the system of this disclosure can be applied to any system that requires a highly accurate TTC map for the purposes of obstacle detection and avoidance. For example, the final range-fused TTC map is useful for obstacle detection and avoidance on

autonomous mobile platforms (e.g., unmanned aerial vehicles (UAVs) and self-driving cars). Such mobile platforms are a natural fit for this invention because many such systems endeavor on autonomous navigation and exploration with no collision. The ability to detect and avoid obstacles with high accuracy increases the probability of mission success and is required for other subsequent processes such as global navigation and path-planning modules. Other non-limiting examples of useful applications include indoor mapping, hazardous/forbidden area exploration, environment monitoring, virtual tours, assistance for blind people, etc. For further understanding, specific details regarding the system are provided below.

[00021] (4) Specific Details of Various Embodiments

[00022] As noted above, the system of this disclosure provides for fusion of an

improved 2D TTC calculation system with interpolated LIDAR data (and its corresponding 3D TTC estimate or map). First, and as shown in FIG. 3, the system receives sensor data 300 (e.g., from a monocular video camera) which is used for the pre-TTC computation stage of background-foreground

segmentation (i.e., neuromorphic foreground detection 302). Then the underlying approach for TTC calculation is performed (i.e., neuromorphic spiking TTC estimation 304) to generate a 2D TTC map 306. Finally, the LIDAR-based TTC estimation is performed to generate the absolute 3D TTC map, and its fusion with the 2D system is performed to generate the resulting range-fused TTC map (as shown in FIG. 7).
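The stages just described can be summarized structurally. The following Python outline is only an illustrative sketch of the data flow in FIG. 3 and FIG. 7; the function names (detect_foreground, estimate_ttc_2d, estimate_ttc_3d, fuse_ttc_maps) are hypothetical placeholders for the stages discussed in this disclosure, not an implementation taken from it.

```python
def process_frame(frame, lidar_points, velocity,
                  detect_foreground, estimate_ttc_2d,
                  estimate_ttc_3d, fuse_ttc_maps):
    """One pass of the 2D/3D TTC pipeline sketched in FIG. 3 and FIG. 7.

    frame        : grayscale image from the monocular video (element 300/700)
    lidar_points : sparse range returns from the range sensor (element 702)
    velocity     : platform closing speed, used to turn range into absolute TTC
    The four callables stand in for the stages described in the text.
    """
    # Stage 1: foreground/background separation (element 302)
    rois = detect_foreground(frame)

    # Stage 2: vision-only (2D) TTC over the segmented regions (elements 304, 306)
    ttc_2d = estimate_ttc_2d(frame, rois)

    # Stage 3: range-based absolute (3D) TTC map (elements 706, 708)
    ttc_3d = estimate_ttc_3d(frame, lidar_points, velocity)

    # Stage 4: range-fused TTC map (element 704)
    return fuse_ttc_maps(ttc_2d, ttc_3d)
```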

[00023] In other words, based on prior work with spike-based processing for

"looming" obstacle detection (see Literature Reference No. 17) and foreground-background estimation from a fast moving camera platform in high clutter (see Literature Reference No. 16), the present invention augments and combines them to enable accurate and high-update-rate detection of obstacles and TTC estimates in the camera frame.

[00024] For neuromorphic foreground detection 302, the system uses a vision-based looming foreground detector with TTC estimation that is inspired by the locust loom-detection model and implemented using a spiking neural network (SNN) approach (see Literature Reference No. 17). It has been previously

demonstrated that a TTC map based on spiking responses can be supplemented in local image regions with rapid optical flow calculation (see Literature Reference No. 17). Camera rotation is compensated for via IMU integration and a correcting homography transformation.
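The rotation-compensation step mentioned above is commonly realized by warping the previous frame with the homography H = K R K^-1, where R is the inter-frame rotation integrated from the IMU and K is the camera intrinsic matrix. The sketch below is a generic illustration of that standard construction under stated assumptions (made-up intrinsics and a small example rotation); it is not code from this disclosure.

```python
import numpy as np

def derotation_homography(K, R):
    """Homography that removes a pure camera rotation between two frames.

    K : 3x3 camera intrinsic matrix
    R : 3x3 rotation of the camera from the previous frame to the current one
        (e.g., integrated from IMU gyro rates over the inter-frame interval)
    Warping the previous frame with the returned matrix re-renders it in the
    current orientation, so that the remaining image motion is due to
    translation only, which is what the TTC model assumes.
    """
    return K @ R @ np.linalg.inv(K)

# Example with made-up intrinsics and a 1-degree roll about the optical axis.
K = np.array([[300.0,   0.0, 160.0],
              [  0.0, 300.0, 120.0],
              [  0.0,   0.0,   1.0]])
a = np.deg2rad(1.0)
R = np.array([[np.cos(a), -np.sin(a), 0.0],
              [np.sin(a),  np.cos(a), 0.0],
              [0.0,        0.0,       1.0]])
H = derotation_homography(K, R)
# H could then be applied with an image-warping routine
# (e.g., cv2.warpPerspective(prev_frame, H, (width, height))).
```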

[00025] An example of the looming foreground detector that provides for

neuromorphic foreground detection 302 is depicted in FIG. 4A. The looming foreground detector in this example requires 5 neurons per pixel and can be efficiently mapped to a spike-based processor (e.g., 320 x 240 image, 30 Hz, 2.5 mW). Note that spike-domain processing is not a requirement of this system, as it can also be run in a non-spiking domain.

[00026] More specifically, FIG. 4A shows an overall diagram of the spiking collision detection model according to some embodiments of the present invention. The diagram illustrates the combination of the first part of the spiking collision detection model, which is shown in FIG. 4B, and the second part of the spiking collision detection model (the DCMD cell model), which is shown in FIG. 4C. As depicted in FIG. 4A, for the detection of left motion 1300, the excitatory input to the model left LGMD cell 900 is all the spikes for edges moving left 902 that are the output of the spiking Reichardt detector for detecting left motion 904 on the left visual field 800. The inhibitory (negative) input is all the spikes for edges moving left 906 that are the output of the spiking Reichardt detector for detecting left motion 908 on the right visual field 802. The left spikes accumulator 1206 accumulates spikes from the model left LGMD cell 900 for detection of left motion 1300. The above is also true for the detection of right motion 1302, up motion 1304, and down motion 1306. As described in FIG. 4C, the model DCMD cell 1214 sums all spikes accumulators (elements 1206, 1208, 1210, and 1212) and decides whether to generate a collision flag 1216. This process is described in further detail in U.S. Serial No. 14/795,884, titled "System and Method for Real-Time Collision Detection," filed 7/9/2015, the entirety of which is incorporated herein by reference.

[00027] TTC computation can be greatly accelerated by preprocessing the incoming video using a foreground/background separation algorithm. The purpose of foreground detection 302 is to locate potential obstacle regions and pass them on to the TTC estimation stage (i.e., element 304), significantly reducing the amount of run-time processing of TTC by 2-10 times (assuming targets occupy 10-50% of the entire scene). The foreground separation will also increase the robustness of TTC since TTC values in an ROI can be integrated only on foreground areas, thus reducing false alarms. Although other techniques may be used for foreground detection 302, the neuromorphic and bio-inspired method (see Literature Reference No. 16) based on spectral residual saliency (RS) is desirable due to its speed and efficiency. The RS method exploits the inverse power law of natural images with the observation that the average of log-spectrums is locally smooth. This enables detecting salient objects based on the log-spectrum of individual images rather than an ensemble of images, thus streamlining the process to operate on a frame-by-frame basis. This segmented image (from element 302) is then fed to the 2D TTC map building method.
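The spectral residual (RS) computation referred to above follows a well-known recipe: the log-amplitude spectrum of a single frame is compared against a locally smoothed version of itself, and whatever stands out is mapped back to the image as saliency. The sketch below is a generic implementation of that published idea, offered only to illustrate the foreground-detection step; the filter sizes and the 0.5 threshold are arbitrary example choices, not parameters from this disclosure.

```python
import numpy as np
from scipy.ndimage import uniform_filter, gaussian_filter

def spectral_residual_saliency(image, avg_size=3, blur_sigma=2.5):
    """Spectral residual saliency map for a single grayscale frame.

    The average of log-spectra of natural images is locally smooth, so the
    residual (log amplitude minus its local mean) highlights what is unusual,
    i.e., candidate foreground objects.
    """
    f = np.fft.fft2(image.astype(float))
    log_amplitude = np.log(np.abs(f) + 1e-8)
    phase = np.angle(f)

    residual = log_amplitude - uniform_filter(log_amplitude, size=avg_size)

    # Back to the image domain: salient structure survives, background fades.
    saliency = np.abs(np.fft.ifft2(np.exp(residual + 1j * phase))) ** 2
    saliency = gaussian_filter(saliency, sigma=blur_sigma)

    smin, smax = saliency.min(), saliency.max()
    return (saliency - smin) / (smax - smin + 1e-12)

# Foreground regions of interest can then be taken as connected areas where
# the normalized map exceeds a threshold (e.g., 0.5).
```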
[00028] With respect to the present invention, the system builds a Time-To-Contact (TTC) Map using a monocular camera by directly estimating the motion field and operating on image intensities. The underlying assumption that allows for simplification of the TTC calculation is the constant brightness assumption, as referenced in Literature Reference No. 15, and as copied below:

E_x u + E_y v + E_t = 0

The equation provided above is the constant brightness assumption equation. If E(x, y, t) is the brightness at image point (x, y) at time t, then it is assumed that this equation holds as the point moves with image velocity (u, v). That is, it is assumed that as the image of some feature moves, it does not change brightness.

[00029] The simplest case of obstacle detection is to constrain the robot to motion along the optical axis towards a planar object, whose normal is also

perpendicular to the camera's optical axis. In this scenario, the components of motion (U, V) of a point along the axes perpendicular to the optical axis are 0. Thus, using the perspective projection equations and taking the derivative, it is determined that:

C (x E_x + y E_y) + E_t = 0,

which can be simplified as follows:

C G = -E_t,

where C = -W/Z = 1/TTC (Z = pixel location in space along the optical axis, W = derivative of said pixel location) and G = x E_x + y E_y. The above can now be formulated as a least-squares problem, where the sum is taken over all pixels in a region of interest:

minimize over C:  sum ( C G + E_t )^2

[00030] After minimizing the problem and solving for C (the inverse of the TTC), the following result is obtained:

C = - ( sum G E_t ) / ( sum G^2 )

[00031] Intuitively, the TTC is seen as the spatial change in image intensities divided by the temporal change of the pixel intensities. Since an object closer to the camera will expand faster than one in the background, this directly translates to a larger E_t and a smaller "radial gradient", x E_x + y E_y, over the region of interest. The inverse is true for objects farther away, particularly for well-textured objects where the gradient is strong.

[00032] When the problem cannot be simplified by the above-mentioned motion

assumption, a more general approach must be taken. In the cases of arbitrary motion relative to a plane perpendicular to the optical axis, or translational motion along the optical axis relative to an arbitrary plane, both have a closed-form solution requiring only the solution to three equations in three unknowns. In the slightly more complicated case where the motion and plane are a combination of the two latter cases, an iterative method must be used. Here, due to the non-linear nature of the equations, an initial guess for a subset of the parameters must be proffered, after which alternation is employed for a few iterations to approximate the TTC.
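For the simple case derived above, the least-squares estimate of C (and hence the TTC) over a region of interest reduces to a few array operations on two consecutive frames. The sketch below follows that closed-form result directly; the choice of derivative approximations and the use of the image center as principal point are illustrative assumptions rather than details given in this disclosure.

```python
import numpy as np

def inverse_ttc(prev, curr):
    """Least-squares inverse TTC, C = 1/TTC, for an ROI of two consecutive
    grayscale frames, assuming motion along the optical axis toward a plane
    perpendicular to it (the simple case discussed in the text).

    prev, curr : 2D float arrays of equal shape (the ROI in both frames)
    Returns C in units of 1/frame; TTC in frames is 1/C when C > 0.
    """
    h, w = curr.shape
    # Spatial derivatives (central differences) and temporal derivative.
    Ey, Ex = np.gradient(0.5 * (prev + curr))
    Et = curr - prev

    # Image coordinates measured from the principal point (image center here).
    y, x = np.mgrid[0:h, 0:w]
    x = x - (w - 1) / 2.0
    y = y - (h - 1) / 2.0

    # Radial gradient G = x*Ex + y*Ey; minimize sum (C*G + Et)^2 over the ROI.
    G = x * Ex + y * Ey
    C = -np.sum(G * Et) / (np.sum(G * G) + 1e-12)
    return C

# A surface expanding by about 1% per frame yields C of roughly 0.01,
# i.e., a TTC of roughly 100 frames.
```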

[00033] If it is desired to localize objects more accurately, the image can be

partitioned into a larger grid. While this allows for fine resolution of the object to be determined, a more segmented object is determined, as some areas may not be sufficiently textured to exhibit the same TTC as other regions. FIG. 5 shows the 2D gridded approach without foreground-background segmentation.

Specifically, FIG. 5 depicts a fixed grid 502 of a segmented image. In other words, the image is segmented into a fixed grid 502 having a plurality of sub-regions. Although for clarity a single TTC graph 504 is depicted, it should be understood that each sub-region includes a TTC graph 504. Each TTC graph 504 shows the TTC value changes over time in the corresponding sub-region 506 in the fixed grid 502. The line (at the value 120) indicates a predetermined threshold for collision detection. In this example, if TTC > 120, then there could be a collision in the sub-region 506. The x-axis in the TTC graph 504 is frame number, while the y-axis is frames-to-collision. In other words, the mobile platform will hit an obstacle after y frames. The actual time depends on the frame rate of sensing. For example, in the 30 frames per second (fps) sensor system, y = 120 means that the mobile platform has 4 seconds until collision with the obstacle.

[00034] If a segmented image is fed to the 2D TTC system, a more accurate estimate can be obtained. Given the estimate of the TTC (here determined as Frames-To-Collision), one can use the frame rate (difference between input frame time stamps) to compute an absolute time until collision. Then, given a velocity estimate, one can build a depth map, which can be used to generate move commands for the mobile platform to cause the mobile platform to avoid the collision (e.g., move right in 10 feet, etc.). An experiment was conducted to evaluate the monocular vision-based time to contact using a Kinect RGBD sensor and a Core i5 NUC mounted on a quadcopter. Using the depth data for ground-truth only, TTC accuracy was estimated at a range of 10 m to be approximately 90% for up to 3 objects in the path of the quadcopter across a range of indoor object types and lighting conditions. FIG. 6 shows an example TTC map 600 obtained once the background has been eliminated from the original image 602, as well as time to collision for a certain region of interest.
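The conversion just described, from a frames-to-collision estimate to an absolute time and then to a rough range, is straightforward arithmetic. The snippet below simply restates it; the 5 m/s closing speed is a made-up example value.

```python
def ttc_seconds(frames_to_collision, fps):
    """Absolute time to collision from a frames-to-collision estimate."""
    return frames_to_collision / fps

def distance_to_obstacle(frames_to_collision, fps, speed_m_s):
    """Rough obstacle range, assuming a constant closing speed."""
    return speed_m_s * ttc_seconds(frames_to_collision, fps)

# The example from the text: 120 frames at 30 fps gives 4 seconds to contact;
# at, say, 5 m/s closing speed that places the obstacle roughly 20 m ahead.
print(ttc_seconds(120, 30))              # 4.0
print(distance_to_obstacle(120, 30, 5))  # 20.0
```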

[00035] Unlike the TTC value grid in FIG. 5, FIG. 6 shows TTC 600 per each

segmented region (in this case, two trees and a stop sign) obtained from the foreground detector 302 (as shown in FIGs. 3 and 4). The bottom graph depicts a resulting frame-to-collision computation 604, which shows the time to collision (in seconds) decreasing as frames progress, meaning in this example that the sensor platform is approaching an object and has y seconds to contact at frame x.

[00036] The method in this disclosure proceeds to make use of absolute range data to build a more accurate TTC map. Since what is of interest is detection of objects over a large range, disparate sources of information must be fused together. A camera image provides information useful in resolving object boundaries, while a 3D LIDAR range finder provides depth information which becomes sparser and more diffuse as distance to target increases.

[00037] FIG. 7 shows how the vision-based framework of the present invention can be seamlessly augmented with range information (e.g., from a laser scanner) to compute a long-range TTC map. Specifically, FIG. 7 illustrates how the high-frequency vision-based method of TTC estimation (as shown in FIGs. 3 and 4) can be combined with low to mid frequency range data provided by a range sensor, like LIDAR or a laser scanner. The inputs to the fusion system are a monocular video stream 700 and range data 702, and the output is a range-fused dense TTC map 704 that can be used as described above. The detailed fusion procedure is provided in further detail below.

[00038] To fuse the information (i.e., monocular video stream 700 and range data

702), each LIDAR reading is associated with a pixel in the image (obtained from the video stream 700) to generate a range data/camera fusion 706. To construct the dense TTC map 704, for every pixel, its neighbors are searched for matching color values and range readings, with the search radius guided by the distance to target and the LIDAR scanline resolution. The final distance is determined by averaging of a box filter with each element weighted by pixel similarity and anchored at the center of the filter. The method has the effect of super-resolving sparse 3D points, thus creating a denser range map (converted to an absolute TTC map 708 by using the known velocity).
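A hedged sketch of the densification step described above: each LIDAR return is assumed to have already been projected into the image (the calibration and projection are omitted), and a dense range map is then filled in with a box filter anchored at each pixel, weighting the in-reach range seeds by color similarity to the center pixel. The search radius and similarity bandwidth are arbitrary example parameters, not values from this disclosure.

```python
import numpy as np

def densify_range(image, seed_rows, seed_cols, seed_ranges,
                  radius=7, color_sigma=10.0):
    """Fill a dense range map from sparse, image-projected LIDAR seeds.

    image       : HxW grayscale image, time-aligned with the scan
    seed_rows, seed_cols, seed_ranges : pixel locations and ranges of returns
    radius      : half-size of the box filter searched around every pixel
    color_sigma : bandwidth of the pixel-similarity weight
    Returns an HxW range map; pixels with no seed in reach remain NaN.
    """
    image = image.astype(float)
    h, w = image.shape
    sparse = np.full((h, w), np.nan)
    sparse[seed_rows, seed_cols] = seed_ranges

    dense = np.full((h, w), np.nan)
    for r in range(h):
        for c in range(w):
            r0, r1 = max(0, r - radius), min(h, r + radius + 1)
            c0, c1 = max(0, c - radius), min(w, c + radius + 1)
            patch_range = sparse[r0:r1, c0:c1]
            mask = ~np.isnan(patch_range)
            if not mask.any():
                continue
            # Weight each in-reach seed by how similar its pixel is to the
            # center pixel (the anchored, similarity-weighted box filter).
            patch_img = image[r0:r1, c0:c1]
            weights = np.exp(-((patch_img - image[r, c]) ** 2)
                             / (2.0 * color_sigma ** 2)) * mask
            dense[r, c] = (np.sum(weights * np.where(mask, patch_range, 0.0))
                           / np.sum(weights))
    return dense

# An absolute 3D TTC map then follows as dense_range / closing_speed.
```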

[00039] The 2D portion of the process is depicted in FIG. 3 and described above, in which the monocular video stream 700 proceeds through foreground separation 302 and neuromorphic spiking TTC estimation 304 to generate the resulting 2D TTC map 306.

[00040] The final step is averaging of the TTC maps 306 and 708 obtained by the 2D and 3D systems to generate the range-fused TTC map 704, which reduces false detections and provides absolute TTC (depth) measurement by Kalman filtering fusion. The fusion algorithm of the present system gives a precise obstacle boundary and TTC via range-seeded propagation of disparity and integration with the salient features in the foreground objects. One key benefit of this approach is that it gives an approximately two to five times higher density of point cloud in the 3D space, which can then be used to do obstacle detection even in the 3D point cloud space.
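The "averaging ... by Kalman filtering fusion" mentioned above can be read, per pixel, as an inverse-variance weighted combination of the two TTC measurements. The sketch below shows that reading under stated assumptions: the variance values are made-up examples, and this disclosure does not spell out the actual filter gains.

```python
import numpy as np

def fuse_ttc_maps(ttc_2d, ttc_3d, var_2d=4.0, var_3d=1.0):
    """Per-pixel inverse-variance (Kalman-style) fusion of two TTC maps.

    ttc_2d, ttc_3d : HxW float TTC maps in the same units; NaN marks "no estimate"
    var_2d, var_3d : assumed measurement variances (example values only)
    Where both maps have a value, the fused estimate is the variance-weighted
    mean; where only one does, that one is passed through.
    """
    fused = np.where(np.isnan(ttc_2d), ttc_3d, ttc_2d)  # fall back to either map
    both = ~np.isnan(ttc_2d) & ~np.isnan(ttc_3d)
    w2, w3 = 1.0 / var_2d, 1.0 / var_3d
    fused[both] = (w2 * ttc_2d[both] + w3 * ttc_3d[both]) / (w2 + w3)
    return fused
```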

[00041] An experiment has been conducted that fuses data from a Ladybug camera (produced by Point Grey Research, located at 12051 Riverside Way, Richmond, BC, Canada) and a Velodyne HDL-32E LIDAR (produced by Velodyne, located at 345 Digital Drive, Morgan Hill, CA 95037) to detect obstacles and build a TTC map. Experimental results suggest that the fusion scheme has a detection accuracy of 80% (compared to 47% with LIDAR only) for human-size obstacles at a distance of 25 meters, and that fused results have a greater detection probability (greater than 50%) at longer ranges than raw or 2D TTC information alone.

[00042] The average neighboring-point distance and density in a 3D point-cloud cluster can directly affect the probability of obstacle detection. For example, for human object detection using K-means clustering, the 2D/3D fusion method can still detect 80% (compared to the raw case at <50%) when placed at 25 meters. Note that depending on the choice of threshold scaler in the obstacle detection/clustering algorithm, the probability could be higher (a more relaxed algorithm) or lower (a more strict algorithm).

[00043] Finally, while this invention has been described in terms of several

embodiments, one of ordinary skill in the art will readily recognize that the invention may have other applications in other environments. It should be noted that many embodiments and implementations are possible. Further, the following claims are in no way intended to limit the scope of the present invention to the specific embodiments described above. In addition, any recitation of "means for" is intended to evoke a means-plus-function reading of an element and a claim, whereas any elements that do not specifically use the recitation "means for" are not intended to be read as means-plus-function elements, even if the claim otherwise includes the word "means". Further, while particular method steps have been recited in a particular order, the method steps may occur in any desired order and fall within the scope of the present invention.