

Title:
SYSTEMS AND METHODS FOR FUSING LIDAR AND STEREO DEPTH ESTIMATION
Document Type and Number:
WIPO Patent Application WO/2022/081968
Kind Code:
A1
Abstract:
By fusing LIDAR and stereo technologies, techniques described herein provide improved estimation of distances to objects over using either LIDAR or stereo data alone. Based on a camera image, an object of interest is detected in an environment. A LIDAR device obtains LIDAR data of objects in the environment and a resolution of the LIDAR data in a vicinity of the detected object is determined. If the resolution of the LIDAR data exceeds a threshold value, a distance to the object is determined based on the LIDAR data. If it does not, a distance to the object is determined based on stereo data in the vicinity of the detected object, which can be a stereo camera image obtained using two cameras. A super-resolution procedure can be used to enhance the resolution of the stereo camera image.

Inventors:
YUAN, Bausan (US)
BICHU, Tanmay Nitin (US)
Application Number:
PCT/US2021/055176
Publication Date:
April 21, 2022
Filing Date:
October 15, 2021
Assignee:
NIKON CORP (JP)
YUAN, Bausan (US)
International Classes:
G01S17/86; G01S13/86; G01S17/89; G01S17/931
Foreign References:
US20170184704A12017-06-29
US20180232947A12018-08-16
US20180299534A12018-10-18
Other References:
KIRA, Zsolt et al.: "Multi-modal pedestrian detection on the move", 2012 IEEE International Conference on Technologies for Practical Robot Applications (TePRA), 23 April 2012, pages 157-162, XP032186074, ISBN: 978-1-4673-0855-7, DOI: 10.1109/TEPRA.2012.6215671
Attorney, Agent or Firm:
CAHILL, Steven (US)
Claims:
CLAIMS

1. A system for determining a distance to an object in an environment, comprising: a light detection and ranging (LIDAR) device; a first camera; a second camera, wherein the first camera and the second camera are configured to provide stereo data; a processor; and a memory coupled with the processor, wherein the memory is configured to provide the processor with instructions that when executed cause the processor to:

(a) obtain LIDAR data of the environment, wherein the LIDAR data is obtained from the LIDAR device;

(b) obtain a camera image of the environment, wherein the camera image is obtained from the first camera;

(c) detect the object in the environment based on the camera image;

(d) determine a resolution of the LIDAR data in an area of the environment in which the detected object is located based on the camera image;

(e) if the resolution of the LIDAR data in the area of the environment in which the detected object is located exceeds a threshold value, determine the distance to the detected object based on the LIDAR data; and

(f) if the resolution of the LIDAR data in the area of the environment in which the detected object is located does not exceed the threshold value: obtain stereo data in the area of the environment in which the detected object is located, wherein the stereo data are obtained from the first camera and the second camera; and determine the distance to the detected object based on the stereo data in the area of the environment in which the detected object is located.

2. The system of claim 1, wherein the environment comprises a driving environment and the object comprises an obstacle in the driving environment.

3. The system of claim 1 or 2, wherein the LIDAR data comprises a LIDAR scan.

4. The system of any one of claims 1-3, wherein the resolution of the LIDAR data in the area of the environment in which the detected object is located comprises a vertical resolution of the LIDAR data in the area of the environment in which the detected object is located.

5. The system of any one of claims 1-4, wherein the memory is further configured to provide the processor with instructions that when executed cause the processor to perform (c)-(f) for each detected object in the environment.

6. The system of any one of claims 1-5, wherein the memory is further configured to provide the processor with instructions that when executed cause the processor to:

(g) perform a dense depth procedure of the area of the environment in which the detected object is located; and

(h) determine the distance to the detected object based upon the dense depth procedure if the resolution of the LIDAR scan in the area of the environment in which the detected object is located is about equal to the threshold value.

7. The system of claim 6, wherein the memory is further configured to provide the processor with instructions that when executed cause the processor to perform (c)-(h) for each detected object in the environment.

8. The system of claim 7, wherein the determined distances are used to track detected objects in the environment.

9. The system of claim 7, wherein the environment comprises a driving environment, and wherein the determined distances are used by an autonomous vehicle to track detected objects in the environment during navigation.

10. The system of any one of claims 1-9, wherein (f) comprises: if the resolution of the LIDAR data in the area of the environment in which the detected object is located does not exceed the threshold value: obtain stereo data of the environment in the area of the environment in which the detected object is located; apply a super-resolution procedure to the stereo data in the area of the environment in which the detected object is located to obtain super-resolution stereo data; and determine the distance to the detected object based on the super-resolution stereo data.

11. A method for determining a distance to an object in an environment, comprising:

(a) obtaining LIDAR data of the environment, wherein the LIDAR data is obtained from a LIDAR device;

(b) obtaining a camera image of the environment, wherein the camera image is obtained from a first camera;

(c) detecting the object in the environment based on the camera image;

(d) determining a resolution of the LIDAR data in an area of the environment in which the detected object is located based on the camera image;

(e) if the resolution of the LIDAR data in the area of the environment in which the detected object is located exceeds a threshold value, determining the distance to the detected object based on the LIDAR data; and

(f) if the resolution of the LIDAR data in the area of the environment in which the detected object is located does not exceed the threshold value: obtaining stereo data in the area of the environment in which the detected object is located, wherein the stereo data are obtained from the first camera and a second camera; and determining the distance to the detected object in the area of the environment in which the detected object is located based on the stereo data.

12. The method of claim 11, wherein the environment comprises a driving environment and the object comprises an obstacle in the driving environment.

13. The method of claim 11 or 12, wherein the LIDAR data comprises a LIDAR scan.

14. The method of any one of claims 11-13, wherein the resolution of the LIDAR data in the area of the environment in which the detected object is located comprises a vertical resolution of the LIDAR data in the area of the environment in which the detected object is located.

15. The method of any one of claims 11-14, further comprising performing (c)-(f) for each detected object in the environment.

16. The method of any one of claims 11-15, further comprising:

(g) performing a dense depth procedure of the area of the environment in which the detected object is located; and

(h) determining the distance to the detected object based upon the dense depth procedure if the resolution of the LIDAR scan in the area of the environment in which the detected object is located is about equal to the threshold value.

17. The method of claim 16, further comprising performing (c)-(h) for each detected object in the environment.

18. The method of claim 17, further comprising tracking detected objects in the environment using the determined distances.

19. The method of claim 18, wherein the environment comprises a driving environment, and wherein the method further comprises tracking detected objects using the determined distances by an autonomous vehicle during navigation.

20. The method of any one of claims 11-19, wherein (f) comprises: if the resolution of the LIDAR data in the area of the environment in which the detected object is located does not exceed the threshold value: obtaining the stereo data in the area of the environment in which the detected object is located; applying a super-resolution procedure to the stereo data in the area of the environment in which the detected object is located to obtain super-resolution stereo data; and determining the distance to the detected object based on the super-resolution stereo data.

21. A computer program product for determining a distance to an object in an environment, the computer program product being embodied in a tangible computer readable storage medium and comprising computer instructions for:

(a) obtaining LIDAR data of the environment, wherein the LIDAR data is obtained from a LIDAR device;

(b) obtaining a camera image of the environment, wherein the camera image is obtained from the first camera;

(c) detecting the object in the environment based on the camera image;

(d) determining a resolution of the LIDAR data in an area of the environment in which the detected object is located based on the camera image;

(e) if the resolution of the LIDAR data in the area of the environment in which the detected object is located exceeds a threshold value, determining the distance to the detected object based on the LIDAR data; and

(f) if the resolution of the LIDAR data in the area of the environment in which the detected object is located does not exceed the threshold value: obtaining stereo data in the area of the environment in which the detected object is located, wherein the stereo data are obtained from the first camera and a second camera; and determining the distance to the detected object based on the stereo data in the area of the environment in which the detected object is located.

22. The computer program product of claim 21, further comprising computer instructions for:

(g) performing a dense depth procedure of the area of the environment in which the detected object is located; and

(h) determining the distance to the detected object based upon the dense depth procedure if the resolution of the LIDAR scan in the area of the environment in which the detected object is located is about equal to the threshold value.

23. The computer program product of claim 21 or 22, wherein (f) comprises: if the resolution of the LIDAR data in the area of the environment in which the detected object is located does not exceed the threshold value: obtaining stereo data in the area of the environment in which the detected object is located; applying a super-resolution procedure to the stereo data in the area of the environment in which the detected object is located to obtain super-resolution stereo data; and determining the distance to the detected object based on the super-resolution stereo data.


Description:
SYSTEMS AND METHODS FOR FUSING LIDAR AND STEREO DEPTH ESTIMATION

CROSS-REFERENCE

[0001] The present application claims priority to U.S. Provisional Patent Application No. 63/093,028, filed October 16, 2020, entitled “FUSING LIDAR AND STEREO CAMERA DATA FOR DEPTH ESTIMATION AND OBJECT DETECTION” and U.S. Provisional Patent Application No. 63/092,927, filed October 16, 2020, entitled “IMPROVED STEREO DEPTH RESOLUTION USING SUPER RESOLUTION TECHNIQUES,” each of which is entirely incorporated herein by reference for all purposes.

BACKGROUND OF THE INVENTION

[0002] Numerous technologies, such as autonomous vehicles, require an accurate estimation of the distances to an object in an environment, or from one object to other objects in an environment. Light detection and ranging (LIDAR) and stereo imaging have been used for these purposes. However, each of these technologies suffers from drawbacks. While LIDAR is fast, easy to process, and capable of accurately measuring distances to objects that are relatively nearby, LIDAR-based detection is difficult for objects that are relatively far away. Meanwhile, stereo imaging has high accuracy but is relatively slow and difficult to process. Accordingly, presented herein are systems and methods that fuse LIDAR and stereo technologies, leveraging the strengths of each while avoiding its drawbacks, to attain a more accurate estimation of the distances to an object in an environment, or from one object to other objects in an environment.

BRIEF DESCRIPTION OF THE DRAWINGS

[0003] Various embodiments of the invention are disclosed in the following detailed description and the accompanying drawings.

[0004] Figure 1 shows a schematic depicting an exemplary system for determining a distance to an object in an environment.

[0005] Figure 2 shows a flowchart depicting an exemplary method for determining a distance to an object in an environment.

[0006] Figure 3 shows a block diagram of a computer system for determining a distance to an object in an environment.

[0007] Figure 4A shows a first (left) camera image of an environment consisting of a single object of interest.

[0008] Figure 4B shows a second (right) camera image of the environment.

[0009] Figure 4C shows a LIDAR image of the environment.

[0010] Figure 4D shows LIDAR scan lines projected on the first camera image of Figure 4A.

[0011] Figure 5A shows a first (left) camera image of an environment consisting of multiple objects of interest.

[0012] Figure 5B shows a second (right) camera image of the environment.

[0013] Figure 5C shows a LIDAR image of the environment.

[0014] Figure 5D shows LIDAR scan lines projected on the first camera image of Figure 5A.

[0015] Figure 6A shows a first (left) camera image of four road signs placed at slightly different distances from a stereo imaging system.

[0016] Figure 6B shows a second (right) camera image of the four road signs.

[0017] Figure 6C shows the image of Figure 6A following application of a 6x super-resolution procedure.

[0018] Figure 6D shows the image of Figure 6B following application of a 6x super-resolution procedure.

[0019] Figure 7 shows the accuracy of distance measurements achieved using the super-resolution procedure compared with ground truth distance measurements obtained using a laser distance determination.

DETAILED DESCRIPTION

[0020] The invention can be implemented in numerous ways, including as a process; an apparatus; a system; a composition of matter; a computer program product embodied on a computer readable storage medium; and/or a processor, such as a processor configured to execute instructions stored on and/or provided by a memory coupled to the processor. In this specification, these implementations, or any other form that the invention may take, may be referred to as techniques. In general, the order of the steps of disclosed processes may be altered within the scope of the invention. Unless stated otherwise, a component such as a processor or a memory described as being configured to perform a task may be implemented as a general component that is temporarily configured to perform the task at a given time or a specific component that is manufactured to perform the task. As used herein, the term “processor” refers to one or more devices, circuits, and/or processing cores configured to process data, such as computer program instructions.

[0021] A detailed description of one or more embodiments of the invention is provided below along with accompanying figures that illustrate the principles of the invention. The invention is described in connection with such embodiments, but the invention is not limited to any embodiment. The scope of the invention is limited only by the claims and the invention encompasses numerous alternatives, modifications and equivalents. Numerous specific details are set forth in the following description in order to provide a thorough understanding of the invention. These details are provided for the purpose of example and the invention may be practiced according to the claims without some or all of these specific details. For the purpose of clarity, technical material that is known in the technical fields related to the invention has not been described in detail so that the invention is not unnecessarily obscured.

[0022] As used herein, the term “or” shall convey both disjunctive and conjunctive meanings. For instance, the phrase “A or B” shall be interpreted to include element A alone, element B alone, and the combination of elements A and B.

[0023] As used herein, a resolution exceeds a value if the resolution is sufficient to resolve objects having a characteristic size equal to the value, as well as some objects having a characteristic size smaller than the value. Thus, for example, a resolution exceeding 1 meter (m) means that the resolution is sufficient to resolve objects having a characteristic size of 1 meter as well as some objects smaller than 1 m. For example, a resolution exceeding 1 meter may mean that the resolution is sufficient to resolve objects having a characteristic size of 0.9 m, 0.8 m, or less.

[0024] Numerous technologies, such as autonomous vehicles, require an accurate estimation of the distances from one object to other objects in their vicinity. Light detection and ranging (LIDAR) and stereo imaging have been used for these purposes. However, each of these technologies suffers from drawbacks. While LIDAR is fast, easy to process, and capable of accurately measuring distances to objects that are relatively nearby, the number of LIDAR points associated with a given object decreases for objects that are relatively far away, making LIDAR-based detection of distant objects difficult. Meanwhile, stereo imaging has high accuracy, but is relatively slow and difficult to process.

[0025] By fusing LIDAR and stereo technologies, techniques described herein provide improved estimation of distances to objects over using either LIDAR or stereo data alone. In some embodiments, based on a camera image, an object of interest is detected in an environment. A LIDAR device is used to obtain LIDAR data of objects in the environment and a resolution of the LIDAR data in a vicinity of the detected object is determined. If the resolution of the LIDAR data exceeds a threshold value, a distance to the object is determined based on the LIDAR data. If the resolution of the LIDAR data does not exceed the threshold value, a distance to the object is determined based on stereo data in the vicinity of the detected object, which can be a stereo camera image obtained using two cameras. A super-resolution procedure can be used to enhance the resolution of the stereo camera image.

[0026] As described herein, the problem of poor distance estimations obtained using LIDAR or stereo imaging alone is addressed by systems and methods that fuse LIDAR and stereo technologies to attain a more accurate estimation of a distance to an object in an environment. The systems and methods generally use a LIDAR device to obtain LIDAR data of objects in an environment, a camera to obtain a camera image of the environment, and two cameras to obtain a stereo camera image of the environment. In some embodiments, the object is detected in the environment based on a camera image obtained using a first camera. A resolution of the LIDAR data in a vicinity of the object is determined. If the resolution of the LIDAR data in the vicinity exceeds a threshold value, the distance to the object is determined based on the LIDAR data. In such case, the LIDAR data has sufficient resolution to accurately determine the distance to the object. If the resolution of the LIDAR data in the vicinity does not exceed the threshold value, a distance to the object is determined based on the stereo camera image. In such case, the LIDAR data has insufficient resolution to accurately determine the distance to the object and the stereo camera image is used to “fill in” the distance. In this manner, the LIDAR data is used to quickly and easily determine distances of objects for which there is sufficient LIDAR resolution, while the stereo camera image is used to determine distances only for those objects for which the LIDAR resolution is insufficient. By fusing LIDAR and stereo technologies, the disclosed techniques advantageously use each technology to provide more accurate distance estimations while avoiding the associated drawbacks. This allows for quick, easy, and accurate determination of the distances to multiple objects in the environment. In some cases, a super-resolution procedure is applied to enhance the resolution of the stereo camera image.
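For illustration, the decision logic described in the preceding paragraph can be sketched in a few lines of Python. This is a minimal sketch, not the patent's implementation: the function names, the five-scan-line cutoff, and the use of the median as the per-object distance estimate are all assumptions introduced here.

```python
# Illustrative sketch of the LIDAR/stereo fusion decision; the threshold of
# five scan lines and the median estimator are assumptions, not values
# prescribed by the patent.
import numpy as np

def estimate_distance(lidar_points_in_box: np.ndarray,
                      stereo_depth_in_box: np.ndarray,
                      min_scan_lines: int = 5) -> float:
    """Fuse LIDAR and stereo depth for one detected object.

    lidar_points_in_box: (N, 3) LIDAR returns falling inside the object's
        2-D bounding box, as (scan_line_id, column, range_m).
    stereo_depth_in_box: dense per-pixel depth (m) for the same box,
        computed from a rectified stereo pair.
    """
    # Vertical resolution proxy: number of distinct scan lines on the object.
    n_lines = len(np.unique(lidar_points_in_box[:, 0].astype(int)))
    if n_lines >= min_scan_lines:
        # Sufficient LIDAR resolution: use the fast, accurate LIDAR ranges.
        return float(np.median(lidar_points_in_box[:, 2]))
    # Otherwise fall back to stereo, ignoring invalid (<= 0) depth pixels.
    valid = stereo_depth_in_box[stereo_depth_in_box > 0]
    return float(np.median(valid))
```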

[0027] A system for determining a distance to an object in an environment is disclosed herein. The system generally comprises: a LIDAR device; a first camera; a second camera; a processor; and a memory coupled with the processor. The first camera and the second camera are configured to provide stereo data. The memory is configured to provide the processor with instructions that when executed cause the processor to: (a) obtain LIDAR data of the environment, wherein the LIDAR data is obtained from the LIDAR device; (b) obtain a camera image of the environment, wherein the camera image is obtained from the first camera; (c) detect the object in the environment based on the camera image; (d) determine a resolution of the LIDAR data in an area of the environment in which the detected object is located based on the camera image; (e) if the resolution of the LIDAR data in the area of the environment in which the detected object is located exceeds a threshold value, determine the distance to the detected object based on the LIDAR data; and (f) if the resolution of the LIDAR data in the area of the environment in which the detected object is located does not exceed the threshold value: obtain stereo data in the area of the environment in which the detected object is located, wherein the stereo data are obtained from the first camera and the second camera; and determine the distance to the detected object based on the stereo data in the area of the environment in which the detected object is located. In some embodiments, the environment comprises a driving environment and the object comprises an obstacle in the driving environment. In some embodiments, the LIDAR data comprises a LIDAR scan. In some embodiments, the resolution of the LIDAR data in the area of the environment in which the detected object is located comprises a vertical resolution of the LIDAR data in the area of the environment in which the detected object is located. In some embodiments, the memory is further configured to provide the processor with instructions that when executed cause the processor to perform (c)-(f) for each detected object in the environment. In some embodiments, the memory is further configured to provide the processor with instructions that when executed cause the processor to: (g) perform a dense depth procedure of the area of the environment in which the detected object is located; and (h) determine the distance to the detected object based upon the dense depth procedure if the resolution of the LIDAR scan in the area of the environment in which the detected object is located is about equal to the threshold value. In some embodiments, the memory is further configured to provide the processor with instructions that when executed cause the processor to perform (c)-(h) for each detected object in the environment. In some embodiments, the determined distances are used to track detected objects in the environment. In some embodiments, the environment comprises a driving environment, and the determined distances are used by an autonomous vehicle to track detected objects in the environment during navigation.
In some embodiments, (f) comprises: if the resolution of the LIDAR data in the area of the environment in which the detected object is located does not exceed the threshold value: obtain stereo data of the area of the environment in which the detected object is located; apply a super-resolution procedure to the stereo data of the area of the environment in which the detected object is located to obtain super-resolution stereo data; and determine the distance to the detected object based on the super-resolution stereo data.

[0028] Further disclosed herein is a method for determining a distance to an object in an environment. The method generally comprises: (a) obtaining LIDAR data of the environment, wherein the LIDAR data is obtained from a LIDAR device; (b) obtaining a camera image of the environment, wherein the camera image is obtained from a first camera; (c) detecting the object in the environment based on the camera image; (d) determining a resolution of the LIDAR data in an area of the environment in which the detected object is located based on the camera image; (e) if the resolution of the LIDAR data in the area of the environment in which the detected object is located exceeds a threshold value, determining the distance to the detected object based on the LIDAR data; and (f) if the resolution of the LIDAR data in the area of the environment in which the detected object is located does not exceed the threshold value: obtaining stereo data in the area of the environment in which the detected object is located, wherein the stereo data are obtained from the first camera and a second camera; and determining the distance to the detected object based on the stereo data in the area of the environment in which the detected object is located. In some embodiments, the environment comprises a driving environment and the object comprises an obstacle in the driving environment. In some embodiments, the LIDAR data comprises a LIDAR scan. In some embodiments, the resolution of the LIDAR data in the area of the environment in which the detected object is located comprises a vertical resolution of the LIDAR data in the area of the environment in which the detected object is located. In some embodiments, the method further comprises performing (c)-(f) for each detected object in the environment. In some embodiments, the method further comprises: (g) performing a dense depth procedure of the area of the environment in which the detected object is located; and (h) determining the distance to the detected object based upon the dense depth procedure if the resolution of the LIDAR scan in the area of the environment in which the detected object is located is about equal to the threshold value. In some embodiments, the method further comprises performing (c)-(h) for each detected object in the environment. In some embodiments, the method further comprises tracking detected objects in the environment using the determined distances. In some embodiments, the environment comprises a driving environment, and the method further comprises tracking detected objects using the determined distances by an autonomous vehicle during navigation. In some embodiments, (f) comprises: if the resolution of the LIDAR data in the area of the environment in which the detected object is located does not exceed the threshold value: obtaining stereo data of the environment in which the detected object is located; applying a super-resolution procedure to the stereo data of the environment in which the detected object is located to obtain super-resolution stereo data; and determining the distance to the detected object based on the super-resolution stereo data.

[0029] Further disclosed herein is a computer program product for determining a distance to an object in an environment. The computer program product is generally embodied in a tangible computer readable storage medium and comprises computer instructions for: (a) obtaining LIDAR data of the environment, wherein the LIDAR data is obtained from a LIDAR device; (b) obtaining a camera image of the environment, wherein the camera image is obtained from the first camera; (c) detecting the object in the environment based on the camera image; (d) determining a resolution of the LIDAR data in an area of the environment in which the detected object is located based on the camera image; (e) if the resolution of the LIDAR data in the area of the environment in which the detected object is located exceeds a threshold value, determining the distance to the detected object based on the LIDAR data; and (f) if the resolution of the LIDAR data in the area of the environment in which the detected object is located does not exceed the threshold value: obtaining stereo data of the environment in which the detected object is located, wherein the stereo data are obtained from the first camera and a second camera; and determining the distance to the detected object based on the stereo data. In some embodiments, the computer program product further comprises computer instructions for: (g) performing a dense depth procedure of the area of the environment in which the detected object is located; and (h) determining the distance to the detected object based upon the dense depth procedure if the resolution of the LIDAR scan in the area of the environment in which the detected object is located is about equal to the threshold value. In some embodiments, (f) comprises: if the resolution of the LIDAR data in the area of the environment in which the detected object is located does not exceed the threshold value: obtaining stereo data of the area of the environment in which the detected object is located; applying a super-resolution procedure to the stereo data of the environment in which the detected object is located to obtain super-resolution stereo data; and determining the distance to the detected object based on the super-resolution stereo data.

[0030] Figure 1 shows a schematic depicting an exemplary system 100 for determining a distance to an object in an environment. In the example shown, the system comprises a LIDAR device 110. In some embodiments, the LIDAR device is configured to obtain LIDAR data. In some embodiments, the LIDAR device has a measurement range of at least about 10 meters (m), 20 m, 30 m, 40 m, 50 m, 60 m, 70 m, 80 m, 90 m, 100 m, 200 m, 300 m, 400 m, 500 m, 600 m, 700 m, 800 m, 900 m, 1,000 m, or more. In some embodiments, the LIDAR device has a measurement range of at most about 1,000 m, 900 m, 800 m, 700 m, 600 m, 500 m, 400 m, 300 m, 200 m, 100 m, 90 m, 80 m, 70 m, 60 m, 50 m, 40 m, 30 m, 20 m, 10 m, or less. In some embodiments, the LIDAR device has a measurement range that is within a range defined by any two of the preceding values.

[0031] In the example shown, the system comprises a first camera 120. In some embodiments, the first camera is configured to obtain or provide at least one camera image of the environment.

[0032] In the example shown, the system comprises a second camera 130. In some embodiments, the first and second cameras are configured to obtain or provide stereo data or stereo image data. The LIDAR device may be calibrated with the first camera or the second camera. The first camera or the second camera may be calibrated with the LIDAR device. The first camera may be calibrated with the second camera. The second camera may be calibrated with the first camera.

[0033] In the example shown, the system comprises a processor 140. In some embodiments, the processor comprises processing subsystem 301 described herein with respect to Figure 3.

[0034] In the example shown, the system comprises a memory 150. In some embodiments, the memory is coupled with the processor. In some embodiments, the memory comprises memory 304 described herein with respect to Figure 3. Thus, in some embodiments, the processor and the memory comprise components of computer system 300 described herein with respect to Figure 3. In some embodiments, the memory is configured to provide the processor with instructions. In some embodiments, the instructions, when executed, cause the processor to implement a method for determining a distance to an object in an environment. In some embodiments, the method comprises method 200 described herein with respect to Figure 2.

[0035] Figure 2 shows a flowchart depicting an exemplary method 200 for determining a distance to an object in an environment. In the example shown, LIDAR data of the environment is obtained or provided at 210. In some embodiments, the LIDAR data is obtained from a LIDAR device. In some embodiments, the LIDAR device is LIDAR device 110 described herein with respect to Figure 1. In some embodiments, the environment comprises a driving environment. In some embodiments, the object comprises an obstacle in the driving environment. In some embodiments, the LIDAR data comprises a LIDAR scan.

[0036] At 220, at least one camera image of the environment is obtained or provided. In some embodiments, the camera image is obtained from a first camera. In some embodiments, the first camera is first camera 120 described herein with respect to Figure 1.

[0037] At 230, the object in the environment is detected based on the camera image. In some embodiments, at least one object detection procedure is applied to the camera image to detect the object in the environment. In some embodiments, the at least one object detection procedure comprises an artificial intelligence (AI) procedure, a machine learning (ML) procedure, a deep learning (DL) procedure, a very DL procedure, a neural network (NN) procedure, a deep NN procedure, a convolutional NN (CNN) procedure, a deep CNN procedure, a very deep CNN procedure, a fully convolutional CNN (FCN) procedure, a region-based CNN (R-CNN) procedure, a region-based FCN (R-FCN) procedure, a recurrent NN (RNN) procedure, or a deep RNN procedure.
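As an illustration of this detection step, the following sketch runs a pretrained Faster R-CNN from torchvision, one member of the R-CNN family listed above. The patent does not prescribe a specific model; the model choice, the image path, and the 0.5 score cutoff are assumptions for this sketch.

```python
# Hedged sketch of operation 230 with an off-the-shelf detector.
import torch
import torchvision
from torchvision.transforms.functional import to_tensor
from PIL import Image

model = torchvision.models.detection.fasterrcnn_resnet50_fpn(weights="DEFAULT")
model.eval()

image = Image.open("left_camera.png").convert("RGB")  # hypothetical path
with torch.no_grad():
    predictions = model([to_tensor(image)])[0]

# Keep confident detections; each box is (x1, y1, x2, y2) in pixels.
keep = predictions["scores"] > 0.5
boxes = predictions["boxes"][keep]
```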

[0038] At 240, the resolution of the LIDAR data in an area of the environment in which the detected object is located is determined based on the camera image. In some embodiments, the resolution of the LIDAR data in the area of the environment in which the detected object is located comprises a vertical resolution of the LIDAR data in the area of the environment in which the detected object is located.
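A minimal sketch of this resolution determination follows, assuming the LIDAR has been calibrated to the first camera so that intrinsic and extrinsic matrices are available, and using the per-point scan-line ("ring") index reported by the sensor as the measure of vertical resolution. All names and the calibration inputs are assumptions of the sketch.

```python
# Hedged sketch of step 240: project LIDAR returns into the first camera's
# image and count distinct scan lines inside a detected bounding box.
import numpy as np

def lidar_vertical_resolution(points_xyz, ring_ids, K, T, box):
    """Count distinct LIDAR scan lines (rings) landing inside `box`.

    points_xyz: (N, 3) LIDAR points in the LIDAR frame.
    ring_ids:   (N,) scan-line index reported by the sensor for each point.
    K: (3, 3) camera intrinsics; T: (4, 4) LIDAR-to-camera transform.
    box: (x1, y1, x2, y2) object bounding box in pixel coordinates.
    """
    pts_h = np.hstack([points_xyz, np.ones((len(points_xyz), 1))])
    cam = (T @ pts_h.T).T[:, :3]               # points in the camera frame
    in_front = cam[:, 2] > 0                   # keep points ahead of camera
    uvw = (K @ cam[in_front].T).T
    u, v = uvw[:, 0] / uvw[:, 2], uvw[:, 1] / uvw[:, 2]
    x1, y1, x2, y2 = box
    hit = (u >= x1) & (u <= x2) & (v >= y1) & (v <= y2)
    return len(np.unique(ring_ids[in_front][hit]))
```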

[0039] At 250, if the resolution of the LIDAR data in the area of the environment in which the detected object is located exceeds a threshold value, the distance to the detected object is determined based on the LIDAR data. In some embodiments, the threshold value is at least about 1 millimeter (mm), 2 mm, 3 mm, 4 mm, 5 mm, 6 mm, 7 mm, 8 mm, 9 mm, 10 mm, 20 mm, 30 mm, 40 mm, 50 mm, 60 mm, 70 mm, 80 mm, 90 mm, 100 mm, 200 mm, 300 mm, 400 mm, 500 mm, 600 mm, 700 mm, 800 mm, 900 mm, 1 m, or more. In some embodiments, the threshold value is at most about 1 m, 900 mm, 800 mm, 700 mm, 600 mm, 500 mm, 400 mm, 300 mm, 200 mm, 100 mm, 90 mm, 80 mm, 70 mm, 60 mm, 50 mm, 40 mm, 30 mm, 20 mm, 10 mm, 9 mm, 8 mm, 7 mm, 6 mm, 5 mm, 4 mm, 3 mm, 2 mm, 1 mm, or less. In some embodiments, the resolution of the LIDAR data comprises a vertical resolution of the LIDAR data.

[0040] At 260, if the resolution of the LIDAR data in the area of the environment in which the detected object is located does not exceed the threshold value, stereo data in the area of the environment in which the detected object is located is obtained and the distance to the detected object is determined based on the stereo data in the area of the environment in which the detected object is located. In some embodiments, the stereo data is rectified using stereo calibration parameters to form rectified stereo data. In some embodiments, the detected object is mapped to corresponding pixel coordinates in the rectified stereo data. In some embodiments, the detected object is mapped to the corresponding pixel coordinates using a matrix transformation representing the stereo calibration parameters. In some embodiments, the rectified stereo data is cropped around the corresponding pixel coordinates to form cropped rectified stereo data. In some embodiments, the distance to the detected object is determined based on the cropped rectified stereo data. In some embodiments, a histogram is used to filter out noisy or invalid distances.
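The stereo branch of operation 260 might look like the following sketch, assuming the pair has already been rectified with the stereo calibration parameters and that semi-global block matching (a stand-in; the patent does not name a matcher) computes the disparity. The histogram mode implements the noisy-distance filtering described above; the focal length f_px and baseline_m come from calibration, and all matcher parameters are illustrative.

```python
# Minimal sketch of the stereo fallback, assuming rectified inputs.
import cv2
import numpy as np

def stereo_distance(rect_left, rect_right, box, f_px, baseline_m):
    sgbm = cv2.StereoSGBM_create(minDisparity=0, numDisparities=128,
                                 blockSize=7)
    disp = sgbm.compute(rect_left, rect_right).astype(np.float32) / 16.0
    x1, y1, x2, y2 = map(int, box)
    crop = disp[y1:y2, x1:x2]                  # crop around the object
    valid = crop[crop > 0]
    # Histogram vote: take the modal disparity to reject noisy or invalid
    # pixels, mirroring the filtering described above.
    hist, edges = np.histogram(valid, bins=64)
    mode = 0.5 * (edges[np.argmax(hist)] + edges[np.argmax(hist) + 1])
    return f_px * baseline_m / mode            # depth from disparity
```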

[0041] In some embodiments, operation 260 comprises: obtaining the stereo data in the area of the environment in which the detected object is located, applying a super-resolution procedure to the stereo data of the environment in which the detected object is located to obtain super-resolution stereo data, and determining the distance to the detected object based on the super-resolution stereo data. In some embodiments, the super-resolution procedure comprises a multiple-frame super-resolution procedure, a single-image super-resolution procedure, a neighbor embedding procedure, a sparse representation procedure, a self-exemplars procedure, a transformed self-exemplars procedure, a local regression procedure, a local linear regression procedure, an anchored neighborhood regression procedure, a simple function procedure, an adjusted anchored neighborhood regression procedure, a super-resolution forest procedure, a naive Bayes super-resolution forest procedure, a manifold span reduction procedure, a ML procedure, a DL procedure, a very DL procedure, a NN procedure, a deep NN procedure, a CNN procedure, a deep CNN procedure, a very deep CNN procedure, a Shepard CNN procedure, a generative adversarial network procedure, a Laplacian pyramid network procedure, a deep Laplacian pyramid network procedure, a RNN procedure, a deep RNN procedure, an anchored regression network procedure, a persistent memory procedure, a dense skip connection procedure, a spatial feature modulation procedure, a dual-state RNN procedure, an information distillation network procedure, a sparse Dirichlet-net procedure, a dynamic upsampling filter procedure, a residual channel attention network procedure, a deep residual channel attention network procedure, a feature discrimination procedure, or a stochastic frequency masking procedure.
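The super-resolution variant can be sketched as below. Bicubic interpolation (one of the procedures evaluated in Example 3) stands in for the learned methods in the list above; the 6x scale and the matcher parameters are illustrative assumptions. Note that disparities measured on the upscaled pair must be divided by the scale factor before applying the original calibration.

```python
# Sketch of the super-resolution fallback: upscale the crops, match at the
# higher scale, then rescale the disparity. Bicubic is a stand-in; a learned
# procedure (e.g., local linear regression) would slot into `upscale`.
import cv2
import numpy as np

SCALE = 6  # 6x enhancement, as in the example images in Example 3

def upscale(img):
    h, w = img.shape[:2]
    return cv2.resize(img, (w * SCALE, h * SCALE),
                      interpolation=cv2.INTER_CUBIC)

def super_res_distance(crop_left, crop_right, f_px, baseline_m):
    sr_l, sr_r = upscale(crop_left), upscale(crop_right)
    sgbm = cv2.StereoSGBM_create(minDisparity=0, numDisparities=256,
                                 blockSize=5)
    disp = sgbm.compute(sr_l, sr_r).astype(np.float32) / 16.0
    valid = disp[disp > 0]
    # Disparities were measured on a 6x image, so divide by SCALE before
    # applying the original calibration's focal length and baseline.
    return f_px * baseline_m / (np.median(valid) / SCALE)
```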

[0042] In some embodiments, a dense depth procedure of the area of the environment in which the detected object is located is performed. In some embodiments, the dense depth procedure comprises copying LIDAR distances from the area of the environment in which the detected object is located to neighboring pixels in the LIDAR scan. In some embodiments, the dense depth procedure comprises applying a guided filter to align the LIDAR scan with edges in the stereo data.
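A hedged sketch of this dense depth procedure follows: each empty pixel copies the depth of its nearest LIDAR return, and a guided filter (available in the opencv-contrib package, indicated here only in a comment) can then align the densified map with image edges. The radius and eps values are assumptions.

```python
# Sketch of the dense depth procedure: copy LIDAR distances to neighboring
# pixels, then optionally align with image edges via a guided filter.
import numpy as np
from scipy.ndimage import distance_transform_edt

def densify(sparse_depth: np.ndarray) -> np.ndarray:
    """sparse_depth: per-pixel LIDAR range image, 0 where no return."""
    # For every empty pixel, find the index of the nearest measured pixel.
    _, (rows, cols) = distance_transform_edt(sparse_depth == 0,
                                             return_indices=True)
    return sparse_depth[rows, cols]            # copy depths to neighbors

# Optional edge alignment using the camera image as the guide
# (requires opencv-contrib; radius/eps are illustrative):
# import cv2
# dense = cv2.ximgproc.guidedFilter(guide=left_image,
#                                   src=densify(sparse).astype(np.float32),
#                                   radius=8, eps=1e-2)
```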

[0043] In some embodiments, the distance to the detected object is determined based upon the dense depth procedure if the resolution of the LIDAR scan in the area of the environment in which the detected object is located is about equal to the threshold value.

[0044] In some embodiments, any one, two, three, or four of operations 230, 240, 250, and 260 are performed for each detected object in the environment. In some embodiments, any one, two, three, or four of operations 230, 240, 250, and 260 are performed at least about 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 200, 300, 400, 500, 600, 700, 800, 900, 1,000, or more times. In some embodiments, any one, two, three, or four of operations 230, 240, 250, and 260 are performed at most about 1,000, 900, 800, 700, 600, 500, 400, 300, 200, 100, 90, 80, 70, 60, 50, 40, 30, 20, 10, 9, 8, 7, 6, 5, 4, 3, 2, or 1 times. In some embodiments, any one, two, three, or four of operations 230, 240, 250, and 260 are performed a number of times that is within a range defined by any two of the preceding values.

[0045] In some embodiments, the method 200, or any one, two, three, four, five, or six of operations 210, 220, 230, 240, 250, and 260 is repeated at least about every 1 millisecond (ms), 2 ms, 3 ms, 4 ms, 5 ms, 6 ms, 7 ms, 8 ms, 9 ms, 10 ms, 20 ms, 30 ms, 40 ms, 50 ms, 60 ms, 70 ms, 80 ms, 90 ms, 100 ms, 200 ms, 300 ms, 400 ms, 500 ms, 600 ms, 700 ms, 800 ms, 900 ms, 1,000 ms, or more. In some embodiments, the method 200, or any one, two, three, four, five, or six of operations 210, 220, 230, 240, 250, and 260, is repeated at most about every 1,000 ms, 900 ms, 800 ms, 700 ms, 600 ms, 500 ms, 400 ms, 300 ms, 200 ms, 100 ms, 90 ms, 80 ms, 70 ms, 60 ms, 50 ms, 40 ms, 30 ms, 20 ms, 10 ms, 9 ms, 8 ms, 7 ms, 6 ms, 5 ms, 4 ms, 3 ms, 2 ms, 1 ms, or less. In some embodiments, the method 200, or any one, two, three, four, five, or six of operations 210, 220, 230, 240, 250, and 260, is repeated within a period of time defined by any two of the preceding values.

[0046] In some embodiments, the method 200 further comprises tracking detected objects in the environment using the determined distances. In some embodiments, the method further comprises tracking detected objects using the determined distances by an autonomous vehicle during navigation of the autonomous vehicle in the environment.

[0047] Additionally, systems are disclosed that can be used to perform the method 200 of Figure 2, or any of operations 210, 220, 230, 240, 250, and 260. In some embodiments, the systems comprise one or more processors and memory coupled to the one or more processors. In some embodiments, the one or more processors are configured to implement one or more operations of method 200. In some embodiments, the memory is configured to provide the one or more processors with instructions corresponding to the operations of method 200. In some embodiments, the instructions are embodied in a tangible computer readable storage medium.

[0048] Figure 3 is a block diagram of a computer system 300 used in some embodiments to perform all or portions of methods for determining a distance to an object in an environment described herein (such as operations 210, 220, 230, 240, 250, or 260 of method 200 as described herein with respect to Figure 2). In some embodiments, the computer system may be utilized as a component in systems for determining a distance to an object in an environment described herein. Figure 3 illustrates one embodiment of a general purpose computer system. Other computer system architectures and configurations can be used for carrying out the processing of the present invention. Computer system 300, made up of various subsystems described below, includes at least one microprocessor subsystem 301. In some embodiments, the microprocessor subsystem comprises at least one central processing unit (CPU) or graphics processing unit (GPU). The microprocessor subsystem can be implemented by a single-chip processor or by multiple processors. In some embodiments, the microprocessor subsystem is a general purpose digital processor which controls the operation of the computer system 300. Using instructions retrieved from memory 304, the microprocessor subsystem controls the reception and manipulation of input data, and the output and display of data on output devices.

[0049] The microprocessor subsystem 301 is coupled bi-directionally with memory 304, which can include a first primary storage, typically a random access memory (RAM), and a second primary storage area, typically a read-only memory (ROM). As is well known in the art, primary storage can be used as a general storage area and as scratch-pad memory, and can also be used to store input data and processed data. It can also store programming instructions and data, in the form of data objects and text objects, in addition to other data and instructions for processes operating on the microprocessor subsystem. As is also well known in the art, primary storage typically includes basic operating instructions, program code, data and objects used by the microprocessor subsystem to perform its functions. Primary storage devices 304 may include any suitable computer-readable storage media, described below, depending on whether, for example, data access needs to be bidirectional or unidirectional. The microprocessor subsystem 301 can also directly and very rapidly retrieve and store frequently needed data in a cache memory (not shown).

[0050] A removable mass storage device 305 provides additional data storage capacity for the computer system 300, and is coupled either bidirectionally (read/write) or unidirectionally (read only) to microprocessor subsystem 301. Storage 305 may also include computer-readable media such as magnetic tape, flash memory, signals embodied on a carrier wave, PC-CARDS, portable mass storage devices, holographic storage devices, and other storage devices. A fixed mass storage 309 can also provide additional data storage capacity. The most common example of mass storage 309 is a hard disk drive. Mass storage 305 and 309 generally store additional programming instructions, data, and the like that typically are not in active use by the processing subsystem. It will be appreciated that the information retained within mass storage 305 and 309 may be incorporated, if needed, in standard fashion as part of primary storage 304 (e.g. RAM) as virtual memory.

[0051] In addition to providing processing subsystem 301 access to storage subsystems, bus 306 can be used to provide access to other subsystems and devices as well. In the described embodiment, these can include a display monitor 308, a network interface 307, a keyboard 302, and a pointing device 303, as well as an auxiliary input/output device interface, a sound card, speakers, and other subsystems as needed. The pointing device 303 may be a mouse, stylus, track ball, or tablet, and is useful for interacting with a graphical user interface.

[0052] The network interface 307 allows the processing subsystem 301 to be coupled to another computer, computer network, or telecommunications network using a network connection as shown. Through the network interface 307, it is contemplated that the processing subsystem 301 might receive information, e.g., data objects or program instructions, from another network, or might output information to another network in the course of performing the above-described method steps. Information, often represented as a sequence of instructions to be executed on a processing subsystem, may be received from and outputted to another network, for example, in the form of a computer data signal embodied in a carrier wave. An interface card or similar device and appropriate software implemented by processing subsystem 301 can be used to connect the computer system 300 to an external network and transfer data according to standard protocols. That is, method embodiments of the present invention may execute solely upon processing subsystem 301, or may be performed across a network such as the Internet, intranet networks, or local area networks, in conjunction with a remote processing subsystem that shares a portion of the processing. Additional mass storage devices (not shown) may also be connected to processing subsystem 301 through network interface 307.

[0053] An auxiliary I/O device interface (not shown) can be used in conjunction with computer system 300. The auxiliary I/O device interface can include general and customized interfaces that allow the processing subsystem 301 to send and, more typically, receive data from other devices such as microphones, touch-sensitive displays, transducer card readers, tape readers, voice or handwriting recognizers, biometrics readers, cameras, portable mass storage devices, and other computers.

[0054] In addition, embodiments of the present invention further relate to computer storage products with a computer readable medium that contains program code for performing various computer-implemented operations. The computer-readable medium is any data storage device that can store data which can thereafter be read by a computer system. The media and program code may be those specially designed and constructed for the purposes of the present invention, or they may be of the kind well known to those of ordinary skill in the computer software arts. Examples of computer-readable media include, but are not limited to, all the media mentioned above: magnetic media such as hard disks, floppy disks, and magnetic tape; optical media such as CD-ROM disks; magneto-optical media such as floptical disks; and specially configured hardware devices such as application-specific integrated circuits (ASICs), programmable logic devices (PLDs), and ROM and RAM devices. The computer-readable medium can also be distributed as a data signal embodied in a carrier wave over a network of coupled computer systems so that the computer-readable code is stored and executed in a distributed fashion. Examples of program code include both machine code, as produced, for example, by a compiler, or files containing higher level code that may be executed using an interpreter. The computer system shown in Figure 3 is but an example of a computer system suitable for use with the invention. Other computer systems suitable for use with the invention may include additional or fewer subsystems. In addition, bus 306 is illustrative of any interconnection scheme serving to link the subsystems. Other computer architectures having different configurations of subsystems may also be utilized.

EXAMPLES

Example 1: Sufficient LIDAR Resolution

[0055] A homebuilt system featuring both LIDAR and stereo imaging capabilities was constructed and calibrated. LIDAR and stereo images of an environment consisting of a single object of interest were obtained. Figure 4A shows a first (left) camera image of the environment. Figure 4B shows a second (right) camera image of the environment. Figure 4C shows a LIDAR image of the environment.

[0056] Figure 4D shows LIDAR scan lines projected on the first camera image of Figure 4A. As shown in Figure 4D, the object of interest (the car in the foreground) was subtended by as many as five LIDAR scan lines along the vertical dimension. Thus, the LIDAR data had sufficient vertical resolution to accurately determine the distance to the object of interest and could be used directly to determine this distance. The LIDAR data was used to determine that the distance to the object of interest was approximately 8.37 m.
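The relationship between range and scan-line coverage can be made explicit. For an object of height h at range R viewed by a LIDAR with vertical channel spacing Δθ, the number of scan lines subtending the object is approximately n ≈ h / (R tan Δθ). The channel spacing and car height below are illustrative assumptions, not measured values from this example, but they reproduce the observed coverage:

```latex
n \approx \frac{h}{R\,\tan\Delta\theta}
  \approx \frac{1.5\ \mathrm{m}}{8.37\ \mathrm{m}\times\tan 2^{\circ}}
  \approx 5.1\ \text{scan lines}
```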

Example 2: Insufficient LIDAR Resolution

[0057] The homebuilt system was used to obtain LIDAR and stereo images of an environment consisting of multiple objects of interest. Figure 5A shows a first (left) camera image of the environment. Figure 5B shows a second (right) camera image of the environment. Figure 5C shows a LIDAR image of the environment.

[0058] Figure 5D shows LIDAR scan lines projected on the first camera image of Figure 5A. As shown in Figure 5D, a first object of interest (the car in the middle of the image) was subtended by as many as three LIDAR scan lines along the vertical dimension. Thus, the LIDAR data had vertical resolution that was only borderline in its ability to accurately determine the distance to the first object of interest. To correct for this issue, a dense depth procedure was performed in the area near the first object of interest and the distance to the first object of interest was determined based on the dense depth procedure. The dense depth procedure was used to determine that the distance to the first object of interest was approximately 17.14 m.

[0059] A second object of interest (the sport utility vehicle in the upper right of the image) was subtended by only one or two LIDAR scan lines along the vertical dimension. Thus, the LIDAR data had vertical resolution that was insufficient to accurately determine the distance to the second object of interest. To correct for this issue, the first and second camera images were fused to form stereo data in the vicinity of the second object of interest and the distance to the second object of interest was determined from the stereo data. The stereo data was used to determine that the distance to the second object of interest was approximately 47.84 m.

[0060] Third, fourth, and fifth objects of interest (the cars further down the road from the first object of interest in the image, labeled from left to right in the image) were subtended by zero or one LIDAR scan lines along the vertical dimension. Thus, the LIDAR data had vertical resolution that was insufficient to accurately determine the distances to the third, fourth, and fifth objects of interest. To correct for this issue, the first and second camera images were fused to form stereo data in the vicinity of the third, fourth, and fifth objects of interest. A super-resolution procedure was applied to account for the large distances to the third, fourth, and fifth objects of interest to form super-resolution stereo data. The distances to the third, fourth, and fifth objects of interest were determined from the super-resolution stereo data. The super-resolution stereo data was used to determine that the distances to the third, fourth, and fifth objects of interest were approximately 508.39 m, 744.20 m, and 76.93 m, respectively.

Example 3: Super-Resolution Stereo Imaging

[0061] Super-resolution stereo imaging procedures were evaluated for their ability to distinguish between objects placed at different distances from a homebuilt stereo imaging system. The super-resolution procedures were applied to more than 200 images at different super-resolution enhancement scales. Bicubic, sparse representation, self-exemplars, and local linear regression procedures were applied to the images. The procedures were assessed based on the peak signal-to-noise ratio (PSNR), structural similarity index (SSIM), information fidelity criterion (IFC), and time (in seconds) required to apply the procedures to a single frame. The results of the measurements are shown in Table 1.

Table 1: Performance of super-resolution stereo imaging procedures on a variety of images

[0062] A good super-resolution procedure should ideally display a combination of high PSNR, high SSIM, high IFC, and low time. Thus, the best performing procedure on this sample set was local linear regression.

[0063] A homebuilt system featuring stereo imaging capabilities was constructed and calibrated. Four road signs were placed at slightly different distances from the system. The four signs were nearly overlapping in the horizontal direction, such that stereo imaging without the use of super-resolution techniques could not be expected to differentiate between the four road signs. Figure 6A shows a first (left) camera image of the four road signs. Figure 6B shows a second (right) camera image of the four road signs. The camera images shown in Figures 6A and 6B were insufficient to differentiate between the four road signs and accurate depths therefore could not be determined.

[0064] A 6x super-resolution procedure was then applied to the camera images of Figures 6A and 6B. Figure 6C shows the image of Figure 6A following application of a 6x super-resolution procedure. Figure 6D shows the image of Figure 6B following application of a 6x super-resolution procedure. The super-resolution camera images shown in Figures 6C and 6D were sufficient to differentiate between the four road signs and accurate depths therefore could be determined for each of the four road signs.
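Why super-resolution helps here follows from stereo geometry, not from a derivation given in the patent. For focal length f (pixels), baseline B, and disparity d, depth is Z = fB/d, so a disparity quantization step Δd produces a depth uncertainty that grows with the square of the distance. A 6x super-resolution procedure reduces the effective Δd by a factor of six, and hence the depth step between distinguishable objects by roughly the same factor, which is what allowed the nearly overlapping signs to be separated:

```latex
Z = \frac{fB}{d}
\quad\Longrightarrow\quad
\Delta Z \approx \frac{Z^{2}}{fB}\,\Delta d
\quad\xrightarrow{\;6\times\ \mathrm{SR}\;}\quad
\Delta Z' \approx \frac{\Delta Z}{6}
```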

[0065] Figure 7 shows the accuracy of distance measurements achieved using the super-resolution procedure compared with ground truth distance measurements obtained using a laser distance determination. As shown in Figure 7, the super-resolution procedure generally obtained accurate distance measurements at a variety of distances.