

Title:
SAFELY CONTROLLING AN AUTONOMOUS ENTITY IN PRESENCE OF INTELLIGENT AGENTS
Document Type and Number:
WIPO Patent Application WO/2017/197170
Kind Code:
A1
Abstract:
A controller controls motion of an autonomous entity. The controller determines a baseline trajectory for the autonomous entity that achieves a particular goal (e.g., moves the autonomous entity to a particular location). A safety controller then modifies the baseline trajectory to avoid other entities (e.g., humans or other entities) in the vicinity of the autonomous entity based on predicted motion of those entities. The controller may be applied to, for example, a robot co-working in a factory with humans or an autonomous vehicle moving around other vehicles.

Inventors:
LIU CHANGLIU (US)
TOMIZUKA MASAYOSHI (US)
Application Number:
PCT/US2017/032243
Publication Date:
November 16, 2017
Filing Date:
May 11, 2017
Assignee:
UNIV CALIFORNIA (US)
International Classes:
B25J9/10; A61B34/30; B25J9/16; B25J13/08; B25J19/02; B25J19/06; G05B19/42
Domestic Patent References:
WO2011146259A22011-11-24
WO2015185738A22015-12-10
Foreign References:
US20130345718A12013-12-26
US20150239124A12015-08-27
US20110184555A12011-07-28
US20030225479A12003-12-04
Attorney, Agent or Firm:
AMSEL, Jason, E. et al. (US)
Claims:
CLAIMS

1. A method for controlling an autonomous entity, the method comprising:

estimating, by a state estimator, a state of the autonomous entity based on first sensor measurements;

predicting, by a human motion predictor, a motion of one or more other entities based on second sensor measurements and a motion model for the one or more other entities;

determining, by a baseline controller, a baseline control signal for controlling the autonomous entity to complete a task based on the estimated state of the autonomous entity and an input goal, the baseline control signal representing a baseline trajectory of the autonomous entity that enables the autonomous entity to complete the task in accordance with an optimization criterion;

determining, by a safety controller, a safety control signal for controlling the autonomous entity based on the baseline control signal, the predicted motion of the one or more other entities, and the estimated state of the autonomous entity, the safety control signal representing a modification to the baseline trajectory of the autonomous entity determined by the baseline controller to cause the autonomous entity to deviate from the baseline trajectory to meet a safety constraint with respect to predicted motion of the one or more other entities;

combining the baseline control signal and the safety control signal to generate a combined control signal representing a safe trajectory of the autonomous entity that meets the safety constraint and enables the autonomous entity to complete the task; and

controlling motion of the autonomous entity to follow the safe trajectory according to the combined control signal.

2. The method of claim 1, wherein determining, by the safety controller, the safety control signal comprises:

generating, by a safety controller, a safety index based on the estimated state of the autonomous entity and the predicted human motion, the safety index indicating whether or not the estimated state of the autonomous entity is within a range of safe states given the predicted motion of the one or more other entities, and indicating a distance between the estimated state of the autonomous entity and a boundary of the range of safe states given the predicted motion of the one or more other entities;

generating, by a criteria controller, a criteria signal based on the baseline control signal and the safety index, the criteria signal indicating whether the safety index indicates that the estimated state of the autonomous entity is within the range of safe states with respect to the predicted motion of the one or more other entities or whether the safety index indicates that the estimated state of the autonomous entity is outside the range of safe states with respect to the predicted motion of the one or more other entities; and

generating, by a control modification controller, a modification signal based on the baseline control signal, the estimated state of the autonomous entity, and the predicted motion of the one or more other entities, the modification signal representing an optimal change in trajectory relative to the baseline trajectory to put the autonomous entity in the range of safe states relative to the predicted motion of the one or more other entities;

combining the criteria signal determined by the safety index and the modification signal to generate the safety control signal.

3. The method of claim 2, wherein generating the safety index comprises:

modeling the autonomous entity and the one or more other entities as geometric capsules; and

determining the distance between the geometric capsules.

4. The method of claim 2, wherein generating the modification signal comprises:

causing a change in the state of the autonomous entity that results in a decrease in the safety index.

5. The method of claim 1, wherein determining the baseline control signal is further based on the predicted motion of the one or more other entities.

6. The method of claim 1, wherein predicting the motion of the one or more other entities comprises:

receiving a plurality of motion models learned from an offline classifier;

updating parameters of the plurality of motion models based on observed motion of the one or more other entities; and

generating the prediction from the plurality of motion models with the updated parameters.

7. The method of claim 1, wherein predicting the motion of the one or more other entities comprises:

modeling a state of each of the one or more other entities as a plurality of features; and

applying a classifier to the plurality of features to determine probabilities of each of the one or more other entities taking different actions from a predefined set of possible actions; and

predicting the motion based on the determined probabilities.

8. The method of claim 1, wherein the autonomous entity comprises a robot.

9. The method of claim 1, wherein the autonomous entity comprises an autonomously-controlled vehicle.

10. The method of claim 1, wherein the one or more other entities comprise one or more humans.

11. A non-transitory computer-readable storage medium storing instructions for controlling an autonomous entity, the instructions when executed by a processor causing the processor to perform steps including:

estimating, by a state estimator, a state of the autonomous entity based on first sensor measurements;

predicting, by a human motion predictor, a motion of one or more other entities based on second sensor measurements and a motion model for the one or more other entities;

determining, by a baseline controller, a baseline control signal for controlling the autonomous entity to complete a task based on the estimated state of the autonomous entity and an input goal, the baseline control signal representing a baseline trajectory of the autonomous entity that enables the autonomous entity to complete the task in accordance with an optimization criterion;

determining, by a safety controller, a safety control signal for controlling the autonomous entity based on the baseline control signal, the predicted motion of the one or more other entities, and the estimated state of the autonomous entity, the safety control signal representing a modification to the baseline trajectory of the autonomous entity determined by the baseline controller to cause the autonomous entity to deviate from the baseline trajectory to meet a safety constraint with respect to predicted motion of the one or more other entities;

combining the baseline control signal and the safety control signal to generate a combined control signal representing a safe trajectory of the autonomous entity that meets the safety constraint and enables the autonomous entity to complete the task; and

controlling motion of the autonomous entity to follow the safe trajectory according to the combined control signal.

12. The non-transitory computer-readable storage medium of claim 11, wherein determining, by the safety controller, the safety control signal comprises:

generating, by a safety controller, a safety index based on the estimated state of the autonomous entity and the predicted human motion, the safety index indicating whether or not the estimated state of the autonomous entity is within a range of safe states given the predicted motion of the one or more other entities, and indicating a distance between the estimated state of the autonomous entity and a boundary of the range of safe states given the predicted motion of the one or more other entities;

generating, by a criteria controller, a criteria signal based on the baseline control signal and the safety index, the criteria signal indicating whether the safety index indicates that the estimated state of the autonomous entity is within the range of safe states with respect to the predicted motion of the one or more other entities or whether the safety index indicates that the estimated state of the autonomous entity is outside the range of safe states with respect to the predicted motion of the one or more other entities; and

generating, by a control modification controller, a modification signal based on the baseline control signal, the estimated state of the autonomous entity, and the predicted motion of the one or more other entities, the modification signal representing an optimal change in trajectory relative to the baseline trajectory to put the autonomous entity in the range of safe states relative to the predicted motion of the one or more other entities;

combining the criteria signal determined by the safety index and the modification signal to generate the safety control signal.

13. The non-transitory computer-readable storage medium of claim 12, wherein generating the safety index comprises:

modeling the autonomous entity and the one or more other entities as geometric capsules; and

determining the distance between the geometric capsules.

14. The non-transitory computer-readable storage medium of claim 12, wherein generating the modification signal comprises:

causing a change in the state of the autonomous entity that results in a decrease in the safety index.

15. The non-transitory computer-readable storage medium of claim 11, wherein determining the baseline control signal is further based on the predicted motion of the one or more other entities.

16. The non-transitory computer-readable storage medium of claim 11, wherein predicting the motion of the one or more other entities comprises:

receiving a plurality of motion models learned from an offline classifier;

updating parameters of the plurality of motion models based on observed motion of the one or more other entities; and

generating the prediction from the plurality of motion models with the updated parameters.

17. The non-transitory computer-readable storage medium of claim 11, wherein predicting the motion of the one or more other entities comprises:

modeling a state of each of the one or more other entities as a plurality of features; and

applying a classifier to the plurality of features to determine probabilities of each of the one or more other entities taking different actions from a predefined set of possible actions; and

predicting the motion based on the determined probabilities.

18. The non-transitory computer-readable storage medium of claim 11, wherein the autonomous entity comprises a robot.

19. The non-transitory computer-readable storage medium of claim 11, wherein the autonomous entity comprises an autonomously-controlled vehicle.

20. The non-transitory computer-readable storage medium of claim 11, wherein the one or more other entities comprise one or more humans.

21. A controller device for controlling an autonomous entity, the controller device comprising:

a processor; and

a non-transitory computer-readable storage medium storing instructions for controlling an autonomous entity, the instructions when executed by the processor causing the processor to perform steps including:

estimating, by a state estimator, a state of the autonomous entity based on first sensor measurements;

predicting, by a human motion predictor, a motion of one or more other entities based on second sensor measurements and a motion model for the one or more other entities;

determining, by a baseline controller, a baseline control signal for controlling the autonomous entity to complete a task based on the estimated state of the autonomous entity and an input goal, the baseline control signal representing a baseline trajectory of the autonomous entity that enables the autonomous entity to complete the task in accordance with an optimization criterion;

determining, by a safety controller, a safety control signal for controlling the autonomous entity based on the baseline control signal, the predicted motion of the one or more other entities, and the estimated state of the autonomous entity, the safety control signal representing a modification to the baseline trajectory of the autonomous entity determined by the baseline controller to cause the autonomous entity to deviate from the baseline trajectory to meet a safety constraint with respect to predicted motion of the one or more other entities;

combining the baseline control signal and the safety control signal to generate a combined control signal representing a safe trajectory of the autonomous entity that meets the safety constraint and enables the autonomous entity to complete the task; and

controlling motion of the autonomous entity to follow the safe trajectory according to the combined control signal.

22. The controller device of claim 21, wherein determining, by the safety controller, the safety control signal comprises:

generating, by a safety controller, a safety index based on the estimated state of the autonomous entity and the predicted human motion, the safety index indicating whether or not the estimated state of the autonomous entity is within a range of safe states given the predicted motion of the one or more other entities, and indicating a distance between the estimated state of the autonomous entity and a boundary of the range of safe states given the predicted motion of the one or more other entities;

generating, by a criteria controller, a criteria signal based on the baseline control signal and the safety index, the criteria signal indicating whether the safety index indicates that the estimated state of the autonomous entity is within the range of safe states with respect to the predicted motion of the one or more other entities or whether the safety index indicates that the estimated state of the autonomous entity is outside the range of safe states with respect to the predicted motion of the one or more other entities; and

generating, by a control modification controller, a modification signal based on the baseline control signal, the estimated state of the autonomous entity, and the predicted motion of the one or more other entities, the modification signal representing an optimal change in trajectory relative to the baseline trajectory to put the autonomous entity in the range of safe states relative to the predicted motion of the one or more other entities;

combining the criteria signal determined by the safety index and the modification signal to generate the safety control signal.

23. The controller device of claim 22, wherein generating the safety index comprises:

modeling the autonomous entity and the one or more other entities as geometric capsules; and

determining the distance between the geometric capsules.

24. The controller device of claim 22, wherein generating the modification signal comprises:

causing a change in the state of the autonomous entity that results in a decrease in the safety index.

25. The controller device of claim 21, wherein determining the baseline control signal is further based on the predicted motion of the one or more other entities.

26. The controller device of claim 21, wherein predicting the motion of the one or more other entities comprises:

receiving a plurality of motion models learned from an offline classifier;

updating parameters of the plurality of motion models based on observed motion of the one or more other entities; and

generating the prediction from the plurality of motion models with the updated parameters.

27. The controller device of claim 21, wherein predicting the motion of the one or more other entities comprises:

modeling a state of each of the one or more other entities as a plurality of features; and

applying a classifier to the plurality of features to determine probabilities of each of the one or more other entities taking different actions from a predefined set of possible actions; and

predicting the motion based on the determined probabilities.

28. The controller device of claim 21, wherein the autonomous entity comprises a robot.

29. The controller device of claim 21, wherein the autonomous entity comprises an autonomously-controlled vehicle.

30. The controller device of claim 21, wherein the one or more other entities comprise one or more humans.

Description:
SAFELY CONTROLLING AN AUTONOMOUS ENTITY IN PRESENCE OF INTELLIGENT AGENTS

BACKGROUND

[0001] This application relates generally to computer-controlled entities and more specifically to a controller for safely controlling an autonomous entity.

[0002] Control of autonomous entities such as robots and vehicles is a challenging problem. In environments in which humans, human-controlled objects, or other intelligent agents are present, a particularly difficult challenge exists in controlling the trajectory of the autonomous entity to achieve a particular goal while operating safely to avoid colliding with or otherwise interfering with the other moving entities.

[0003] For example, in modern factories, human workers and robots are two major workforces. For safety concerns, the two are normally separated with robots confined in metal cages, which limits the productivity as well as the flexibility of production lines. In recent years, attention has been directed to removing the cages so that human workers and robots may collaborate to create a human-robot co-existing factory. Those robots working in a human-involved environment are called co-robots.

[0004] The potential benefits of co-robots are huge and extensive. For example, they may be placed in human-robot teams in flexible production lines, where robot arms and human workers cooperate in handling work pieces, and automated guided vehicles (AGV) co-inhabit with human workers to facilitate factory logistics.

[0005] In the factories of the future, more and more interactions among humans and industrial robots are anticipated to take place. In such environments, safety is one of the biggest concerns. However, conventional approaches are focused only on intrinsic safety, i.e. safety in mechanical design, actuation, and low level motion control. Conventional approaches to industrial robot safety do not focus on safety during interactions with humans.

[0006] Similarly, when automated vehicles are driven on public roads, safety is a big concern. While existing technologies can assure high-fidelity sensing, real-time computation, and robust control, the challenges lie in the interactions between the automated vehicle and the environment, which includes other manually driven vehicles. Given sensory information from multiple sensors (e.g. cameras and LIDAR), an automated vehicle should be intelligent enough to find a safe and efficient trajectory to its destination that takes into account the complex environment with multiple surrounding vehicles.

[0007] Conservative strategies such as "braking when collision is anticipated," known as the Automatic Emergency Braking (AEB) function in existing models, are not the best actions in most cases (although they may be necessary in certain cases). Taking into account the dynamics and future course of surrounding vehicles, the automated vehicle should decide between multiple choices for a safe maneuver.

[0008] In each of the factory robot scenario, autonomous vehicle scenario, and other similar scenarios in which an autonomous entity is operating in an environment with humans or other moving objects, it is desirable to ensure that the autonomous entity can operate to achieve its goals without jeopardizing safety of other entities.

SUMMARY

[0009] A method, non-transitory computer-readable storage medium, and controller device control an autonomous entity. A state estimator estimates a state of the autonomous entity based on first sensor measurements. A motion predictor predicts a motion of one or more other entities based on second sensor measurements and a motion model for the one or more other entities. A baseline controller estimates a baseline control signal for controlling the autonomous entity to complete a task based on the estimated state of the autonomous entity and an input goal, the baseline control signal representing a baseline trajectory of the autonomous entity that enables the autonomous entity to complete the task in accordance with an optimization criterion. In one embodiment, an enhanced baseline controller (also referred to herein as the "efficiency controller") further incorporates the long-term predicted motion of the one or more other entities in the estimation of the baseline control signal. A safety controller generates a safety control signal for controlling the autonomous entity based on the baseline control signal, the predicted motion of the one or more other entities, and the estimated state of the autonomous entity. The safety control signal represents a modification to the baseline trajectory of the autonomous entity determined by the baseline controller to cause the autonomous entity to deviate from the baseline trajectory to meet a safety constraint with respect to the predicted motion of the one or more other entities. The baseline control signal and the safety control signal are combined to generate a combined control signal representing a safe trajectory of the autonomous entity that meets the safety constraint and enables the autonomous entity to complete the task. Motion of the autonomous entity is controlled to follow the safe trajectory according to the combined control signal.
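As an informal illustration, the parallel structure summarized above can be sketched in a few lines of Python. This is a sketch only: the function names, the additive combination of the two signals, and the toy one-dimensional dynamics are assumptions of ours, not details fixed by this disclosure.

```python
import numpy as np

def control_step(x_hat, predicted_motion, goal, baseline_controller, safety_controller):
    """One control cycle: baseline signal plus safety modification (hypothetical API)."""
    # Baseline control: drives the entity toward its goal per an efficiency criterion.
    u_baseline = baseline_controller(x_hat, goal)
    # Safety control: a correction that deviates from the baseline trajectory only
    # as needed to satisfy the safety constraint w.r.t. the predicted motion.
    u_safety = safety_controller(x_hat, u_baseline, predicted_motion)
    # The combined signal represents the safe trajectory.
    return u_baseline + u_safety

# Toy 1-D example: step toward the goal, but back off to a 0.5 safety margin
# around the other entity's predicted position.
baseline = lambda x, g: float(np.clip(g - x, -1.0, 1.0))

def safety(x, u, obstacle):
    nxt = x + u
    if abs(nxt - obstacle) < 0.5:
        return (obstacle - 0.5 * np.sign(obstacle - x)) - nxt  # pull back to the margin
    return 0.0

u = control_step(x_hat=2.0, predicted_motion=3.2, goal=10.0,
                 baseline_controller=baseline, safety_controller=safety)
```

Here the baseline step of 1.0 would bring the entity to 3.0, within 0.5 of the other entity's predicted position 3.2, so the safety correction trims the step to stop exactly at the margin.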
[0010] In an embodiment, the safety controller generates a safety index based on the estimated state of the autonomous entity and the predicted motion of the one or more other entities. The safety index indicates whether or not the estimated state of the autonomous entity is within a range of safe states given the predicted motion of the one or more other entities, and indicates a distance between the estimated state of the autonomous entity and a boundary of the range of safe states given the predicted motion of the one or more other entities. For example, in one embodiment, the autonomous entity and the one or more other entities are modeled as geometric capsules, and the safety index is generated by determining the distance between the geometric capsules.
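The capsule distance mentioned above can be made concrete: a capsule is a line segment swept by a sphere, so the capsule-to-capsule distance is the segment-to-segment distance minus the two radii. The sketch below assumes a standard closest-point-between-segments routine (Ericson's method); the names and the sign convention (non-positive means overlap) are our choices, not the disclosure's.

```python
import numpy as np

def segment_distance(p1, q1, p2, q2):
    """Minimum distance between segments p1-q1 and p2-q2 (standard closest-point method)."""
    d1, d2, r = q1 - p1, q2 - p2, p1 - p2
    a, e, f = d1 @ d1, d2 @ d2, d2 @ r
    if a <= 1e-12 and e <= 1e-12:            # both segments degenerate to points
        return np.linalg.norm(r)
    if a <= 1e-12:                            # first segment is a point
        s, t = 0.0, np.clip(f / e, 0.0, 1.0)
    else:
        c = d1 @ r
        if e <= 1e-12:                        # second segment is a point
            t, s = 0.0, np.clip(-c / a, 0.0, 1.0)
        else:
            b = d1 @ d2
            denom = a * e - b * b
            s = np.clip((b * f - c * e) / denom, 0.0, 1.0) if denom > 1e-12 else 0.0
            t = (b * s + f) / e
            if t < 0.0:
                t, s = 0.0, np.clip(-c / a, 0.0, 1.0)
            elif t > 1.0:
                t, s = 1.0, np.clip((b - c) / a, 0.0, 1.0)
    return np.linalg.norm((p1 + s * d1) - (p2 + t * d2))

def capsule_distance(a0, a1, ra, b0, b1, rb):
    """Distance between two capsules (segments with radii); <= 0 means they overlap."""
    return segment_distance(a0, a1, b0, b1) - (ra + rb)

a0, a1 = np.array([0.0, 0.0, 0.0]), np.array([1.0, 0.0, 0.0])   # e.g., a robot link axis
b0, b1 = np.array([0.0, 2.0, 0.0]), np.array([1.0, 2.0, 0.0])   # e.g., a human forearm axis
d = capsule_distance(a0, a1, 0.3, b0, b1, 0.5)
```

For these parallel axes 2.0 apart with radii 0.3 and 0.5, the capsule distance is 1.2; a safety index could then be defined to grow as this distance shrinks.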

[0011] A criteria controller generates a criteria signal based on the baseline control signal and the safety index. The criteria signal indicates whether the safety index places the estimated state of the autonomous entity inside or outside the range of safe states with respect to the predicted motion of the one or more other entities. A control modification controller generates a modification signal based on the baseline control signal, the estimated state of the autonomous entity, and the predicted motion of the one or more other entities. The modification signal represents an optimal change in trajectory relative to the baseline trajectory to put the autonomous entity in the range of safe states relative to the predicted motion of the one or more other entities. For example, in one embodiment, the modification signal is generated by causing a change in the state of the autonomous entity that results in a decrease in the safety index.

[0012] The criteria signal and the modification signal are combined to generate the safety control signal.
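One simple reading of this combination, sketched below, is that the criteria signal gates the modification signal: when the safety index reports a state outside the range of safe states, the modification passes through; otherwise the safety control contribution is zero. The gating form, the convention that a non-negative index means unsafe, and all names here are illustrative assumptions.

```python
import numpy as np

def combine_safety_signal(phi, u_modification):
    """Gate the modification signal with a criteria signal derived from the
    safety index phi (phi >= 0 taken to mean outside the range of safe states)."""
    criteria = 1.0 if phi >= 0 else 0.0
    return criteria * np.asarray(u_modification, dtype=float)

u_active = combine_safety_signal(phi=0.3, u_modification=[-0.2, 0.1])   # unsafe: modification applied
u_idle = combine_safety_signal(phi=-1.0, u_modification=[-0.2, 0.1])    # safe: zero correction
```

A smooth gate (e.g., ramping the criteria signal near the safety boundary) would avoid abrupt switching; the disclosure leaves the exact form open.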

[0013] In an embodiment, the motion predictor receives a plurality of motion models learned from an offline classifier. The motion predictor then updates parameters of the plurality of motion models based on observed motion of the one or more other entities, and generates the prediction from the plurality of motion models with the updated parameters.
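The offline-then-online adaptation described above can be illustrated with a deliberately simple motion model: a constant-velocity prior (standing in for an offline-learned model) whose velocity parameter is refined from observed positions with exponential forgetting. The update rule, class name, and parameters are our assumptions; the disclosure does not prescribe them.

```python
import numpy as np

class OnlineMotionModel:
    """Adapt an offline-learned motion model from observed motion (illustrative)."""
    def __init__(self, v0, alpha=0.5):
        self.v = np.asarray(v0, dtype=float)   # offline-learned velocity prior
        self.alpha = alpha                      # forgetting factor for online updates
        self.last = None                        # most recent observed position

    def observe(self, position):
        position = np.asarray(position, dtype=float)
        if self.last is not None:
            v_obs = position - self.last        # one-step observed velocity
            self.v = (1 - self.alpha) * self.v + self.alpha * v_obs
        self.last = position

    def predict(self, horizon):
        # Roll the adapted model forward `horizon` steps from the last observation.
        return [self.last + (k + 1) * self.v for k in range(horizon)]

# The prior expects motion along +x, but the observed motion is along +y;
# the online updates pull the velocity estimate toward the observations.
model = OnlineMotionModel(v0=[1.0, 0.0])
for pos in [[0.0, 0.0], [0.0, 0.5], [0.0, 1.0]]:
    model.observe(pos)
pred = model.predict(2)
```

In practice the plurality of motion models would be richer (e.g., per-behavior models from a classifier), but the observe/update/predict cycle is the same.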

[0014] In another embodiment, the motion predictor models a state of each of the one or more other entities as a plurality of features. The motion predictor applies a classifier to the plurality of features to determine probabilities of each of the one or more other entities taking different actions from a predefined set of possible actions. The motion predictor then predicts the motion based on the determined probabilities.

BRIEF DESCRIPTION OF THE DRAWINGS

[0015] FIG. 1 is a high-level block diagram of an embodiment of a controller for controlling an autonomous entity.

[0016] FIG. 2 is a detailed block diagram of a first embodiment of control module for controlling an autonomous entity.

[0017] FIG. 3 is a diagram illustrating an example of a control outcome achieved by the first embodiment of the controller.

[0018] FIG. 4 is a detailed block diagram of a second embodiment of control module for controlling an autonomous entity.

[0019] FIG. 5 is a diagram illustrating an example of a control outcome achieved by the second embodiment of the controller.

[0020] FIG. 6 is a diagram illustrating a first embodiment of safe states for operating an autonomous entity.

[0021] FIG. 7 is a diagram illustrating an example embodiment of a human model.

[0022] FIG. 8 is a diagram illustrating an example embodiment of a safety index and corresponding safe states for controlling an autonomous entity.

[0023] FIG. 9 is a diagram illustrating an example control outcome when controlling an autonomous entity.

[0024] FIG. 10 is a diagram illustrating an example scenario for controlling an autonomous vehicle.

[0025] FIG. 11 is a diagram illustrating an example embodiment of a vehicle model.

[0026] FIG. 12 is a diagram illustrating a sample interaction between an autonomously- controlled vehicle and other entities.

[0027] FIG. 13 is a block diagram illustrating an example embodiment of a motion predictor.

[0028] FIG. 14 is a block diagram illustrating an example embodiment of a controller for controlling an autonomous vehicle.

[0029] FIG. 15 is a diagram illustrating a second embodiment of a safety index and corresponding safe states for operating an autonomous entity.

[0030] FIG. 16 is a diagram illustrating a performance of a controller controlling an autonomous vehicle according to a first embodiment.

[0031] FIG. 17 is a diagram illustrating a performance of a controller controlling an autonomous vehicle according to a second embodiment.

[0032] The figures depict embodiments of the present disclosure for purposes of illustration only. One skilled in the art will readily recognize from the following description that alternative embodiments of the structures and methods illustrated herein may be employed without departing from the principles, or benefits touted, of the disclosure described herein.

DETAILED DESCRIPTION

OVERVIEW

[0033] In a first embodiment, a controller and control algorithm enable an autonomous entity such as a robot or autonomous vehicle to safely operate in environments where humans or other intelligent agents are present. In factories, the described embodiments enable humans and robots to be co-workers and co-inhabitants in flexible production lines while ensuring that humans and robots interact efficiently without harming each other. For example, in one embodiment, the robot motion planning and control problem in a human-involved environment is posed as a constrained optimal control problem. A modularized parallel controller structure solves the problem online; it includes a baseline controller that ensures efficiency and a safety controller that addresses real-time safety by making a safe set invariant. Capsules are used to represent the complicated geometry of humans and robots.

[0034] In one embodiment, human-friendly industrial co-robots are equipped with abilities such as, for example, (1) collecting and interpreting environmental data, (2) adapting to different tasks and different environments, and (3) tailoring themselves to the human workers' needs. Furthermore, the control scheme enables the robots to cope with complex and time-varying human motion and assures real-time safety without sacrificing efficiency. A constrained optimal control problem describes this problem mathematically, and a modularized controller architecture solves it. In an embodiment, the controller architecture is based on two online algorithms: a "safe set algorithm" (SSA) and a "safe exploration algorithm" (SEA), which confine the robot motion to a safe region given the predicted human motion. The modularized architecture beneficially 1) treats the efficiency goal and the safety goal separately and allows more freedom in designing robot behaviors, 2) is compatible with existing robot motion control algorithms and can deal with complicated robot dynamics, 3) guarantees real-time safety, and 4) lends itself to parallel computation.
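The safe-set style of control modification can be illustrated as a minimal-deviation projection: when the safety index is non-negative, the baseline control is projected onto the half-space of controls that drive the index down at a prescribed rate. This is a simplified sketch of the idea (a single linear constraint with unit control weighting); the algorithms in this disclosure may differ in detail, and all symbols here are ours.

```python
import numpy as np

def ssa_modify(u_ref, grad_phi_u, phi, eta=0.1):
    """Project u_ref onto {u : grad_phi_u . u <= -eta} when phi >= 0 (unsafe),
    so the safety index phi decreases; otherwise pass u_ref through unchanged."""
    u = np.asarray(u_ref, dtype=float)
    if phi < 0:
        return u                                 # inside the safe set: no modification
    g = np.asarray(grad_phi_u, dtype=float)
    violation = g @ u + eta                      # amount the constraint is violated
    if violation <= 0:
        return u                                 # baseline already drives phi down
    return u - (violation / (g @ g)) * g         # minimal-deviation projection

u_mod = ssa_modify([1.0, 0.0], grad_phi_u=[1.0, 0.0], phi=0.2)    # projected back to the boundary
u_keep = ssa_modify([1.0, 0.0], grad_phi_u=[1.0, 0.0], phi=-1.0)  # safe: unchanged
```

Because the projection changes the baseline control by the smallest amount that satisfies the constraint, efficiency is preserved whenever safety permits, which is the separation of goals the parallel architecture is built around.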

[0035] In another embodiment, the controller controls the driving behavior of an automated vehicle to prevent or minimize collisions with other vehicles and obstacles while maintaining efficiency (e.g. maintaining high speed on the freeway). Given the preprocessed sensory data (which includes the road information and the position and velocity of surrounding vehicles), the controller makes the driving decisions, e.g. finding a safe and efficient trajectory for the automated vehicle. For example, the controller operates in a freeway driving scenario such that at each time step, the automated vehicle predicts the future courses of all surrounding vehicles and confines its trajectory to a safe region given the prediction. Based on a multi-agent traffic model, the decision-making problem is posed as an optimal control problem, which is solved by 1) behavior classification and trajectory prediction of the surrounding vehicles, and 2) a unique parallel planner architecture which addresses the efficiency goal and the safety goal separately.

[0036] In yet other embodiments, the controller controls another type of autonomous entity to operate safely in the presence of humans or other intelligent agents. While the specification describes details in the context of two specific types of autonomous entities (a robot arm and an autonomous vehicle), the methods, apparatuses, and principles described with respect to one of these embodiments may also be applied to the other embodiment or to a different type of autonomous entity. Similarly, while the specification describes details in the context of operating an autonomous entity in the presence of humans (in the robot arm case) or human-driven vehicles (in the autonomous vehicle case), the described methods, apparatuses, and principles may also apply to the autonomous entity avoiding other intelligent agents (e.g., animals), human-controlled vehicles or objects, or other autonomously controlled vehicles, objects, or robots.

EXAMPLE COMPUTER-BASED CONTROLLER ARCHITECTURE

[0037] FIG. 1 is a block diagram illustrating an embodiment of a controller 100 for controlling an autonomous entity which may operate according to any of the principles described herein. The controller 100 receives one or more input parameters 102. The input parameters 102 may comprise, for example, sensor data used to determine the autonomous entity's position, velocity, acceleration, orientation, joint configuration, or proximity of surrounding objects, control inputs that specify a particular objective for the autonomous entity, or other inputs that guide operation of the autonomous entity as described herein. The controller 100 produces one or more control outputs 104 to control movement of the autonomous entity.

[0038] In one embodiment, the controller 100 comprises a processor 120 and a memory 110. The processor 120 processes data signals and may comprise various computing architectures such as a complex instruction set computer (CISC) architecture, a reduced instruction set computer (RISC) architecture, or an architecture implementing a combination of instruction sets. Although only a single processor 120 is shown in FIG. 1, multiple processors may be included. The processor 120 comprises an arithmetic logic unit, a microprocessor, a general purpose computer, or some other information appliance equipped to transmit, receive and process electronic data signals from the memory 110 or from external inputs.

[0039] The memory 110 comprises a non-transitory computer-readable storage medium that stores computer-executable instructions and computer-readable data. The instructions may comprise code for performing any and/or all of the techniques described herein. The memory 110 may furthermore temporarily or persistently store data inputted to the controller 100 (e.g., input parameter(s) 102), data to be outputted by the controller 100 (e.g., control output(s) 104), and any intermediate data used to carry out the process steps of the controller 100 described herein. Memory 110 may comprise, for example, a dynamic random access memory (DRAM) device, a static random access memory (SRAM) device, Flash RAM (nonvolatile storage), combinations of the above, or some other memory device known in the art. In operation, the processor 120 loads the computer-executable instructions and/or data from the memory 110 and executes the instructions to carry out the functions described herein.

[0040] In one embodiment, the memory 110 stores a control module 112. The control module 112 includes computer-executable program instructions that, when executed by the processor 120, cause the controller 100 to receive the input parameter(s) 102, process the input parameter(s) 102 according to the techniques described herein, and generate the control output(s) 104 to control the autonomous entity. Particularly, the control module 112 generates control outputs 104 that control the autonomous entity to achieve its objective (or as near as possible) while operating safely in the presence of humans or other intelligent or non-intelligent entities in the environment of the autonomous entity.

[0041] The controller 100 may include more or fewer components than those shown in FIG. 1 without departing from the scope of the embodiments. For example, controller 100 may include additional memory, such as, for example, a first or second level cache, or one or more application specific integrated circuits (ASICs). In other embodiments, the controller 100 may be implemented entirely in hardware, in firmware, or using a combination of software, firmware, and hardware.

[0042] FIG. 2 illustrates an embodiment of a controller 200 for controlling an autonomous entity that may be used as control module 112 in the controller 100 of FIG. 1. The controller 200 comprises a state estimator 202, a human motion predictor 204, a baseline controller 218, a safety controller 210, and a combining block 222. Alternative embodiments may include additional or different components.

[0043] In general, the state estimator 202 estimates a state 253 of the autonomous entity based on measurements 259. For example, the measurements 259 may be taken from various sensors that sense, for example, a position, velocity, acceleration, orientation, or configuration of the autonomous entity. The motion predictor 204 predicts a motion 205 of one or more other entities (e.g., humans or other moving objects or intelligent agents) in the vicinity of the autonomous entity based on measurements 263 and a motion model 257. Here, the measurements 263 could include sensor data such as radar, LiDAR, images, video, or other sensed data that can be used to predict a state of the other entities that may include, for example, position, velocity, acceleration, orientation, or other configuration of the other entities. The motion predictor 204 applies the motion model 257 to the states of the other entities to predict a future trajectory of the other entities as will be described in further detail below.
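The prediction step described above can be sketched in simplified form. The following illustrative example (not part of the claimed embodiments; the function name, constant-velocity model, and linearly growing uncertainty radius are assumptions for illustration only) extrapolates another entity's state over a prediction horizon:

```python
# Hypothetical sketch of a motion predictor in the spirit of motion
# predictor 204: extrapolate another entity's 2D position with a
# constant-velocity model and grow an uncertainty radius with the
# prediction horizon. The linear-growth uncertainty model is an
# illustrative assumption, not taken from the specification.

def predict_motion(position, velocity, horizon, dt=0.1, sigma=0.05):
    """Return a list of (time, predicted_position, uncertainty_radius)."""
    trajectory = []
    t = 0.0
    while t <= horizon:
        px = position[0] + velocity[0] * t
        py = position[1] + velocity[1] * t
        # Uncertainty grows with the prediction horizon, so long-term
        # predictions are trusted less than short-term ones.
        radius = sigma * t
        trajectory.append((t, (px, py), radius))
        t += dt
    return trajectory

path = predict_motion((0.0, 0.0), (1.0, 0.5), horizon=1.0, dt=0.5)
```

A more faithful implementation would replace the constant-velocity model with the identified reaction model described later in the specification.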

[0044] The baseline controller 218 receives the estimated state 253 of the autonomous entity and a goal 251 (e.g., embodied as an input control signal) and generates a baseline control signal 225. For example, the baseline control signal 225 may represent a baseline trajectory of the autonomous entity that, if followed, would enable the autonomous entity to meet the goal 251 in accordance with some optimization criterion defined in further detail below. Thus, for example, if the goal 251 is to move the autonomous entity to a particular position, the baseline controller 218 may generate a baseline control signal 225 that would control the autonomous entity to reach the desired position according to an optimal path (e.g., shortest distance).

[0045] The safety controller 210 generates a safety control signal 211 based on the baseline control signal 225, the predicted motion 205 of the other entities, and the estimated state 253 of the autonomous entity. Here, the safety control signal 211 represents a modification to apply to the baseline trajectory specified in the baseline control signal 225 in order to ensure that the autonomous entity follows a safe trajectory to achieve the goal 251 that avoids collisions with the other entities. Particularly, the safety control signal 211, when combined with the baseline control signal 225, may cause the autonomous entity to deviate from the baseline trajectory when desirable to meet a safety constraint relating to the predicted motion 205 of the one or more other entities in the vicinity of the autonomous entity.

[0046] The combining block 222 combines the baseline control signal 225 with the safety control signal 211 to generate a combined control signal 261 for controlling the autonomous entity to follow a safe trajectory that meets the safety constraint while enabling the autonomous entity to complete the goal 251.

[0047] In an embodiment, the safety controller 210 comprises a control modification module 212, a safety constraint module 214, a criteria module 216, and a combining block 224. The safety constraint module 214 generates a safety index 255 based on the estimated state 253 of the autonomous entity and the predicted motion 205 of the one or more other entities. Here, the safety index 255 indicates whether or not the estimated state 253 of the autonomous entity is within a range of safe states given the predicted motion 205 of the one or more other entities, and if outside the range, may indicate a distance between the estimated state of the autonomous entity and a boundary of the range of safe states.

[0048] The criteria module 216 generates a criteria signal 217, based on the baseline control signal 225 and the safety index 255, that indicates whether the safety index 255 corresponds to a state within the range of safe states or outside of the range of safe states. Thus, for example, the criteria signal 217 may comprise a binary value.

[0049] The control modification module 212 generates a modification signal 213 based on the baseline control signal 225, the estimated state 253 of the autonomous entity, and the predicted motion 205 of the one or more other entities. The modification signal 213 represents an optimal change in trajectory relative to the baseline trajectory that will put the autonomous entity in the range of safe states relative to the predicted motion of the one or more other entities.

[0050] The combining block 224 combines the criteria signal 217 and the modification signal 213 to generate the safety control signal 211. For example, in one embodiment, the combining block 224 acts as a multiplication block that multiplies the modification signal 213 by the binary value of the criteria signal 217 (e.g., zero or one) such that the safety control signal 211 is zero (thus providing no modification to the baseline control signal 225 when combined at combining block 222) when the autonomous entity is already within the safe space, and the safety control signal 211 has a non-zero value (thus modifying the baseline control signal 225 when combined at combining block 222) when the autonomous entity is outside the safe space.
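The gating behavior of the two combining blocks can be sketched as follows. This is an illustrative example only (the function names and numeric values are assumptions, not part of the claimed embodiments): combining block 224 multiplies the modification signal by the binary criteria value, and combining block 222 adds the result to the baseline control signal.

```python
def safety_control(criteria, modification):
    # Combining block 224: gate the modification signal 213 by the binary
    # criteria signal 217 (0 = already safe, 1 = unsafe, so modify).
    return [criteria * m for m in modification]

def combined_control(baseline, safety):
    # Combining block 222: add the safety control signal 211 to the
    # baseline control signal 225.
    return [b + s for b, s in zip(baseline, safety)]

baseline_signal = [1.0, 0.0]       # stand-in for baseline control signal 225
modification_signal = [-0.5, 0.25]  # stand-in for modification signal 213

# Inside the safe set: the baseline passes through unchanged.
safe_case = combined_control(baseline_signal, safety_control(0, modification_signal))
# Outside the safe set: the baseline is modified.
unsafe_case = combined_control(baseline_signal, safety_control(1, modification_signal))
```

The multiplication structure mirrors the description above: when the criteria value is zero, the safety control signal contributes nothing to the combined control signal 261.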

[0051] The expected outcome of the controller 200 is shown in FIG. 3 in an example scenario involving an autonomous vehicle 302. In this example scenario, the baseline controller 218 will command the autonomous vehicle 302 to go straight towards its goal 304 according to the baseline trajectory 312. However, the human motion predictor 204 predicts that the human 306 will go to the prediction point 308 and is very likely to appear within the uncertainty range 310. Since the baseline trajectory 312 is no longer in the safe region 314, the safety controller 210 causes a modified trajectory 316 to be followed towards the goal 304 that avoids the human 306.

[0052] FIG. 4 illustrates an alternative embodiment of a controller 400 that can alternatively be used as the control module 112 in the controller 100 of FIG. 1. In this embodiment, the baseline controller 218 comprises an enhanced baseline controller, also referred to herein as an "efficiency controller" 418. Unlike the baseline controller 218, which does not consider the safety index 255 and thus does not receive the predicted motion 205 from the motion predictor 204, the efficiency controller 418 determines an enhanced baseline control signal (also referred to herein as an "efficiency signal") 425 representing an efficient trajectory that will stay within the safe space based on a long-term motion prediction 205 (together with the estimated state 253 and goal 251). A benefit of the efficiency controller 418 is to provide the autonomous entity global guidance in the long term in order to avoid local optima. Because this efficient trajectory in the efficiency signal 425 is based on a long-term motion prediction 205, it may not be completely safe if the prediction is inaccurate. Thus, the efficiency signal may still be modified by the safety control signal 211 at the combiner 222 to find a safe trajectory in real time for controlling the autonomous entity according to a combined control signal 461. However, relative to the baseline controller 218, the efficiency controller 418 may generate an initial trajectory more efficiently and reduce the computational burden on the safety controller 210 to find the modification that results in a safe trajectory.

[0053] For example, in the example of FIG. 5, a robot 510 is configured to reach the goal 502, which is blocked by the moving obstacle 504. The baseline controller 218 will generate a straight trajectory 506 towards the goal 502. In order to reach the goal 502 safely, the safety controller 210 determines a large detour, which may be computationally intensive and may take a long time to calculate since the robot 510 actually moves away from the goal 502 by detouring according to the trajectory 512. Such behavior is undesired in the short term.

However, the efficiency controller 418 will generate a reference detour in the long term. Although the trajectory may not be completely safe, as the long term prediction of the moving obstacle may not be accurate, the long term detour can better guide the robot 510, making it easier for the safety controller 210 to find a safe and efficient trajectory in real time. While the efficiency controller 418 may achieve better performance, a trade-off with the above described baseline controller 218 is that the baseline controller 218 may have lower computational complexity.

[0054] Operation of the controller 100 is discussed in further detail below. Example operations are described in the context of the controller 100 controlling a robot arm as the autonomous entity and in the context of the controller 100 controlling an autonomous vehicle as the autonomous entity. Other types of autonomous entities may similarly be controlled by the controller 100 according to the principles described herein.

EMBODIMENTS IN CONTEXT OF ROBOT ARM

[0055] FIGs. 6-9 and the associated description describe a controller for controlling an autonomous entity and operating principles for the same in the context of a robot arm operating in the presence of humans. The methods, apparatuses, and principles described with respect to FIGs. 6-9 may similarly apply to controlling an autonomous vehicle in the presence of other vehicles, or to controlling another type of autonomous entity in the presence of intelligent agents.

ALGORITHMIC SAFETY MEASURES: THE OPTIMIZATION PROBLEM

[0056] The following examples discuss operation of the controller 100 in a scenario in which co-robots co-operate and co-inhabit with human workers. Safety in co-inhabitance and contactless co-operation form basic interaction types during human robot interactions. Since the interaction is contactless, robots and humans are independent of one another in the sense that the humans' inputs will not affect the robots' dynamics in the open loop. However, humans and robots are coupled together in the closed loop, since they will react to each other's motions.

PROBLEM FORMULATION

[0057] The state of the robot of interest (e.g., state 253) may be denoted as $x_R \in \mathbb{R}^n$ and the robot's control input as $u_R \in \mathbb{R}^m$, where $n, m \in \mathbb{N}$. Assume the robot dynamics is affine, i.e.

$$\dot{x}_R = f(x_R) + h(x_R) u_R \tag{1}$$

[0058] Here, a system can be brought into affine form through dynamic extension. For example, if $\dot{x}_R = F(x_R, u_R)$, let $\bar{x}_R = [x_R^T, u_R^T]^T$ and take the new control input to be $\bar{u}_R = \dot{u}_R$; then the new system given by $\dot{\bar{x}}_R = \begin{bmatrix} F(x_R, u_R) \\ 0 \end{bmatrix} + \begin{bmatrix} 0 \\ I \end{bmatrix} \bar{u}_R$ is affine.

[0059] The task or the goal for the robot (e.g., goal 251) is denoted as G R , which can be, for example, 1) a settle point in the Cartesian space (e.g. a work piece the robot needs to get), 2) a settle point in the configuration space (e.g. a posture), 3) a path in the Cartesian space or 4) a trajectory in the configuration space.

[0060] In one embodiment, the robot fulfills the aforementioned tasks safely. For example, $x_H$ is the state of humans and other moving robots in the system, which are indexed as $H = \{1, 2, \cdots, N\}$. Then the system state is $x = [x_R^T, x_H^T]^T$. The collision-free state space may be denoted as $X_S$, e.g. $X_S = \{x : d(x_H, x_R) > 0\}$ where $d$ measures the minimum distance among the robot, the humans, and all other moving robots. FIG. 6 illustrates $X_S$ as the off-diagonal area (the upper-left and lower-right regions) in the system's state space. Given the human configuration, the constraint on the robot's state space $R_S(x_H)$ is a projection from $X_S$, e.g. $R_S(x_H) = \{x_R : [x_R^T, x_H^T]^T \in X_S\}$, which is time varying with $x_H$. Hence, two steps may be applied to safely control the robot motion: 1) predicting the human motion; and 2) finding the safe region for the robot (marked area along the $x_R$ axis in FIG. 6) based on the prediction.

THE OPTIMIZATION PROBLEM

[0061] In one embodiment, the goal of the co-robot is to finish the tasks $G_R$ efficiently while staying in the safe region $R_S(x_H)$, which leads to the following optimization problem:

$$\min_{u_R} J(x_R, u_R, G_R) \tag{2}$$

$$\text{s.t. } u_R \in \Omega,\ x_R \in \Gamma,\ \dot{x}_R = f(x_R) + h(x_R) u_R \tag{3}$$

$$x_R \in R_S(x_H) \tag{4}$$

where $J$ is a goal-related cost function to ensure efficiency, $\Omega$ is the constraint on control inputs, and $\Gamma$ is the state space constraint (e.g., joint limits, stationary obstacles). The safety constraint $R_S(x_H)$ may be nonlinear, non-convex, and time varying with unknown dynamics.

[0062] In an embodiment, numerical methods may be used to solve non-convex optimizations, e.g. sequential convex optimization, A* search, and the Monte-Carlo based rapidly-exploring random trees (RRT) method. However, the computation loads may be too high for online applications on industrial co-robots. Thus, in other embodiments, analytical methods such as potential field methods and sliding mode methods may be used, which have low computation loads. However, these methods may fail to emphasize optimality.

Moreover, the motion patterns of human subjects (or other intelligent robots) are much more complicated than those of general obstacles due to interactions, e.g. $x_H$ may be a function of $x_R$. To solve the problem, a safe set algorithm (SSA) may be used in one embodiment to identify the dependency of $x_H$ on $x_R$ online and regulate the control input of the robot in a supervisory loop so that the system state stays in the safe set $X_S$. A safe exploration algorithm (SEA) may be built upon SSA to reflect the uncertainties in the prediction of $x_H$ in robot motion control. In yet further embodiments, a generalized method may be used in a modularized controller architecture that can handle three-dimensional (3D) interactions as described below.

[0063] In a first embodiment according to the architecture of FIG. 2, the baseline controller 218 discussed above solves equations (2)-(3), which is time-invariant and can be solved offline. The safety controller 210 enforces the time-varying safety constraint (4): it computes whether the baseline control signal 225 is safe to execute or not (in the safety constraint module 214 and the criteria module 216) based on the predictions 205 made in the human motion predictor 204, and what the modification signal 213 should be (in the control modification module 212).

[0064] In the alternative embodiment according to the controller architecture of FIG. 4, the efficiency controller 418 solves equations (2)-(4). By solving the equations together in the efficiency controller 418, global guidance can be achieved in the long term, avoiding local optima.

THE BASELINE CONTROLLER AND THE EFFICIENCY CONTROLLER

[0065] The baseline controller 218 in the embodiment of FIG. 2 solves equations (2)-(3), which is similar to the controller in use when the robot is working in a cage. The cost function may be designed to be quadratic, penalizing the error to the goal and the magnitude of the control input, e.g. when $G_R$ is a trajectory, $J = \int_0^\infty [(x_R - G_R)^T P (x_R - G_R) + u_R^T R u_R] dt$ where $P$ and $R$ are positive definite matrices. The control policy can be obtained by solving the problem offline. Basic collision avoidance algorithms can be used to avoid stationary obstacles described by the constraint $x_R \in \Gamma$. This controller 218 is included to ensure that the robot can still perform the tasks properly when the safety constraint $R_S(x_H)$ is satisfied.
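The quadratic cost above can be illustrated with a minimal numerical sketch. The following example (illustrative only; the scalar weights p and r stand in for the positive definite matrices P and R, and the discretization step is an assumption) evaluates the cost of a discretized trajectory:

```python
# Illustrative evaluation of the quadratic cost minimized by baseline
# controller 218: it penalizes the tracking error to the goal G_R and
# the magnitude of the control input. Scalar 1-D case for clarity.

def quadratic_cost(states, controls, goal, p=1.0, r=0.1, dt=0.1):
    """Discretized J = sum of (p*(x - goal)^2 + r*u^2) * dt."""
    cost = 0.0
    for x, u in zip(states, controls):
        error = x - goal
        cost += (p * error * error + r * u * u) * dt
    return cost

# A trajectory that reaches the goal with small inputs is cheaper than
# one that stays far from it with large inputs.
near = quadratic_cost([0.9, 1.0], [0.1, 0.0], goal=1.0)
far = quadratic_cost([0.0, 0.1], [1.0, 1.0], goal=1.0)
```

Solving for the control policy that minimizes this cost offline (e.g., by dynamic programming for the affine dynamics) yields the baseline control law.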

[0066] The efficiency controller 418 in the embodiment of FIG. 4 operates similarly to the baseline controller 218 but solves equations (2)-(4) above.

THE HUMAN MODEL AND THE HUMAN MOTION PREDICTOR

[0067] In different applications, the human body may be represented at various levels of detail. For AGVs, mobile robots, and planar arms, since the interactions with humans happen in two-dimensional (2D) space, a human can be tracked as a rigid body in the 2D plane with the state $x_H$ being the position and velocity of the center of mass and the rotation around it. For robot arms that interact with humans in 3D, the choice of the human model depends on the human's distance to the robot. When the robot arm and the human are far apart, the human may also be treated as one rigid body to simplify the computation. In close proximity, however, the human's limb movements may be considered. As shown in FIG. 6, the human may be modeled as a connection of ten rigid parts: part 1 is the head; part 2 is the trunk; parts 3, 4, 5 and 6 are upper limbs; and parts 7, 8, 9 and 10 are lower limbs. The joint positions can be tracked using 3D sensors. The human's state $x_H$ can be described by a combination of the states of all rigid parts.

[0068] The prediction of future human motion XH by the motion predictor 204 may be done in two steps: inference of the human's goal GH and prediction of the trajectory to the goal. Once the goal is identified, a linearized reaction model can be assumed for trajectory prediction, e.g.

$$\dot{x}_H = A x_H + B_1 G_H + B_2 x_R + w_H \tag{5}$$

where $w_H$ is the noise, and $A$, $B_1$, and $B_2$ are unknown matrix parameters which encode the dependence of future human motion on the human's current posture, the human's goal, and the robot motion. Those parameters can be identified using parameter identification algorithms, and the prediction can then be made using the identified parameters. Note that to account for the human's time-varying behaviors, the parameters may be identified online. This method is based on the assumption that the human does not 'change' very fast. Moreover, to reduce the number of unknown parameters, key features that affect human motion can be identified through offline analysis of human behavior. Those low-dimension features $\{f_j\}$ can be used in the model (5) to replace the high-dimension states $x_H$ and $x_R$, e.g. $\dot{x}_H = \sum_j A_j f_j + w_H$.
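The online identification of the parameters in model (5) can be sketched with a standard recursive least squares (RLS) estimator. This is an illustrative choice of parameter identification algorithm (the specification does not prescribe RLS); the class name, forgetting factor, and toy 1-D model below are assumptions for illustration:

```python
import numpy as np

# Sketch of identifying the unknown parameters of the linearized
# reaction model (5) online. RLS with a forgetting factor discounts old
# data so the estimate can track a time-varying human.

class RLSIdentifier:
    def __init__(self, n_params, forgetting=0.98):
        self.theta = np.zeros(n_params)     # stacked parameter estimate
        self.P = np.eye(n_params) * 1000.0  # estimate covariance
        self.lam = forgetting

    def update(self, phi, y):
        # phi: regressor, e.g. [x_H, G_H, x_R]; y: observed human velocity
        P_phi = self.P @ phi
        k = P_phi / (self.lam + phi @ P_phi)       # gain vector
        self.theta = self.theta + k * (y - phi @ self.theta)
        self.P = (self.P - np.outer(k, P_phi)) / self.lam
        return self.theta

# Fit a toy scalar model y = a*x_H + b1*G_H + b2*x_R with true
# parameters (a, b1, b2) = (-1.0, 0.5, 0.2) from noiseless samples.
rng = np.random.default_rng(0)
ident = RLSIdentifier(3)
true_theta = np.array([-1.0, 0.5, 0.2])
for _ in range(200):
    phi = rng.normal(size=3)
    y = true_theta @ phi
    est = ident.update(phi, y)
```

In practice the observations would be noisy joint measurements, and the regressor could use the low-dimension features $\{f_j\}$ mentioned above instead of the full states.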

THE SAFETY INDEX

[0069] The safe set $X_S$ is a collision-free subspace in the system's state space, which depends on the relative distances among humans and robots. Since humans and robots have complicated geometric features, simple geometric representations may be used for efficient online distance calculation. For example, in one embodiment, ellipsoids may be used. However, because it is difficult to obtain the distance between two ellipsoids analytically, capsules (or spherocylinders) may instead be used to reduce the computation load. Here, a spherocylinder consists of a cylinder body and two hemisphere ends. Spherocylinders are introduced to bound the geometric figures. A sphere is considered a generalized capsule with the length of the cylinder being zero. The distance between two capsules can be calculated analytically: it equals the distance between their center lines minus their radii, as shown in FIG. 7B. In the case of a sphere, the center line reduces to a point. In this way, the relative distance among complicated geometric objects can be calculated using just several skeletons and points. The skeleton representation is also ideal for tracking the human motion.
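The capsule-to-capsule distance described above can be computed with a standard segment-to-segment distance routine. The sketch below is illustrative (the capsule parameters are made-up examples, and degenerate zero-length segments are only loosely handled); it implements the rule from the text: segment distance minus the two radii.

```python
import math

def _clamp(t, lo=0.0, hi=1.0):
    return max(lo, min(hi, t))

def segment_distance(p1, q1, p2, q2):
    """Minimum distance between segments p1-q1 and p2-q2 (2D or 3D).

    Standard closest-point computation; assumes non-degenerate segments.
    """
    d1 = [b - a for a, b in zip(p1, q1)]
    d2 = [b - a for a, b in zip(p2, q2)]
    r = [a - b for a, b in zip(p1, p2)]
    a = sum(x * x for x in d1)
    e = sum(x * x for x in d2)
    f = sum(x * y for x, y in zip(d2, r))
    c = sum(x * y for x, y in zip(d1, r))
    b = sum(x * y for x, y in zip(d1, d2))
    denom = a * e - b * b
    s = _clamp((b * f - c * e) / denom) if denom > 1e-12 else 0.0
    t = (b * s + f) / e if e > 1e-12 else 0.0
    if t < 0.0:
        t, s = 0.0, (_clamp(-c / a) if a > 1e-12 else 0.0)
    elif t > 1.0:
        t, s = 1.0, (_clamp((b - c) / a) if a > 1e-12 else 0.0)
    c1 = [p + s * d for p, d in zip(p1, d1)]
    c2 = [p + t * d for p, d in zip(p2, d2)]
    return math.dist(c1, c2)

def capsule_distance(cap1, cap2):
    # Distance between capsules = distance between center lines - radii.
    (p1, q1, r1), (p2, q2, r2) = cap1, cap2
    return segment_distance(p1, q1, p2, q2) - r1 - r2

# Two parallel capsules whose center lines are one unit apart,
# with radii 0.1 and 0.2, are 0.7 apart.
d = capsule_distance(((0, 0, 0), (1, 0, 0), 0.1),
                     ((0, 1, 0), (1, 1, 0), 0.2))
```

A sphere is handled by the same routine with a zero-length center line reduced to a point, matching the generalized-capsule view in the text.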

[0070] Given the capsules, the design of $X_S$ may be mainly the design of the minimum distances among the capsules. In an embodiment, the design is not too conservative, while larger buffer volumes may be used to bound critical body parts such as the head and the trunk, as shown in FIG. 7A. The safe set in the 3D interactions can be designed as:

$$X_S = \left\{ x : \frac{d(p_{ij}, x_R)}{d_{ij,min}} \ge 1,\ \forall i = 1, \cdots, 10,\ \forall j \in H \right\} \tag{6}$$

where $d(p_{ij}, x_R)$ measures the minimum distance from the capsule of body part $i$ on the human (or the robot) $j$ to the capsules of the robot R, and $d_{ij,min} \in \mathbb{R}^+$ is the designed minimum safe distance.

[0071] To describe the safe set $X_S$ efficiently, a safety index $\phi$ (e.g., safety index 255 in FIG. 2) is introduced (as generated by safety constraint module 214), which is a Lyapunov-like function over the system's state space as illustrated in FIG. 8. In one embodiment, the safety index may be selected to satisfy three conditions: 1) the relative degree from $\phi$ to the robot control $u_R$ in the Lie derivative sense is one (to ensure that the robot's control input can drive the state to the safe set directly); 2) $\phi$ is differentiable almost everywhere; 3) the control strategy that $\dot{\phi} < 0$ if $\phi \ge 0$ will make the set $X_S$ invariant, i.e. $x(t) \in X_S, \forall t > t_0$ if $x(t_0) \in X_S$. For example, the safety index for the safe set in (6) can be designed as:

$$\phi = 1 + \gamma - d^{*c} - k_1 \dot{d}^* - \cdots - k_{l-1} (d^*)^{(l-1)} \tag{7}$$

where $d^* = \frac{d(p_{i^*j^*}, x_R)}{d_{i^*j^*,min}}$ and the capsule $i^*$ on the human (or the robot) $j^*$ is the capsule that contains the closest point (the critical point) to the robot R. $l \in \mathbb{N}$ is the relative degree from the function $d(\cdot, x_R)$ to $u_R$ in the Lie derivative sense. In some applications, $l = 2$ since the robot's control input can affect joint acceleration. $c > 1$ is a tunable parameter, where larger $c$ means heavier penalties on small relative distance. $\gamma > 0$ is a safety margin that can uniformly enlarge the capsules in FIG. 7. $k_1, \cdots, k_{l-1}$ are tunable parameters that satisfy the condition that all roots of $1 + k_1 s + \cdots + k_{l-1} s^{l-1} = 0$ are on the negative real axis in the complex plane. The higher order terms of $d^*$ are included to make sure that the robot does not approach the boundary of the safe set with a large velocity, so that the state can always be maintained in the safe set even if there are constraints on the robot control input, e.g. $u_R \in \Omega$.
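For the common case $l = 2$, the safety index (7) reduces to a function of the normalized critical distance and its rate of change. The numeric tuning values in the sketch below are illustrative assumptions, not values taken from the specification:

```python
# Evaluate a safety index of the form (7) for l = 2:
#   phi = 1 + gamma - d*^c - k1 * d*_dot
# where d* is the normalized distance to the critical capsule and
# d*_dot its time derivative. phi >= 0 means the state is unsafe and
# the safety controller must modify the baseline control.

def safety_index(d_star, d_star_dot, gamma=0.1, c=2.0, k1=1.0):
    return 1.0 + gamma - d_star ** c - k1 * d_star_dot

# Far away (twice the minimum distance) and separating: safe, phi < 0.
phi_safe = safety_index(d_star=2.0, d_star_dot=0.5)

# Inside the margin (d* < 1) and approaching: unsafe, phi >= 0.
phi_unsafe = safety_index(d_star=0.8, d_star_dot=-0.5)
```

Note how the derivative term penalizes approaching the boundary quickly, mirroring the role of the higher order terms described above.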

THE CRITERIA

[0072] Given the safety index 255, the criteria module 216 determines whether or not a modification signal 213 should be added to the baseline control signal 225. There are two kinds of criteria: (I) $\phi(t) \ge 0$, or (II) $\phi(t + \Delta t) \ge 0$. The first criterion defines a reactive safety behavior, i.e. the control signal is modified once the safety constraint is violated. The second criterion defines a forward-looking safety behavior, i.e. the safety controller considers whether the safety constraint will be violated $\Delta t$ time ahead. The prediction in the second criterion is made upon the estimated human dynamics and the baseline control law. In the case when the prediction of future $x_H$ has a distribution, the modification signal may be added when the probability for criterion (II) to occur is non-trivial, e.g. $P(\{\phi(t + \Delta t) \ge 0\}) > \epsilon$ for some $\epsilon \in (0, 1)$.

THE SET OF SAFE CONTROL AND THE CONTROL MODIFICATION

[0073] The set of safe control $U_R^S$ is the equivalent safety constraint on the control space, i.e. the set of controls that can drive the system state into the safe set as shown in FIG. 8. By construction, the robot can drive the system state into the safe set through the safety index, i.e. by choosing a control such that $\dot{\phi} < 0$. Since

$$\dot{\phi} = \frac{\partial \phi}{\partial x_R} \left[ f(x_R) + h(x_R) u_R \right] + \frac{\partial \phi}{\partial x_H} \dot{x}_H \tag{8}$$

then the set of safe control when $\phi \ge 0$ is

$$U_R^S = \left\{ u_R : \frac{\partial \phi}{\partial x_R} h(x_R) u_R \le -\eta - \frac{\partial \phi}{\partial x_R} f(x_R) - \frac{\partial \phi}{\partial x_H} \dot{x}_H \right\} \tag{9}$$

[0074] where $\eta \in \mathbb{R}^+$ is a margin and $\dot{x}_H$ comes from the human motion predictor. When $\dot{x}_H$ has a distribution, let $\Pi$ be the compact set that contains the major probability mass of $\dot{x}_H$, e.g. $P(\{\dot{x}_H \in \Pi\}) > 1 - \epsilon$ for a small $\epsilon$. Then the inequality in (9) should hold for all $\dot{x}_H \in \Pi$, as illustrated in FIG. 6.

[0075] The non-convex state space constraint $R_S(x_H)$ is then transferred to a linear constraint on the control space in (9). In this way, the modification signal is the optimal value to be added to the baseline control law such that the final control lies in the set of safe control,

$$\Delta u_R = \arg\min_{u : u_R^o + u \in U_R^S \cap \Omega \cap U_R^\Gamma} u^T Q u \tag{10}$$

where $Q \in \mathbb{R}^{m \times m}$ is positive definite and determines a metric on the robot's control space. To obtain optimality, $Q$ may be chosen close to the metric imposed by the cost function $J$ in (2), e.g. $Q \approx \partial^2 J / \partial u_R^2$, where $J$ is assumed to be convex in $u_R$. $U_R^\Gamma$ is the equivalent constraint on the control space of the state space constraint $\Gamma$, which can be constructed following the same procedure used to construct $U_R^S$. Equation (10) is a convex optimization problem. In the case that $U_R^S \cap \Omega \cap U_R^\Gamma$ is empty, a smaller margin $\eta$ can be chosen so that the feasible control set becomes nonempty.
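When only a single linear safe-control constraint of the form $a^T u_R \le b$ is active and the other constraints are inactive, the convex QP (10) has a simple closed-form projection solution. The sketch below illustrates this special case (the values of Q, a, and b are made-up examples; the general problem would be handed to a QP solver):

```python
import numpy as np

def control_modification(u_baseline, Q, a, b):
    """Smallest Q-norm change du so that a^T (u_baseline + du) <= b.

    Closed form for one active linear constraint: minimizing u^T Q u
    subject to a^T u <= b - a^T u_baseline gives
    du = Q^{-1} a (b - a^T u_baseline) / (a^T Q^{-1} a).
    """
    if a @ u_baseline <= b:
        return np.zeros_like(u_baseline)   # already safe: no modification
    Qinv_a = np.linalg.solve(Q, a)
    return Qinv_a * (b - a @ u_baseline) / (a @ Qinv_a)

Q = np.diag([2.0, 1.0])      # metric on the robot's control space
a = np.array([1.0, 1.0])     # normal of the safe-control half-space
b = 1.0
u0 = np.array([2.0, 1.0])    # baseline control, violates a^T u <= b

du = control_modification(u0, Q, a, b)
u = u0 + du                  # modified control, lies on the boundary
```

The weighting by $Q^{-1}$ means directions that are cheap under the metric Q absorb more of the modification, which is the role of Q described above.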

[0076] An embodiment of a process for determining control parameters and algorithms for operating a controller is as follows:

1) Obtain a baseline controller suitable for the application that can handle the goal and the time-invariant constraints.

2) Wrap every moving rigid body with a capsule to simplify the geometry.

3) Determine the safe set $X_S$, which specifies the required distances among capsules.

4) Determine the safety index $\phi$ based on the safe set $X_S$ and the robot dynamics.

5) Choose the control modification criteria and design the control modification metric $Q$.

6) Select a human motion predictor suitable for the application.
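Once these design steps are fixed, one control cycle ties the modules together. The following high-level sketch follows the structure of FIG. 2 (the helper functions are placeholders standing in for the corresponding modules; the toy 1-D instantiation and its numeric values are illustrative assumptions):

```python
def control_step(x_robot, goal, sensor_data,
                 baseline_controller, motion_predictor,
                 safety_index_fn, modification_fn):
    u_baseline = baseline_controller(x_robot, goal)        # eqs. (2)-(3)
    x_human_pred = motion_predictor(sensor_data)           # model (5)
    phi = safety_index_fn(x_robot, x_human_pred)           # index (7)
    if phi >= 0:                                           # criterion (I)
        du = modification_fn(u_baseline, x_robot, x_human_pred)  # QP (10)
    else:
        du = 0.0                                           # already safe
    return u_baseline + du                                 # combiner 222

# Toy 1-D instantiation: drive toward the goal, but brake to a stop
# when the predicted human position is within one unit of the robot.
baseline = lambda x, g: 0.5 * (g - x)
predictor = lambda z: z
index = lambda x, h: 1.0 - abs(h - x)
modify = lambda u0, x, h: -u0

u_braked = control_step(0.0, 5.0, 0.5, baseline, predictor, index, modify)
```

Each placeholder would be replaced by the module produced in the corresponding design step above.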

PLANAR ROBOT ARM

[0077] In one example embodiment, the described controller 100 and control algorithm control a planar robot arm. The joint angle may be denoted as $\theta = [\theta_1, \theta_2]^T$. The dynamic equation of the robot arm is $M(\theta)\ddot{\theta} + N(\theta, \dot{\theta}) = \tau_R$, where $M(\cdot)$ is the generalized inertia matrix and $N(\cdot, \cdot)$ represents the Coriolis and centrifugal forces. Both functions depend on the robot state $x_R = [\theta^T, \dot{\theta}^T]^T$. $u_R = \tau_R$ is the torque input. The state space equation of the planar robot is affine:

$$\dot{x}_R = \begin{bmatrix} \dot{\theta} \\ -M^{-1}(\theta) N(\theta, \dot{\theta}) \end{bmatrix} + \begin{bmatrix} 0 \\ M^{-1}(\theta) \end{bmatrix} u_R \tag{11}$$

[0078] The vertical displacement of the robot arm may be ignored in one embodiment. In an example environment, the human and the robot need to approach their respective goal points in minimum time. The baseline controller 218 and efficiency controller 418 may comprise a computed torque controller with settle point $G_R$. The safety index 255 may be characterized as $\phi = D - d^2 - \dot{d}$, where $d$ measures the minimum distance between the human and the robot arm and $D = (d_{min} + \gamma)^2$. In one embodiment, a sampling frequency of 20 Hz may be used. Due to the limitation of bandwidth, both reactive and forward-looking criteria may be used, in order not to violate the safety constraints between two samples. The set of safe control $U_R^S(k)$ at time $k$ is the intersection of two sets: $U_1 = \{u_R(k) : \dot{\phi}(k) \le -\eta \text{ when } \phi(k) \ge 0\}$ and $U_2 = \{u_R(k) : \phi(k+1) < 0\}$. The computation of $U_1$ follows from (9). The computation of $U_2$ is similar. The metric $Q$ is chosen to be $M(\theta)$ in this embodiment, which puts larger penalties on the torque modification applied to the heavier link and thus is energy efficient.

[0079] Example results of a sample interaction are illustrated in FIG. 9. The first plot 910 in FIG. 9 shows the critical point on the arm that is the closest to the human capsule. The area 902 represents the first link (y = 0 is the base) and area 904 represents the second link (y = 0.55 m is the endpoint). The second plot 920 shows the distance from the robot endpoint to the robot's goal position. The third plot 930 shows the relative distance $d$ between the robot capsules and the human capsule, while the red area represents the danger zone $\{d < d_{min}\}$. The bars in the fourth plot 940 illustrate whether the safety controller 210 is active or not at each time step. During this example interaction, the robot was close to its goal at k = 55 and at k = 110 before it finally approached it at k = 220. However, since the human was too close to the robot in those two cases, going to the goal was dangerous. The safety controller then became active and performed real time trajectory planning to let the robot arm detour to avoid the human.

ADDITIONAL DISCUSSION

[0080] As described above, algorithmic safety measures for industrial robots working in a human-involved environment are realized using a parallel controller structure that solves a constrained optimal control problem. The control problem was separated into two parts: the efficiency goal with time-invariant constraints and the time-varying safety constraint. The baseline controller 218 solves the first part and the safety controller 210 enforces the safety constraint. This separation is beneficial for several reasons. First, the separation avoids the need to solve the original problem over a long time horizon, over which the uncertainties of the human motion would accumulate. Furthermore, the safety constraint $R_S(x_H)$ is generally only active for a small fraction of the time. The separation respects the different natures of the constraints, by allowing the baseline controller to do long-term planning without the time-varying constraint and letting the safety controller do local modification with respect to the time-varying constraint.

[0081] This separation can also be validated by analytically solving the optimal control problem. Suppose G_R = {x_R = 0} and J = ∫(x_R^T P x_R + u_R^T R u_R) dt. Let Ω, Γ be the whole space and X_S(x_H) = {x_R : φ(x_R, x_H) < 0}. Assume h(x_R) = B. Then the Lagrangian of the optimal control problem (2-4) is

L = x_R^T P x_R + u_R^T R u_R + λ^T (f(x_R) + B u_R) + η φ (12)

where λ, η are adjoint variables and η = 0 if φ < 0. The partial derivative of L with respect to u_R is L_u = 2(R u_R)^T + λ^T B + η φ_x B. Setting L_u = 0, the optimal control law becomes

u_R = −(1/2) R^{−1} B^T λ − (1/2) η R^{−1} B^T φ_x^T (13)

where the first term on the RHS is not related to the safety constraint and can be viewed as the baseline control law; the second term is concerned with the safety constraint, and is nontrivial only if φ ≥ 0, i.e., when the safety constraint is violated. Moreover, the separation offers more freedom in designing the robot behavior and is amenable to parallel computation.

EMBODIMENTS IN CONTEXT OF AUTONOMOUS VEHICLE

[0082] FIGs. 10-17 describe a controller for controlling an autonomous entity and operating principles for the same in the context of an autonomous vehicle operating in the presence of other vehicles. The methods, apparatuses, and principles described with respect to FIGs. 10-17 may similarly apply to controlling a robot arm in the presence of humans, or to controlling another type of autonomous entity in the presence of intelligent agents.

THE MULTI-AGENT TRAFFIC MODEL

[0083] The scenario shown in FIG. 10 may be modeled in the framework of multi-agent systems. In this example, it is assumed that the lanes on the freeway are along the x-direction with no curvature. The lanes are indexed in increasing order from the right lane (bottom of FIG. 10) to the left lane (top of FIG. 10).

THE SYSTEM MODEL

[0084] A multi-agent system (MAS) is a system composed of multiple interacting intelligent agents within an environment. All vehicles (the automated vehicle and other manually driven vehicles) on the freeway are viewed as agents, which have several important characteristics: 1) autonomy: the agents are self-aware and autonomous; 2) local views: no agent has a full global view of the system; 3) decentralization: there is no designated controlling agent.

[0085] From a global view, there can be thousands of agents (vehicles) in the system (e.g., on the freeway), but only the local interactions among the vehicles are of interest. Hence, in the controller of the automated vehicle, only the behavior of the surrounding vehicles is analyzed, where surrounding vehicles refers to the vehicles that can be detected and are within a certain distance of the automated vehicle.

[0086] Suppose there are N surrounding vehicles locally, indexed from 1 to N. Let H = {1, ..., N} be the set of indices of all surrounding vehicles. The automated vehicle has index 0. Since the surrounding vehicles change from time to time, mathematically, the topology of the MAS is time-varying.

[0087] For vehicle i, denote its state as x_i and its control input as u_i. Its dynamic equation can be written as

ẋ_i = f_i(x_i, u_i, w_i), ∀i = 0, ..., N (14)

[0088] where w_i is the disturbance introduced by the environment, e.g., wind. For simplicity, only the kinematic model will be considered. FIG. 11 shows a bicycle model used for all vehicles i, where (p_i^x, p_i^y) is the position of the center of the rear axle, v_i the forward speed, θ_i the vehicle heading (θ_i = 0 when the vehicle is following the lane), γ_i the steer angle, and L_i the vehicle wheelbase. Assuming no tire slip angle, the kinematics of vehicle i follow ṗ_i^x = v_i cos(θ_i), ṗ_i^y = v_i sin(θ_i), θ̇_i = (v_i / L_i) tan(γ_i). Since the mapping from γ_i to θ̇_i is homeomorphic given v_i, θ̇_i is chosen as an input signal instead of γ_i. For vehicle i, define x_i = [p_i^x, p_i^y, v_i, θ_i]^T and u_i = [v̇_i, θ̇_i]^T. Hence (14) can be simplified as

ẋ_i = f(x_i) + B u_i + B w_i, ∀i = 0, ..., N (15)

where

f(x_i) = [v_i cos(θ_i), v_i sin(θ_i), 0, 0]^T, B = [B_1, B_2], B_1 = [0, 0, 1, 0]^T, B_2 = [0, 0, 0, 1]^T (16)
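The bicycle-model kinematics described above can be sketched numerically as follows; the forward-Euler integration scheme, list-based state layout, and default step size are illustrative assumptions rather than part of the description above:

```python
import math

# Minimal sketch of one step of the kinematic model of (15): the state is
# x_i = [p_x, p_y, v, theta] and the input is u_i = [dv, dtheta]
# (acceleration and turning rate). The disturbance w_i is omitted.

def bicycle_step(state, u, dt=0.05):
    """One forward-Euler step of dx = f(x) + B u."""
    px, py, v, theta = state
    dv, dtheta = u
    return [
        px + v * math.cos(theta) * dt,  # p_x' = v cos(theta)
        py + v * math.sin(theta) * dt,  # p_y' = v sin(theta)
        v + dv * dt,                    # v enters through channel B_1
        theta + dtheta * dt,            # theta enters through channel B_2
    ]
```

For example, a vehicle driving straight at 30 m/s with zero input advances 1.5 m along x in one 0.05 s step.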

[0089] Let x_e be the state of the environment, e.g., the speed limit v_lim, stationary obstacles, and so on. Then the system state x is defined as x = [x_0^T, x_1^T, ..., x_N^T, x_e^T]^T.

[0090] Agent i chooses the control u_i based on its information set π_i and its objective G_i (which can be intended behaviors or desired speed). The information set is a combination of the measurements and the communicated information. In one embodiment, it is assumed that there is no direct communication among vehicles. In this way, agent i's information set at time T contains all the measurements up to time T, i.e., π_i(T) = {y_i(t)}_{t∈[0,T]} where

y_i = h_i(x, z_i), ∀i = 0, ..., N (17)

where z_i is the measurement noise. The controller can be written as

u_i = g_i(π_i, G_i), ∀i = 0, ..., N (18)

[0091] Based on (14), (17) and (18), the block diagram for the multi-agent system is shown in FIG. 12, where every row represents one agent. All agents are coupled together in the closed-loop system due to measurement feedback.

THE OPTIMAL CONTROL PROBLEM

[0092] In an embodiment, the controller for the automated vehicle, i.e., the function g_0, may be chosen by balancing the following two factors: 1) Efficiency: the objective G_0 should be achieved in an optimal manner by minimizing a cost function J(x_0, u_0, G_0); 2) Safety: the efficiency requirement should be fulfilled safely. Denote the system's safe set as X_S, which is a closed subset of the state space X of the system state x that is collision-free. Then g_0 should be chosen such that ∀t, x(t) ∈ X_S. An optimal control problem similar to that presented in (2-4) may be formulated. For example, in one embodiment, the following optimal control problem can be formulated:

min_{u_0} E(J(x_0, u_0, G_0)) (19)

s.t. u_0 ∈ Ω, x_0 ∈ Γ(x_e), ẋ_0 = f(x_0) + B u_0 (20)

x ∈ X_S (21)

where Ω is the control space constraint for vehicle stability, which depends on model uncertainties and disturbances, and Γ(x_e) is the state space constraint regarding the speed limit. The above optimization is generally hard to solve in view of the safety constraint (21), as the dynamics of the system state x are only partially known and the set X_S is non-convex. To solve the problem, the behaviors of the surrounding vehicles can be identified and predicted online, and a parallel structure can solve the non-convex optimization efficiently, as will be described in further detail below.

IDENTIFICATION AND PREDICTION OF THE DRIVING BEHAVIOR OF SURROUNDING VEHICLES

[0093] Instead of predicting other vehicles' trajectories directly, human drivers may classify other drivers' intended behaviors first. If the intended driving behaviors are understood, the future trajectories can be predicted using empirical models. Mimicking what humans would do, the learning structure in FIG. 13 is designed for the automated vehicle to make predictions of the surrounding vehicles, where the process is divided into two steps: 1) behavior classification, where the observed trajectory of a vehicle goes through an offline-trained classifier; and 2) trajectory prediction, where the future trajectory is predicted based on the identified behavior, using an empirical model which contains adjustable parameters to accommodate the driver's time-varying behavior. Here, G_i denotes vehicle i's intended behavior; the behavior classification is a backward process to identify G_i, while the trajectory prediction is a forward process to predict vehicle i's closed-loop behavior ẋ_i = f(x_i) + B g_i(π_i, G_i) based on the identified G_i. The classification step may be beneficial when the communications among vehicles are limited. Otherwise, vehicles can broadcast their planned behaviors.
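The two-step structure of FIG. 13 can be sketched as follows. The classifier output is assumed to be a probability vector over a set of candidate behaviors, and the per-behavior empirical models are placeholder callables standing in for the models discussed below; the names are illustrative assumptions:

```python
# Two-step prediction sketch: 1) pick the most likely behavior from the
# classifier's probability vector; 2) roll the corresponding empirical
# model forward to predict the trajectory.

BEHAVIORS = ["lane_follow", "change_left", "change_right", "merge", "exit"]

def predict_trajectory(prob, state, models, horizon=20):
    """Select argmax behavior, then iterate its one-step empirical model."""
    behavior = BEHAVIORS[max(range(len(prob)), key=lambda j: prob[j])]
    model = models[behavior]            # empirical model for that behavior
    traj = [state]
    for _ in range(horizon):
        traj.append(model(traj[-1]))    # one-step closed-loop prediction
    return behavior, traj
```

With a toy one-step model, a classifier output dominated by lane following produces a lane-following rollout.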

[0094] The design of the empirical models and the online learning algorithm will next be described.

THE DRIVING BEHAVIORS

[0095] The intended behavior of vehicle i at time step k may be denoted as b_i(k). At least the following five behaviors may be considered:

Behavior 1 (B_1): Lane following;

Behavior 2 (B_2): Lane changing to the left;

Behavior 3 (B_3): Lane changing to the right;

Behavior 4 (B_4): Lane merging;

Behavior 5 (B_5): Lane exiting;

[0096] where B_1 is the steady-state behavior; B_2, B_3 and B_4 are driving maneuvers; and B_5 is the exiting behavior. It is assumed that there are gaps (lane following) between two maneuvers (B_2, B_3 and B_4). Let [P(b_i(k) = B_1), ..., P(b_i(k) = B_5)]^T ∈ ℝ^5 be the probability vector that vehicle i intends to conduct B_1, ..., B_5 at time step k, given information up to time step k.

THE EMPIRICAL MODELS FOR TRAJECTORY PREDICTION

[0097] The future trajectory is predicted according to the most likely predicted behavior argmax_{B_j} P(b_i(k) = B_j). For lane following B_1, θ_i ≈ 0 and the lateral deviation of the vehicle can be ignored. In one embodiment, vehicle i only regulates its longitudinal speed to match the speed of the traffic flow and the speed of its front vehicle, e.g.,

ẋ_i = f(x_i) + B_1 [k_1 f_i^4 + k_2 f_i^5] (22)

where k_1, k_2 ∈ ℝ are online-adjustable parameters. The parameter identification method for k_1 and k_2 will be discussed in the next section. For lane changing B_2 and B_3, the vehicle not only regulates the longitudinal speed, but also regulates the lateral position, hence the turning rate. The empirical model can be described as:

ẋ_i = f(x_i) + B_1 [k_1 f_i^4 + k_2 f_i^5] + B_2 [k_3 f_i^9 + k_4 f_i^10] (23)

where k_1, k_2, k_3, k_4 ∈ ℝ are online-adjustable parameters.
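The lane-following model of (22) can be illustrated with the following sketch, which computes the predicted longitudinal acceleration. The concrete feature definitions used here (traffic-flow speed error and front-vehicle speed error) are assumptions for illustration, since the exact features are left to the empirical model:

```python
# Illustrative lane-following acceleration per the form of (22):
# dv = k1 * f4 + k2 * f5, entering the dynamics through channel B_1.
# Assumed features: f4 = v_flow - v (traffic-flow speed error),
#                   f5 = v_front - v (front-vehicle speed error).

def lane_follow_accel(v, v_flow, v_front, k1, k2):
    """Predicted longitudinal acceleration for behavior B_1."""
    return k1 * (v_flow - v) + k2 * (v_front - v)
```

For example, a vehicle at 28 m/s in 30 m/s traffic behind a 29 m/s front vehicle, with k1 = k2 = 0.5, is predicted to accelerate at 1.5 m/s².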

ONLINE PARAMETER ADAPTATION

[0098] The online parameter adaptation is desirable to capture the time-varying behavior of drivers. For example, to identify k_1 and k_2 in (22), the reduced-order equation may be used: v̇_i = k_1 f_i^4 + k_2 f_i^5 = β^T φ, where β = [k_1, k_2]^T and φ = [f_i^4, f_i^5]^T. Based on the historical data of v̇_i, f_i^4, and f_i^5, k_1 and k_2 can be identified using a recursive least squares algorithm. Let β̂(k) = [k̂_1(k), k̂_2(k)]^T be the estimate of [k_1, k_2]^T at time step k. Then the estimate can be updated by the following equation:

β̂(k + 1) = β̂(k) + F(k)(v̇_i(k + 1) − β̂(k)^T φ(k)) (24)

where F(k) is the learning gain. Although k_1 and k_2 are identified using historical data, they can be used to predict the trajectories in the near future if the vehicle's behavior does not change very fast compared to the sampling rate.
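The update (24) can be implemented with a standard recursive least squares gain. The covariance recursion and forgetting factor in the sketch below are conventional additions (not spelled out above) used to keep the gain responsive to time-varying parameters:

```python
# One recursive least squares step for the 2-parameter model y = beta^T phi,
# matching the form of (24): beta <- beta + F * (y - beta^T phi).
# The gain F(k) = P phi / (lam + phi^T P phi) and the covariance recursion
# P <- (P - F phi^T P) / lam are standard; lam is a forgetting factor.

def rls_update(beta, P, phi, y, lam=0.98):
    """Update estimate beta (len 2) and covariance P (2x2) from one sample."""
    err = y - (beta[0] * phi[0] + beta[1] * phi[1])   # prediction error
    Pphi = [P[0][0] * phi[0] + P[0][1] * phi[1],
            P[1][0] * phi[0] + P[1][1] * phi[1]]
    denom = lam + phi[0] * Pphi[0] + phi[1] * Pphi[1]
    F = [Pphi[0] / denom, Pphi[1] / denom]            # learning gain F(k)
    beta = [beta[0] + F[0] * err, beta[1] + F[1] * err]
    P = [[(P[0][j] - F[0] * Pphi[j]) / lam for j in range(2)],
         [(P[1][j] - F[1] * Pphi[j]) / lam for j in range(2)]]
    return beta, P
```

Fed with consistent noiseless samples, the estimate converges to the true parameters within a few passes over the data.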

ONLINE DECISION MAKING AND CONTROL

[0099] Based on the predictions of the surrounding vehicles, the automated vehicle may find a safe and efficient trajectory satisfying the optimal control problem (19-21). However, since the uncertainties regarding the predictions accumulate over time, there is little benefit in solving the original problem over a long time horizon. The decision-making architecture in FIG. 14 may be used in one embodiment, which is designed as a parallel combination of a baseline planner 1402 that solves the problem over a long time horizon without the safety constraint (21), and a safety planner 1404 that takes care of the safety constraint in real time. As can be seen, the controller in FIG. 14 has similarities to the controller 200 of FIG. 2 and the controller 400 of FIG. 4 described above. In particular, this controller separates the baseline planner 1402 from the safety planner 1404: the baseline planner 1402 operates to control the vehicle to achieve a desired objective, while the safety planner 1404 operates to modify the trajectory if needed to avoid collisions. Alternatively, the baseline planner 1402 may operate similarly to the efficiency controller 418 of FIG. 4 to generate an efficient trajectory based on the safety constraint. Here, the baseline planner 1402 and the safety planner 1404 both operate based on inputs from a vehicle state estimator 1406 that provides an estimated state of the vehicle, and a learning center and predictor 1408 that provides predictions based on a learned model.
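The parallel structure described above can be sketched as a single control step; all component functions are assumed interfaces standing in for the baseline planner 1402, the safety check, and the safety planner 1404:

```python
# Structural sketch of the parallel decision architecture: a baseline
# planner produces a long-horizon input ignoring the time-varying safety
# constraint, and a safety planner modifies it only when the input is
# anticipated to be unsafe given the predictions.

def control_step(x_est, goal, predictions, baseline_plan, is_safe, make_safe):
    u = baseline_plan(x_est, goal)            # efficiency: long horizon, no (21)
    if not is_safe(x_est, u, predictions):    # check safety constraint online
        u = make_safe(x_est, u, predictions)  # local real-time modification
    return u
```

With stub components, an unsafe baseline input is replaced by the safety planner's modification, while a safe baseline input passes through unchanged.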

THE BASELINE PLANNER

[0100] In an embodiment, the baseline planner 1402 solves the optimal control problem (19-20) to ensure efficiency, similar to the planner used when the automated vehicle is navigating in an open environment. When the cost function and the control constraint are designed to be convex, (19-20) becomes a convex optimization problem.

[0101] Suppose the objective G_0 (target behavior and target speed v_r) is specified. The baseline planner 1402 plans a trajectory to accomplish G_0. When the objective is to follow the lane, the cost function is designed as

J = ∫_0^∞ [(v_0 − v_r)^2 + q(f_0^9)^2 + u_0^T R u_0] dt (25)

[0102] where q ∈ ℝ_+ and R ∈ ℝ^{2×2} is positive definite. When the objective is to change lane, the automated vehicle should change the lane smoothly within time T. The cost function is designed as

J = ∫_0^T Φ(t) dt + Γ(f_0^10(T), θ_0(T)) (26)

where Φ = (v_0 − v_r)^2 + q_1(f_0^10)^2 + q_2(θ_0)^2 + u_0^T R u_0 is the running cost; Γ is the terminal cost; q_1, q_2, s_1, s_2 ∈ ℝ_+; and R ∈ ℝ^{2×2} is positive definite.

[0103] The computation in the baseline planner 1402 can be done offline. The resulting control policy will be stored for online application, to ensure real-time planning.

[0104] In an alternative embodiment, the baseline planner 1402 may instead operate similarly to the efficiency controller 418 of FIG. 4 to solve the optimal control problem in (19)-(21), thus generating an efficient trajectory based on the safety constraint.

THE SAFETY PLANNER

[0105] The safety planner 1404 modifies the trajectory planned by the baseline planner 1402 locally to ensure that it will lie in the safe set X_S in real time.

[0106] During lane following, the safety planner 1404 may control the automated vehicle to keep a safe headway. Thus X_S = {x : d_1(x) > d_min}, where d_1(x) calculates the minimum distance between the automated vehicle and the vehicle or obstacles in front of it. During lane changing, the safety constraint controls the automated vehicle to keep a safe distance from vehicles in both lanes. Thus X_S = {x : d_2(x) > d_min}, where d_2(x) calculates the minimum distance between the automated vehicle and all surrounding vehicles and obstacles in the two lanes.

[0107] As with the controller 200 of FIG. 2 described above, mathematically, the safe set is described using a safety index φ, which is a real-valued continuously differentiable function on the system's state space. The state x is considered safe only if φ(x) ≤ 0. An example of the safety index was described earlier with respect to FIG. 8, and another example is shown in FIG. 15. In FIG. 10, the safe region for the automated vehicle is affected by the future trajectories of the surrounding vehicles. Based on the predictions of the other vehicles, if the baseline trajectory leads to φ ≥ 0 now or in the near future, the safety planner will generate a modification signal to decrease the safety index so that φ < 0.

[0108] In order for the above strategy to work, the safety index needs to satisfy the following conditions: 1) ∂φ̇/∂u_0 ≠ 0; 2) the unsafe set X_S^c is not reachable given the control law φ̇ < 0 when φ ≥ 0 and the initial condition x(t_0) ∈ X_S. The first condition ensures that φ̇ can be affected by u_0, while the second condition ensures that the modified trajectory belongs to X_S. Such a function φ exists for any safe set described by X_S = {x : d(x) > d_min} if the function d(·) is smooth. Moreover, the safety index φ = D − d_j^2(x) − α ḋ_j(x) (j = 1, 2) has the desired performance for systems with state equation (15), where D > d_min^2 and α > 0 are constants. To ensure safety, the control input of the automated vehicle may be chosen from the set of safe control U_S(t) = {u_0(t) : φ̇ ≤ −η_0 when φ ≥ 0}, where η_0 ∈ ℝ_+ is a safety margin. By (15), the set of safe control when φ ≥ 0 can be written as

U_S(t) = {u_0(t) : L(t) u_0(t) ≤ S(t)} (27)

where L(t) = (∂φ/∂x_0) B, S(t) = −η_0 − (∂φ/∂x_0) f(x_0) − Σ_{i∈H} (∂φ/∂x_i) ẋ_i, and ẋ_i is the prediction made by the trajectory predictor.
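Assembling the half-space form of (27) from numerically supplied gradients and predictions can be sketched as follows; the list-based data layout and default margin are illustrative assumptions:

```python
# Sketch of building the constraint L(t) u <= S(t) for a scalar safety
# index: L = (dphi/dx0) B, and S collects every term of phi-dot that does
# not depend on u_0 (the drift term and the predicted motion of the
# surrounding vehicles), plus the safety margin eta0.

def safe_halfspace(grad_x0, f_x0, B, grad_xi, xdot_hat, eta0=0.1):
    """Return (L, S) such that an input u is admissible when L . u <= S."""
    n_u = len(B[0])
    # L = grad_x0^T B (a row vector over the input channels)
    L = [sum(grad_x0[r] * B[r][c] for r in range(len(B))) for c in range(n_u)]
    # S = -eta0 - grad_x0 . f(x0) - sum_i grad_xi . xdot_hat_i
    S = -eta0 - sum(g * fx for g, fx in zip(grad_x0, f_x0))
    for g_i, xd_i in zip(grad_xi, xdot_hat):
        S -= sum(g * xd for g, xd in zip(g_i, xd_i))
    return L, S
```

In the degenerate case where φ's gradient lies entirely on non-actuated state components, L vanishes and no input can satisfy a negative S, signaling that the index must be redesigned (condition 1 above).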

[0109] If the baseline control input u_0(t) is anticipated to violate the safety constraint, the safety controller will map it to the set of safe control U_S(t) according to the following quadratic cost function:

min_{u∈U_S(t)} (u − u_0(t))^T W (u − u_0(t)) (28)

where W is a positive definite matrix that defines a metric in the vehicle's control space. To obtain optimality, W should be close enough to the metric imposed by the cost function J in (25) and (26), e.g., W ≈ ∂²J/∂u_0², where J is convex in u_0. The above explanation follows the same general principles as described above with respect to (7)-(10), using some variation in notation when applied to the autonomous vehicle example. In the lane-following mode, if the lateral deviation is large due to obstacle avoidance and the safety controller continues to generate a turning signal θ̇_0 ≠ 0, the vehicle will enter the lane-changing mode.
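For a single half-space constraint, the quadratic mapping onto the set of safe control admits a closed form. The sketch below assumes a diagonal metric W for simplicity; this is an illustrative assumption, not the only admissible choice:

```python
# Closed-form projection of the baseline input u0 onto {u : L . u <= S}
# under the metric W: if the constraint is violated, shift u0 along
# W^-1 L^T by just enough to land on the constraint boundary.

def project_to_safe(u0, L, S, W_diag):
    """argmin (u - u0)^T W (u - u0) subject to L . u <= S (W diagonal)."""
    slack = sum(l * u for l, u in zip(L, u0)) - S
    if slack <= 0:
        return list(u0)                              # baseline already safe
    Winv_L = [l / w for l, w in zip(L, W_diag)]      # W^-1 L^T
    scale = slack / sum(l * wl for l, wl in zip(L, Winv_L))
    return [u - scale * wl for u, wl in zip(u0, Winv_L)]
```

Note that the correction is applied only when the baseline input violates the constraint, which mirrors the safety controller being inactive most of the time.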

EXAMPLE APPLICATION

[0110] In an illustrative example, an automated vehicle is assumed to start in the rightmost lane with the desired longitudinal speed v_r = 30 m/s ≈ 67 mph. The sampling time may be set to 0.05 s. An example objective of following the lane is considered under two different cases.

[0111] 1) Case 1 - Stationary Obstacle: FIG. 16 shows the case when the automated vehicle suddenly notices a stationary obstacle 40 m ahead. In this example, the safety controller becomes active. By mapping the baseline input u_0 to U_S per (28), a command for deceleration and turning is generated. The automated vehicle then slows down and changes lane to the left to avoid the obstacle. After the lane change, the vehicle accelerates to the desired speed again.

[0112] 2) Case 2 - Slow Front Vehicle: FIG. 17 shows another example case when the front vehicle is too slow. To illustrate the interaction, the trajectories of both vehicles are down-sampled and shown in the last (bottom) plot in FIG. 17, where circles represent the automated vehicle and squares represent the slow vehicle. Different shades correspond to different time steps, the lighter the earlier. At the beginning, since the automated vehicle cannot keep the desired speed behind the slow vehicle, it starts to change lane to the left. After changing lanes, it overtakes the slow vehicle.

CONCLUSION

[0113] The multi-agent traffic model described above solves an optimal control problem for vehicle trajectory planning. To solve the problem, the behaviors of the surrounding vehicles are identified and their future trajectories predicted. Based on the predictions, the optimal control problem is solved online using a parallel combination of a baseline planner, which solves the problem without the safety constraint, and a safety planner, which takes care of the safety constraint online.

ADDITIONAL GENERAL CONSIDERATIONS

[0114] Reference in the specification to "one embodiment" or to "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiments is included in at least one embodiment. The appearances of the phrase "in one embodiment" or "an embodiment" in various places in the specification are not necessarily all referring to the same embodiment.

[0115] Some portions of the detailed description are presented in terms of algorithms and symbolic representations of operations on data bits within a computer memory. These algorithmic descriptions and representations are the means used by those skilled in the data processing arts to most effectively convey the substance of their work to others skilled in the art. An algorithm is here, and generally, conceived to be a self-consistent sequence of steps (instructions) leading to a desired result. The steps are those requiring physical manipulations of physical quantities. Usually, though not necessarily, these quantities take the form of electrical, magnetic or optical signals capable of being stored, transferred, combined, compared and otherwise manipulated. It is convenient at times, principally for reasons of common usage, to refer to these signals as bits, values, elements, symbols, characters, terms, numbers, or the like. Furthermore, it is also convenient at times to refer to certain arrangements of steps requiring physical manipulations or transformations of physical quantities or representations of physical quantities as modules or code devices, without loss of generality.

[0116] However, all of these and similar terms are to be associated with the appropriate physical quantities and are merely convenient labels applied to these quantities. Unless specifically stated otherwise as apparent from the following discussion, it is appreciated that throughout the description, discussions utilizing terms such as "processing" or "computing" or "calculating" or "determining" or "displaying" or the like, refer to the action and processes of a computer system, or similar electronic computing device (such as a specific computing machine), that manipulates and transforms data represented as physical (electronic) quantities within the computer system memories or registers or other such information storage, transmission or display devices.

[0117] Certain aspects of the embodiments include process steps and instructions described herein in the form of an algorithm. It should be noted that the process steps and instructions of the embodiments can be embodied in software, firmware or hardware, and when embodied in software, can be downloaded to reside on and be operated from different platforms used by a variety of operating systems. The embodiments can also be embodied in a computer program product which can be executed on a computing system.

[0118] The embodiments also relate to an apparatus for performing the operations herein. This apparatus may be specially constructed for the purposes, e.g., a specific computer, or it may comprise a general-purpose computer selectively activated or reconfigured by a computer program stored in the computer. Such a computer program may be stored in a computer readable storage medium, such as, but not limited to, any type of disk including floppy disks, optical disks, CD-ROMs, magnetic-optical disks, read-only memories (ROMs), random access memories (RAMs), EPROMs, EEPROMs, magnetic or optical cards, application specific integrated circuits (ASICs), or any type of media suitable for storing electronic instructions, each coupled to a computer system bus. Memory can include any of the above and/or other devices that can store information/data/programs and can be a transient or non-transient medium, where a non-transient or non-transitory medium can include memory/storage that stores information for more than a minimal duration. Furthermore, the computers referred to in the specification may include a single processor or may be architectures employing multiple processor designs for increased computing capability.

[0119] The algorithms and displays presented herein are not inherently related to any particular computer or other apparatus. Various general-purpose systems may also be used with programs in accordance with the teachings herein, or it may prove convenient to construct more specialized apparatus to perform the method steps. The structure for a variety of these systems will appear from the description herein. In addition, the embodiments are not described with reference to any particular programming language. It will be appreciated that a variety of programming languages may be used to implement the teachings of the embodiments as described herein, and any references herein to specific languages are provided for disclosure of enablement and best mode.

[0120] In addition, the language used in the specification has been principally selected for readability and instructional purposes, and may not have been selected to delineate or circumscribe the inventive subject matter. Accordingly, the disclosure of the embodiments is intended to be illustrative, but not limiting, of the scope of the embodiments, which is set forth in the claims.

[0121] While particular embodiments and applications have been illustrated and described herein, it is to be understood that the embodiments are not limited to the precise construction and components disclosed herein and that various modifications, changes, and variations may be made in the arrangement, operation, and details of the methods and apparatuses of the embodiments without departing from the spirit and scope of the embodiments as defined in the appended claims.