Title:
ROBOT TRANSPORTATION MODE CLASSIFICATION
Document Type and Number:
WIPO Patent Application WO/2019/173321
Kind Code:
A1
Abstract:
Methods, systems, and apparatus, including computer programs encoded on computer storage media, for performing transportation mode classification. One system includes a robot configured to predict a transportation mode from sensor inputs using a transportation mode classifier. The predicted transportation mode is used to generate an updated emotion state, and a behavior for the robot is generated using the updated emotion state.

Inventors:
ANDERSON ROSS PETER (US)
Application Number:
PCT/US2019/020724
Publication Date:
September 12, 2019
Filing Date:
March 05, 2019
Assignee:
ANKI INC (US)
International Classes:
G06F15/00
Domestic Patent References:
WO2014118767A12014-08-07
Foreign References:
US20040243281A12004-12-02
US20110282620A12011-11-17
US6362589B12002-03-26
US20030060930A12003-03-27
Claims:
CLAIMS

1. A robot comprising:

a body and one or more physically moveable components;

one or more sensor subsystems that are configured to generate sensor inputs;

one or more processors; and

one or more storage devices storing instructions that are operable, when executed by the one or more processors, to cause the robot to perform operations comprising:

receiving a plurality of sensor inputs from the one or more sensor subsystems;

generating, by a transportation mode classifier from the plurality of sensor inputs, a predicted transportation mode of the robot;

providing the predicted transportation mode to an emotion state engine that is configured to maintain an internal emotion state of the robot;

updating, by the emotion state engine, the internal emotion state of the robot based on the predicted transportation mode of the robot;

providing the updated emotion state of the robot to a behavior selection engine;

generating, by the behavior selection engine, a behavior for the robot to perform based on the updated emotion state; and

performing, by the robot, the behavior generated by the behavior selection engine for the predicted transportation mode.

2. The robot of claim 1, wherein the predicted transportation mode represents that the robot is in motion due to forces of an external entity and identifies one of an enumerated class of circumstances under which the motion occurs.

3. The robot of claim 1, wherein the predicted transportation mode represents that the robot is being moved in a particular way while being held or carried by a user.

4. The robot of claim 3, wherein the transportation mode represents that the robot is moving with a user who is walking, running, or climbing stairs.

5. The robot of claim 1, wherein the predicted transportation mode represents that the robot is being moved by a particular apparatus.

6. The robot of claim 5, wherein the predicted transportation mode represents that the robot is being moved by a bicycle, a car, a bus, a boat, a train, or an aircraft.

7. The robot of claim 1, wherein in response to a particular stimulus, the robot is configured to generate and perform a first behavior for a first predicted transportation mode and to generate and perform a different second behavior for a different second predicted transportation mode.

8. The robot of claim 1, wherein the transportation mode classifier is trained using training data generated from logs generated by integrated inertial measurement units of one or more robots being transported in each of a plurality of different transportation modes.

9. The robot of claim 1, wherein generating the predicted transportation mode of the robot comprises:

computing a plurality of feature values for each of a plurality of transportation mode features; and

providing the plurality of feature values as input to the transportation mode classifier.

10. The robot of claim 9, wherein the transportation mode features include one or more of a mean accelerometer magnitude, a standard deviation of the accelerometer magnitude, a minimum accelerometer magnitude over the sequence duration, a maximum accelerometer magnitude over the sequence duration, an autocorrelation of the accelerometer magnitude, a mean gyroscope magnitude, a standard deviation of the gyroscope magnitude, a minimum gyroscope magnitude over the sequence duration, a maximum gyroscope magnitude over the sequence duration, or an autocorrelation of the gyroscope magnitude.

11. A method performed by a robot, the method comprising:

receiving a plurality of sensor inputs from one or more sensor subsystems of the robot, wherein the sensor subsystems are configured to generate sensor inputs to be processed by the robot;

generating, by a transportation mode classifier installed on the robot from the plurality of sensor inputs, a predicted transportation mode of the robot;

providing the predicted transportation mode to an emotion state engine that is configured to maintain an internal emotion state of the robot;

updating, by the emotion state engine, the internal emotion state of the robot based on the predicted transportation mode of the robot;

providing the updated emotion state of the robot to a behavior selection engine;

generating, by the behavior selection engine, a behavior for the robot to perform based on the updated emotion state; and

performing, by the robot, the behavior generated by the behavior selection engine for the predicted transportation mode.

12. The method of claim 11, wherein the predicted transportation mode represents that the robot is in motion due to forces of an external entity and identifies one of an enumerated class of circumstances under which the motion occurs.

13. The method of claim 11, wherein the predicted transportation mode represents that the robot is being moved in a particular way while being held or carried by a user.

14. The method of claim 13, wherein the transportation mode represents that the robot is moving with a user who is walking, running, or climbing stairs.

15. The method of claim 11, wherein the predicted transportation mode represents that the robot is being moved by a particular apparatus.

16. The method of claim 15, wherein the predicted transportation mode represents that the robot is being moved by a bicycle, a car, a bus, a boat, a train, or an aircraft.

17. The method of claim 11, further comprising in response to a particular stimulus, generating and performing a first behavior for a first predicted transportation mode and generating and performing a different second behavior for a different second predicted transportation mode.

18. The method of claim 11, wherein the transportation mode classifier is trained using training data generated from logs generated by integrated inertial measurement units of one or more robots being transported in each of a plurality of different transportation modes.

19. The method of claim 11, wherein generating the predicted transportation mode of the robot comprises:

computing a plurality of feature values for each of a plurality of transportation mode features; and

providing the plurality of feature values as input to the transportation mode classifier.

20. The method of claim 19, wherein the transportation mode features include one or more of a mean accelerometer magnitude, a standard deviation of the accelerometer magnitude, a minimum accelerometer magnitude over the sequence duration, a maximum accelerometer magnitude over the sequence duration, an autocorrelation of the accelerometer magnitude, a mean gyroscope magnitude, a standard deviation of the gyroscope magnitude, a minimum gyroscope magnitude over the sequence duration, a maximum gyroscope magnitude over the sequence duration, or an autocorrelation of the gyroscope magnitude.

21. One or more non-transitory computer storage media encoded with instructions that, when executed by one or more processors of a robot, cause the robot to perform operations comprising:

receiving a plurality of sensor inputs from one or more sensor subsystems of the robot, wherein the sensor subsystems are configured to generate sensor inputs to be processed by the robot;

generating, by a transportation mode classifier installed on the robot from the plurality of sensor inputs, a predicted transportation mode of the robot;

providing the predicted transportation mode to an emotion state engine that is configured to maintain an internal emotion state of the robot;

updating, by the emotion state engine, the internal emotion state of the robot based on the predicted transportation mode of the robot;

providing the updated emotion state of the robot to a behavior selection engine;

generating, by the behavior selection engine, a behavior for the robot to perform based on the updated emotion state; and

performing, by the robot, the behavior generated by the behavior selection engine for the predicted transportation mode.

Description:
ROBOT TRANSPORTATION MODE CLASSIFICATION

BACKGROUND

This specification relates to robots, and more particularly to robots used for consumer purposes.

Consumer robots have been developed that can express a variety of human-like emotions. One such robot can express a chosen emotion by raising and lowering an actuated lift, moving and transforming the shape of its eyes on an OLED screen, orienting or translating its pose using continuous tracks (tank treads), and projecting noises or a voice through a speaker. The robot is programmed with a set of “animations,” each of which controls these components in parallel to execute a sequence of actions. The robot can also plan and execute its own actions, such as driving to a new location.

The choice of action or animation is informed by the current robot behavior. For instance, if the robot behavior is to laugh, the robot might play one of several laughing animations. The choice of behavior, in turn, is dictated by a combination of an internal state, external stimuli, and a behavior selection algorithm. In this robot and in others, this process of fusing sensor data with an internal state to choose a behavior does not fully take into account the context of the use of the robot. As a result, the emotions conveyed by the robot through animations are the same no matter how the robot is being transported.

In the field of smartphones, a variety of algorithms have been proposed for estimating the transportation mode of a smartphone device from embedded sensor data. The approaches typically involve the collection of training data from sensors such as a GPS receiver, accelerometer, and gyroscope. Then, ad-hoc algorithms or machine learning techniques are used to build classifiers that can predict the transportation mode from these sensors.

Some existing robots, including robots designed to express emotion, run processes that leverage the robot’s internal state, external stimuli, and a selection mechanism to choose the robot behavior or actions. The mechanism is used to drive an emotion state mechanism and select actions for the robot to execute. The mechanism is driven by external and internal sensors, such as a touch sensor, CCD camera, and microphone. The “conditions” identified by this existing technology are limited to those that are immediately identifiable by the robot sensors, such as petting, the existence of objects in the field of view of the camera, or verbal commands detected with the microphone.

SUMMARY

This specification describes how a robot can select actions, animations, speech, and/or behaviors to perform based on a transportation mode. The transportation mode, other sensor data, the internal state of the robot, and the selection algorithm in concert choose the behavior of the robot and what actions or animations to perform. As an example, if a robot is in a car, and the internal state of the robot is “happy,” its behavior selection engine may trigger a sequence of animations that convey “glee.”

Previous autonomous emotion robots do not take into account the transportation mode when selecting animations, actions, or behaviors for the robot to execute. The sensor input in prior systems is used only to detect the robot’s state of interaction, e.g., whether the robot is being touched. The techniques described below broaden the use of the sensors to allow for the inclusion of the general context of the robot’s use in the selection of actions, animations, and behaviors. Rather than playing an animation if the robot is touched, for instance, the robot can play an animation if the robot is touched while riding in a car.

Particular embodiments of the subject matter described in this specification can be implemented so as to realize one or more of the following advantages. By allowing for the transportation mode to affect the evolution of the emotion state engine and the behavior selection engine, the emotion state of the robot, and the process by which it selects behaviors and animations, are richer and more complex. The robot is truly aware of its context of use. Whereas a robot that is agnostic to the transportation mode might play the same animation to react to a stimulus whether riding in a car or stationary in the living room, a robot that takes the transportation mode into account can select an animation or behavior that is more appropriate to a given situation. This allows for a more diverse set of animations or actions, including ones that are only used in the specific context of a particular transportation mode. As a result, the overall behavior of the robot can seem more intelligent or emotional to an observer. This, in turn, makes the robot easier to understand, easier to use, and increases user engagement.

The details of one or more embodiments of the subject matter of this specification are set forth in the accompanying drawings and the description below. Other features, aspects, and advantages of the subject matter will become apparent from the description, the drawings, and the claims.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1A illustrates an example robot.

FIG. 1B is a block diagram illustrating an input and output system of an example robot.

FIG. 2 is a block diagram showing an example software architecture of a robot.

FIG. 3 is a flowchart of an example process for training a transportation mode classifier for a robot.

FIG. 4 is a flowchart of an example process for generating a behavior for a robot based on a predicted transportation mode.

FIGS. 5A-5C illustrate different robot behaviors generated as a result of the robot detecting different transportation modes.

Like reference numbers and designations in the various drawings indicate like elements.

DETAILED DESCRIPTION

This specification describes an example robotic platform having a variety of sensors and a controller that decides and directs how to execute actions using actuators, displays, and/or speakers or other components.

FIG. 1A illustrates an example robot 100A. The robot 100A is an example of a mobile autonomous robotic system on which the transportation mode classification techniques described in this specification can be implemented. The robot 100A can use the techniques described below for use as a toy or as a personal companion. As an example, robots having components that are suitable for classifying transportation modes are described in more detail in commonly-owned U.S. Patent Application No. 15/694,710, which is herein incorporated by reference.

The robot 100A generally includes a body 105 and a number of physically moveable components. The components of the robot 100A can house data processing hardware and control hardware of the robot. The physically moveable components of the robot 100A include a propulsion system 110, a lift 120, and a head 130.

The robot 100A also includes integrated output and input subsystems.

The output subsystems can include control subsystems that cause physical movements of robotic components; presentation subsystems that present visual or audio information, e.g., screen displays, lights, and speakers; and communication subsystems that communicate information across one or more communications networks, to name just a few examples.

The control subsystems of the robot 100A include a locomotion subsystem 110. In this example, the locomotion subsystem 110 has wheels and treads. Each wheel subsystem can be independently operated, which allows the robot to spin and perform smooth arcing maneuvers. In some implementations, the locomotion subsystem includes sensors that provide feedback representing how quickly one or more of the wheels are turning. The robot can use this information to control its position and speed.

The control subsystems of the robot 100A include an effector subsystem 120 that is operable to manipulate objects in the robot’s environment. In this example, the effector subsystem 120 includes a lift and one or more motors for controlling the lift. The effector subsystem 120 can be used to lift and manipulate objects in the robot’s environment. The effector subsystem 120 can also be used as an input subsystem, which is described in more detail below.

The control subsystems of the robot 100A also include a robot head 130, which has the ability to tilt up and down and optionally side to side. On the robot 100A, the tilt of the head 130 also directly affects the angle of a camera 150.

The presentation subsystems of the robot 100A include one or more electronic displays, e.g., electronic display 140, which can each be a color or a monochrome display or both. The electronic display 140 can be used to display any appropriate information. In FIG. 1A, the electronic display 140 is presenting a simulated pair of eyes. The presentation subsystems of the robot 100A also include one or more lights 142 that can each turn on and off, optionally in multiple different colors.

The presentation subsystems of the robot 100A can also include one or more speakers, which can play one or more sounds in sequence or concurrently so that the sounds are at least partially overlapping.

The input subsystems of the robot 100A include one or more perception subsystems, one or more audio subsystems, one or more touch detection subsystems, one or more motion detection subsystems, one or more effector input subsystems, and one or more accessory input subsystems, to name just a few examples. The perception subsystems of the robot 100A are configured to sense light from an environment of the robot. The perception subsystems can include a visible spectrum camera, an infrared camera, or a distance sensor, to name just a few examples. For example, the robot 100A includes an integrated camera 150. The perception subsystems of the robot 100A can include one or more distance sensors. Each distance sensor generates an estimated distance to the nearest object in front of the sensor.

The perception subsystems of the robot 100A can include one or more light sensors. The light sensors are simpler electronically than cameras and generate a signal when a sufficient amount of light is detected. In some implementations, light sensors can be combined with light sources to implement integrated cliff detectors on the bottom of the robot. When light generated by a light source is no longer reflected back into the light sensor, the robot 100A can interpret this state as being over the edge of a table or another surface.

The audio subsystems of the robot 100A are configured to capture sound from the environment of the robot. For example, the robot 100A can include a directional microphone subsystem having one or more microphones. The directional microphone subsystem also includes post-processing functionality that generates a direction, a direction probability distribution, a location, or a location probability distribution in a particular coordinate system in response to receiving a sound. Each generated direction represents a most likely direction from which the sound originated. The directional microphone subsystem can use various conventional beam-forming algorithms to generate the directions.

The touch detection subsystems of the robot 100A are configured to determine when the robot is being touched or touched in particular ways. The touch detection subsystems can include touch sensors, and each touch sensor can indicate when the robot is being touched by a user, e.g., by measuring changes in capacitance. The robot can include touch sensors on dedicated portions of the robot’s body, e.g., on the top, on the bottom, or both. Multiple touch sensors can also be configured to detect different touch gestures or modes, e.g., a stroke, tap, rotation, or grasp.

The motion detection subsystems of the robot 100A are configured to measure the movement of the robot. The motion detection subsystems can include motion sensors and each motion sensor can indicate that the robot is moving in a particular way. For example, a gyroscope sensor can indicate a relative orientation of the robot. As another example, an accelerometer can indicate a direction and a magnitude of an acceleration, e.g., of the Earth’s gravitational field.

The effector input subsystems of the robot 100A are configured to determine when a user is physically manipulating components of the robot 100A. For example, a user can physically manipulate the lift of the effector subsystem 120, which can result in an effector input subsystem generating an input signal for the robot 100A. As another example, the effector subsystem 120 can detect whether or not the lift is currently supporting the weight of any objects. Such a determination can also result in an input signal for the robot 100A.

The robot 100A can also use inputs received from one or more integrated input subsystems. The integrated input subsystems can indicate discrete user actions with the robot 100A. For example, the integrated input subsystems can indicate when the robot is being charged, when the robot has been docked in a docking station, and when a user has pushed buttons on the robot, to name just a few examples.

The robot 100A can also use inputs received from one or more accessory input subsystems that are configured to communicate with the robot 100A. For example, the robot 100A can interact with one or more cubes that are configured with electronics that allow the cubes to communicate with the robot 100A wirelessly. Such accessories that are configured to communicate with the robot can have embedded sensors whose outputs can be communicated to the robot 100A either directly or over a network connection. For example, a cube can be configured with a motion sensor and can communicate an indication that a user is shaking the cube.

The robot 100A can also use inputs received from one or more environmental sensors that each indicate a particular property of the environment of the robot. Example environmental sensors include temperature sensors and humidity sensors, to name just a few examples.

One or more of the input subsystems described above may also be referred to as “sensor subsystems.” The sensor subsystems allow a robot to determine when a user is interacting with the robot, e.g., for the purposes of providing user input, using a representation of the environment rather than through explicit electronic commands, e.g., commands generated and sent to the robot by a smartphone application. The representations generated by the sensor subsystems may be referred to as “sensor inputs.”

The robot 100A also includes computing subsystems having data processing hardware, computer-readable media, and networking hardware. Each of these components can serve to provide the functionality of a portion or all of the input and output subsystems described above or as additional input and output subsystems of the robot 100A, as the situation or application requires. For example, one or more integrated data processing apparatus can execute computer program instructions stored on computer-readable media in order to provide some of the functionality described above.

The robot 100A can also be configured to communicate with a cloud-based computing system having one or more computers in one or more locations. The cloud-based computing system can provide online support services for the robot. For example, the robot can offload portions of some of the operations described in this specification to the cloud-based system, e.g., for determining behaviors, computing signals, and performing natural language processing of audio streams.

FIG. 1B is a block diagram illustrating an input and output system 100B of an example robot, e.g., the example robot 100A. The system 100B includes a controller 160 that receives inputs from sensors 172. In some implementations, the sensors can help inform the detection of the transportation mode. For example, an accelerometer 162 and a gyroscope 168 can indicate accelerations and rotations that describe the motion of the robot; a GPS receiver 164 can reveal over time the speed and absolute direction of motion, and a microphone 166 can provide a spectrum of the audible vibrations caused by the hum of a vehicle. The robot sensors are thus selected based on the transportation modes among which the robot should be capable of differentiating.

The controller 160 can process the inputs to generate outputs directed to one or more output devices. For example, the controller 160 can be coupled to output systems including actuators 174a-174n, an electronic display 176, and a speaker 178.

FIG. 2 is a block diagram showing an example software architecture 200 of an example robot. The software architecture 200 includes a sensor subsystem 204, a transportation mode classifier 202 (TMC), an emotion state engine 206, a behavior selection engine 208, an animation and action selection engine 210, and an animation and action generation engine 212. The TMC 202 is a subsystem that analyzes the current and/or historical sensor data to detect the transportation mode of the robot. In this specification, a transportation mode is a class of circumstances under which the robot is in transit due to the influence of external forces. In this specification, external forces are forces generated by entities that are not a part of the robot’s integrated subsystems. Each transportation mode thus represents that the robot’s location is changing, e.g., that the robot is in transit, due to forces of an external entity. Each transportation mode also classifies this movement as happening in accordance with one of an enumerated set of circumstances.

For example, a transportation mode can represent that the robot is being moved in a particular way while being held, carried, or moved by a user. The transportation mode can thus represent that the robot is moving with a user who is walking, running, or climbing stairs, to name just a few examples. The transportation mode can also represent the robot being moved by a user, e.g., being pushed or pulled.

A transportation mode can also represent that the robot is being moved by a particular apparatus, which may or may not involve being held or carried by a user. The transportation mode can thus also represent that the robot is being moved by a bicycle, a car, a bus, a boat, a train, or an aircraft, to name just a few examples. The transportation mode can reflect and be related to human locomotion, e.g., when the robot is being carried by a user who is walking; or be independent of human locomotion, e.g., when the robot is transported in a car.
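
For illustration only, the FIG. 2 components can be composed into a simple per-tick control loop in which sensor inputs pass through the transportation mode classifier, the emotion state engine, and the behavior selection engine. The sketch below is a minimal, assumption-laden rendering of that flow; the class names, method signatures, and the specific state updates are hypothetical and are not drawn from the specification.

```python
# Hypothetical sketch of the FIG. 2 pipeline: sensor inputs flow through the
# transportation mode classifier, the emotion state engine, and the behavior
# selection engine on each control tick.

class TransportationModeClassifier:
    def predict(self, sensor_inputs):
        # A trained classifier would map sensor features to a mode and a
        # confidence; "stationary" is only a placeholder default here.
        return "stationary", 1.0

class EmotionStateEngine:
    def __init__(self):
        self.state = {"happy": 0.0, "calm": 0.0, "anxiety": 0.0}

    def update(self, sensor_inputs, mode, confidence):
        # A programmed model decides how the mode and stimuli change the state.
        if mode == "car":
            self.state["happy"] += 0.1 * confidence
        return dict(self.state)

class BehaviorSelectionEngine:
    def select(self, sensor_inputs, emotion_state, mode):
        if mode == "car" and emotion_state["happy"] > 0.0:
            return "express_glee"
        return "idle"

def control_tick(sensor_inputs, tmc, emotions, behaviors):
    mode, confidence = tmc.predict(sensor_inputs)
    state = emotions.update(sensor_inputs, mode, confidence)
    return behaviors.select(sensor_inputs, state, mode)
```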

The TMC 202 is developed using data previously collected by the robot. Use of such “training data” ensures that the TMC 202 is tailored to the given sensor configuration of the robot. Furthermore, by collecting these data using multiple robot units, the algorithm is more robust to variation among robot units.

The output of the TMC 202 may include a level of certainty or confidence in the detected transportation mode, or the TMC 202 may produce a list of all possible transportation modes coupled with the certainty or confidence in each result.

Furthermore, the TMC 202 may evolve over time based on the frequency of detected transportation modes or other factors. For instance, a robot that is taken for a car ride at the same time every day may use that information to simplify the subsequent detection of the current transportation mode at that time.

The robot also maintains an emotion state engine 206, which is a subsystem that maintains an internal state in a way that is informed by current, recent, and/or long-term external stimuli. The emotion state engine 206 can maintain the current general “mood” of the robot, along with a list of emotion states and the respective intensities of those emotion states. The sensor data received from the sensor subsystems 204 is used as input to the emotion state engine, and a programmed model of how the data affects the emotion state engine 206 calculates how the emotion states and “mood” should be updated.

The emotion state for a robot can be a single-dimensional or a multi-dimensional data structure, e.g., a vector or an array, that maintains respective values for each of one or more different aspects. Each aspect can represent an enumerated value or a particular value on a simulated emotional spectrum, with each value for each aspect representing a location within that simulated emotional spectrum. For example, an example emotion state can have the following values: Happy, Calm, Brave, Confident, Excited, and Social, each of which may have a negative counterpart.

The emotion states need not correspond to specifically identifiable human emotions. Rather, the emotion state can also represent other, more general or more specific spectrums that characterize robot behavior. For example, the emotion state can be a Social state that represents how eager the robot is to interact with users generally, a Want-To-Play state that represents how eager the robot is to engage in gameplay with a user, and a Winning state that represents how competitive the robot is in games. The emotion state can also correspond to a desired or impending change in external circumstances, such as changing the state of the user or an object, e.g., Want-to-Frighten or Want-to-Soothe. Emotion states can also correspond to current physical states of the robot, such as Needs-Repair and Hungry. Such states can manifest in the same sort of character and motion constraints as other emotion states. The emotion states can enhance user engagement with the robot and can improve the interface between users and the robot by making the robot’s actions and responses readily understandable.
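
As an illustration of such a multi-dimensional emotion state, the following sketch stores each aspect as a value on a bounded spectrum. The aspect names follow the examples above, but the data layout and the clamping range are assumptions made for the example, not part of the specification.

```python
from dataclasses import dataclass, field

@dataclass
class EmotionState:
    # Each aspect is a position on a simulated spectrum in [-1.0, 1.0];
    # negative values stand in for the aspect's counterpart (e.g., sad vs. happy).
    aspects: dict = field(default_factory=lambda: {
        "happy": 0.0, "calm": 0.0, "brave": 0.0, "confident": 0.0,
        "excited": 0.0, "social": 0.0, "want_to_play": 0.0, "winning": 0.0,
    })

    def adjust(self, aspect, delta):
        # Clamp so repeated stimuli cannot push an aspect off its spectrum.
        value = self.aspects.get(aspect, 0.0) + delta
        self.aspects[aspect] = max(-1.0, min(1.0, value))

state = EmotionState()
state.adjust("excited", 0.4)  # e.g., the transportation mode changed to "car"
state.adjust("calm", -0.2)
```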

The transportation mode generated by the TMC 202 is also an input to the emotion state engine 206. The robot emotion state is updated based on a model of how the current transportation mode, or changes in the transportation mode, affect the emotion state. The transportation mode input is also fused with current sensor data, allowing for complex robot emotional responses that reflect the context of use. For instance, a robot that experiences a large jolt while stationary may have increased “anxiety,” while a similar jolt when the robot is in a car may be considered par for the course, and it would not change the “anxiety” level of the emotion state.

Furthermore, the model underlying the emotion state engine 206 can also take into account one or more distinct robot “personality” types, which further helps to determine how the emotion state should change for a given transportation mode. For instance, the emotion state engine 206 of a “fearful” robot who is taken for a car ride may update its state with increased “anxiety.” A “carefree” robot, on the other hand, may increase the intensity of its “glee” emotion state when experiencing a car ride.

The behavior selection engine 208 takes as input the current sensor data from the sensor subsystems 204, the emotion state from the emotion state engine 206, and the output of the TMC 202, and from those inputs, the behavior selection engine 208 chooses a high-level behavior for the robot to perform. Each behavior involves performing one or more actions. The inclusion of the transportation mode alongside sensor data and the emotion state allows for the selection or exclusion of certain behaviors based on the transportation mode. For instance, if the microphone 166 recognizes the verbal command “do the dishes,” but the robot detects that the robot is riding in a car, the selected behavior may express confusion. The three inputs can also be weighted, so that one source of input can have a stronger influence on the selected behavior than others, and this weighting can change over time. Furthermore, the inputs can be weighted based on their value, so that, for instance, if the TMC 202 detects a car ride, the TMC 202 is weighted more heavily than the emotion state engine input in determining the robot behavior. These weightings can also change over time. The weightings can also vary based on the confidence of the measurement. For instance, if the TMC algorithm has not completed or has returned a result or results with low confidence, the emotion state or current sensor data can have a greater influence on the selected behavior.
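
One plausible way to realize the weighting described above is to score candidate behaviors with per-source weights that depend on the classifier’s confidence. The scoring scheme below is a hypothetical sketch, not a scheme the specification prescribes; the behavior names and score values are illustrative.

```python
def select_behavior(candidates, mode_confidence):
    """Pick the candidate behavior with the highest weighted score.

    `candidates` maps a behavior name to scores in [0, 1] from three sources:
    {"sensor": ..., "emotion": ..., "mode": ...}. The weighting scheme below
    is illustrative only.
    """
    # Weight the transportation mode more heavily when the classifier is
    # confident; otherwise lean on the emotion state and current sensor data.
    weights = {
        "mode": mode_confidence,
        "emotion": 1.0 - 0.5 * mode_confidence,
        "sensor": 1.0 - 0.5 * mode_confidence,
    }

    def score(scores):
        return sum(weights[source] * scores.get(source, 0.0) for source in weights)

    return max(candidates, key=lambda behavior: score(candidates[behavior]))

# Example: a confident "car" detection favors a car-appropriate reaction to the
# verbal command "do the dishes".
chosen = select_behavior(
    {"express_confusion": {"sensor": 0.9, "emotion": 0.2, "mode": 0.8},
     "do_the_dishes":     {"sensor": 0.9, "emotion": 0.6, "mode": 0.0}},
    mode_confidence=0.95)
```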

The behavior selection engine 208 can also choose its behavior based on changes in the detected transportation mode. For example, the robot may trigger a behavior that expresses “excitement” when the transportation mode transitions from stationary to driving, or from walking to running. Furthermore, the behavior selection may evolve over time based on the regularity of the transportation mode or other factors. For instance, the selected behavior when a robot is first taken for a car ride could differ from the selected behavior after the robot rides in a car regularly.

Any unexpected transportation modes can also trigger specific behaviors. For example, if the robot expects that the robot should remain on the ground in a given area, and the robot instead detects that it is being carried for a walk or is in a car, that irregularity could trigger a specific behavior, as well as a new state of "alert" in the emotion state engine.

Once the behavior selection engine 208 has selected a high-level behavior based on knowledge of the transportation mode, that behavior is used by the animation and action selection engine 210 to queue a list of actions to be performed by various components of the robot. The actions can include physical actions that cause the actuation of physically moveable components and display actions that cause the presentation of visual or audio data through electronic displays or speakers. These actions may include “animations” designed to convey a particular emotional response, or more basic actions like “drive forward.” These animations may be distinct to a particular transportation mode. For instance, a “riding in the car” animation might exist, but the “riding in the car” animation may be an animation that is only executed when the TMC 202 determines that the robot is riding in a car.
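
A minimal way to restrict an animation to a particular transportation mode is a lookup keyed by behavior and mode with a mode-agnostic fallback, as in the hypothetical sketch below; the animation names and table layout are illustrative only and are not taken from the specification.

```python
# Hypothetical animation table keyed by (behavior, transportation mode), with a
# mode-agnostic fallback keyed by (behavior, None).
ANIMATIONS = {
    ("celebrate", "car"): "riding_in_the_car",  # only plays during car rides
    ("celebrate", None): "happy_wiggle",
    ("react_to_touch", None): "purr",
}

def queue_actions(behavior, mode):
    # Prefer a mode-specific animation; otherwise fall back to the generic one.
    animation = ANIMATIONS.get((behavior, mode)) or ANIMATIONS.get((behavior, None))
    # An animation expands into a parallel sequence of component actions.
    return [("play_animation", animation)]

print(queue_actions("celebrate", "car"))      # mode-specific animation
print(queue_actions("celebrate", "walking"))  # generic fallback
```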

Finally, the animation and action generation engine 212 can translate the selected actions into commands for actuators, displays, speakers, and/or other components of the robot.

The output of the animation and action generation engine 212 is also used by the TMC 202 to improve the quality of the estimated transportation mode. Because the robot may be moving of its own accord, the TMC 202 can modify its input sensor data to account for the known motion of the robot. For instance, if the action generation engine instructs the robot to accelerate forward, the TMC 202 can ignore a corresponding acceleration in that direction, so that it considers sensor input only from unbiased external forces.
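
A simplified version of that compensation step might subtract the acceleration expected from the robot’s own commands from the measured acceleration before classification. The sketch below assumes both quantities are available in the robot body frame with gravity removed, and it ignores rotation and actuator dynamics; the function name is hypothetical.

```python
import numpy as np

def compensate_self_motion(measured_accel, commanded_accel):
    """Remove the acceleration expected from the robot's own commands.

    Both arguments are 3-vectors in the robot body frame (m/s^2), with gravity
    already removed. The residual is what the classifier should treat as
    externally caused motion.
    """
    return np.asarray(measured_accel, dtype=float) - np.asarray(commanded_accel, dtype=float)

# Example: the robot commanded a 0.3 m/s^2 forward acceleration, so only the
# remaining 1.2 m/s^2 (e.g., a car accelerating) reaches the classifier.
residual = compensate_self_motion([1.5, 0.0, 0.0], [0.3, 0.0, 0.0])
```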

FIG. 3 is a flowchart of an example process for training a transportation mode classifier for a robot. The example process will be described as being performed by a system of one or more computers, located in one or more locations, and programmed appropriately in accordance with this specification. For example, the example process can be performed by a system of computers in a datacenter programmed to build a transportation mode classifier for a robot.

The system collects training data in different transportation modes (310). For example, the system can collect training data from a single robot or from multiple robot units. The training data include features indicating a robot’s status such as registered values for the accelerometer, the GPS receiver, and the gyroscope, to name just a few examples. Each training example can be labeled with a mode of transportation used to generate the features.

For example, the following section outlines an example technique for training a transportation mode classifier for robots that each have an integrated inertial measurement unit (IMU) having a 3-axis gyroscope and a 3-axis accelerometer.

Each robot is programmed to record IMU data to a formatted log file at 200 Hz. Multiple units of the robot are produced and given to multiple individuals. Each individual is instructed to perform each of the following actions, during which time logs of IMU data are being recorded:

i) Allow the robot to remain stationary on a flat surface for at least two hours;

ii) Place the robot somewhere inside the passenger compartment of a vehicle, and then drive in that vehicle for at least two hours;

iii) Place the robot on the ground of a train, and then travel in the train for at least two hours; and

iv) Either hold the robot or place it in a backpack or pocket, and then take a walk with the robot for at least two hours.

The IMU logs are then obtained from the robots. For each log, the 3-axis gyroscope and 3-axis accelerometer logs are each replicated as down-sampled versions, with frequencies f = 1 Hz, 2 Hz, 4 Hz, 6 Hz, 8 Hz, and 10 Hz. The logs are then segmented into sequences of length T = 5 s, 10 s, 15 s, 30 s, 45 s, 60 s, 120 s, 240 s, and 360 s, forming a collection of 48 (f, T) pairs for each sensor, each containing the same data but segmented and down-sampled in different ways. The best pair is later chosen as part of the model selection. Statistics of each sequence are then calculated. These include the mean accelerometer magnitude, the standard deviation of the accelerometer magnitude, the minimum accelerometer magnitude over the sequence duration, the maximum accelerometer magnitude over the sequence duration, the autocorrelation (at lag 1) of the accelerometer magnitude, the mean gyroscope magnitude, the standard deviation of the gyroscope magnitude, the minimum gyroscope magnitude over the sequence duration, the maximum gyroscope magnitude over the sequence duration, and the autocorrelation (at lag 1) of the gyroscope magnitude. Statistics of the sequences in the frequency domain can also be calculated. A table is constructed that lists the transportation mode of a sequence along with its set of statistics, so that the statistics, or feature vector, of each sequence is labeled by its transportation mode.
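
The segmentation and per-sequence statistics described above could be computed along the following lines; the array layout of the logs and the helper names are assumptions made for this sketch, and only the time-domain features are shown.

```python
import numpy as np

def sequence_features(accel_xyz, gyro_xyz):
    """Compute the ten time-domain statistics listed above for one sequence.

    accel_xyz and gyro_xyz are arrays of shape (n_samples, 3) covering a single
    down-sampled segment of length T.
    """
    features = []
    for xyz in (accel_xyz, gyro_xyz):
        magnitude = np.linalg.norm(xyz, axis=1)
        centered = magnitude - magnitude.mean()
        denom = np.dot(centered, centered)
        # Autocorrelation of the magnitude signal at lag 1.
        autocorr = np.dot(centered[:-1], centered[1:]) / denom if denom > 0 else 0.0
        features.extend([magnitude.mean(), magnitude.std(),
                         magnitude.min(), magnitude.max(), autocorr])
    return np.array(features)

def segment_log(samples, rate_hz, seq_seconds):
    """Split a down-sampled log of shape (n, 3) into non-overlapping sequences."""
    step = int(rate_hz * seq_seconds)
    return [samples[i:i + step] for i in range(0, len(samples) - step + 1, step)]
```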

The system uses the collected training data to train a model to predict a current transportation mode (320). The system can select the model type from a number of different models. Model selection can involve creating several machine learning classifiers that can each predict the transportation mode from a feature vector, and then the classifier that performs most effectively is identified. Techniques for supervised learning are selected, including a Random Forest, a Support Vector Machine (SVM), a Naive Bayes classifier, and AdaBoost.

The system can use cross-validation to test the accuracy of each model. The data can be randomly partitioned into ten equal groups of labeled feature vectors in order to perform k-fold cross-validation, as follows. One group of the partition is withheld, and then each classifier is trained on the remaining nine groups of labeled features. The withheld group then serves as a way to test the accuracy of the trained classifiers. A single trained classifier predicts the transportation mode for each feature vector in the test data (i.e., the withheld group), which produces an accuracy score formed by dividing the number of correct predictions of transportation mode by the number of sequences in the test data. This score is calculated for each classifier. Then, a different group of feature vectors is withheld as test data, and the remaining nine are used to re-train the classifiers from scratch. After repeating the process of withholding a portion of the data, training the classifiers on the remaining groups, and then testing them on the withheld group, each classifier will have ten accuracy scores, and the classifier with the highest average accuracy score can be selected for further use.

The model selection process can be repeated for each of the 48 (f, T) pairs to find the model parameters with the highest accuracy. The final result is a single model trained on the entire set of labeled feature vectors that correspond to a single pair (f, T). On some robots, a Random Forest trained on sequences with the properties T = 60 s and f = 10 Hz has the highest accuracy score. If desired, the model selection process can be repeated using all possible subsets, e.g., the power set, of the calculated statistics.
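
The model-selection loop, including ten-fold cross-validation over the candidate classifiers and the (f, T) pairs, might be sketched with scikit-learn as follows. The training pipeline actually used on the robots is not specified, so treat the function below, its name, and its input format as assumptions.

```python
import numpy as np
from sklearn.ensemble import AdaBoostClassifier, RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC

def select_classifier(feature_sets):
    """Pick the best (model, (f, T)) combination by 10-fold cross-validation.

    `feature_sets` maps an (f_hz, T_seconds) pair to (X, y), where X holds the
    feature vectors computed for that segmentation and y holds the
    transportation mode labels.
    """
    candidates = {
        "random_forest": RandomForestClassifier(),
        "svm": SVC(),
        "naive_bayes": GaussianNB(),
        "adaboost": AdaBoostClassifier(),
    }
    best_name, best_pair, best_accuracy = None, None, -np.inf
    for pair, (X, y) in feature_sets.items():
        for name, model in candidates.items():
            # Mean accuracy across the ten withheld folds, as described above.
            accuracy = cross_val_score(model, X, y, cv=10, scoring="accuracy").mean()
            if accuracy > best_accuracy:
                best_name, best_pair, best_accuracy = name, pair, accuracy
    return best_name, best_pair, best_accuracy
```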

The system provides a trained model for predicting a current transportation mode to a robot (330). The trained model with the highest accuracy can be programmed on the robot. The robot can also be programmed to translate instantaneous IMU readings into sequences of the length T and frequency f chosen through the above process, and to compute the statistics of the sequences, so that a feature vector is generated from recent IMU data on a regular interval. Each time a new sequence is available, the trained model predicts the transportation mode from the sequence’s feature vector. That prediction, and any measure of confidence, can then be used as input to the emotion state engine and the behavior selection engine. These calculations can be performed synchronously or asynchronously.
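
At run time, the regular-interval prediction could be implemented as a rolling buffer of recent IMU samples that is featurized and classified whenever a full sequence is available. The sketch below is hypothetical, reuses the sequence_features helper from the earlier sketch, and uses the T = 60 s, f = 10 Hz values mentioned above only as default parameters.

```python
from collections import deque

import numpy as np

class OnlineTransportationModeClassifier:
    """Keep the last T seconds of IMU data and classify each complete window."""

    def __init__(self, model, rate_hz=10, seq_seconds=60):
        self.model = model  # trained classifier chosen during model selection
        self.window = int(rate_hz * seq_seconds)
        self.accel = deque(maxlen=self.window)
        self.gyro = deque(maxlen=self.window)

    def add_sample(self, accel_xyz, gyro_xyz):
        """Append one IMU sample; return a predicted mode when a window is full."""
        self.accel.append(accel_xyz)
        self.gyro.append(gyro_xyz)
        if len(self.accel) < self.window:
            return None  # not enough data yet for a full sequence
        features = sequence_features(np.array(self.accel), np.array(self.gyro))
        # Probabilistic models could also report a confidence for each mode here.
        return self.model.predict([features])[0]
```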

FIG. 4 is a flowchart of an example process for generating a behavior for a robot based on a predicted transportation mode. The example process can be performed by any appropriate system having a transportation mode classifier. For convenience, the example process will be described as being performed by a robot having an emotion state engine and configured with a transportation mode classifier.

The robot receives a plurality of sensor inputs (410). For example, the system can receive sensor inputs from the sensor subsystem 204 as described above with reference to FIGS. 1-2.

The robot generates a predicted transportation mode based on the received sensor inputs (420). For example, the system can use a transportation mode classifier to predict a transportation mode of the robot as explained above with reference to FIG. 2.

The robot provides the predicted transportation mode to the emotion state engine (430). As described above, the emotion state engine is configured to maintain a single or multi-dimensional data structure that represents a state of the robot in one or more different simulated aspects.

The robot updates an internal emotion state based on the predicted transportation mode of the robot (440). For instance, a large jolt experienced while the robot is stationary may result in an increased value for an “anxiety” aspect of the emotion state, while a similar jolt when the robot is in a car may be considered normal and does not change the “anxiety” aspect of the emotion state.

The robot provides the updated emotion state to a behavior selection engine (450). For instance, if the microphone 166 recognizes the verbal command “do the dishes,” but the robot detects that the robot is riding in a car, the system can increase the “confusion” aspect of the emotion state and provide the updated emotion state to the behavior selection engine.

The robot generates a behavior for the robot to perform based on the updated emotion state (460). For example, the system can cause the animation and action selection engine 210 to select a behavior that expresses excitement when the transportation mode transitions from stationary to driving, or from walking to running.

The robot performs the behavior (470). For example, the selected behavior from the animation and action selection engine 210 can be passed to the animation and action generation engine 212 so that the selected actions are translated into commands for actuators, displays, speakers, and/or other components of the robot.

FIGS. 5A-5C illustrate different robot behaviors generated as a result of the robot detecting different transportation modes.

For example, in FIG. 5A, a robot is being moved by a bicycle operated by a user. The robot is programmed to simulate enjoying bicycle rides. Therefore, the emotion state engine of the robot can update its emotion state when the transportation mode classifier determines that the robot is riding on a bicycle. As a result, a behavior engine of the robot can select a happy face 510 to output to the display of the robot.

In FIGS. 5B and 5C, the robot is being moved by a vehicle driven by a user. The robot has exhibited different animations and actions. For example, in FIG. 5B, the robot has produced an audio output 520 to indicate an internal emotional state of excitement. In FIG. 5C, the robot has exhibited an emotional state that reflects nervousness by providing an audio output 530, “WOAH! Going a little fast there buddy!” that reminds the user to use caution while driving.

Embodiments of the subject matter and the functional operations described in this specification can be implemented in digital electronic circuitry, in tangibly-embodied computer software or firmware, in computer hardware, including the structures disclosed in this specification and their structural equivalents, or in combinations of one or more of them. Embodiments of the subject matter described in this specification can be implemented as one or more computer programs, i.e., one or more modules of computer program instructions encoded on a tangible non-transitory storage medium for execution by, or to control the operation of, data processing apparatus. The computer storage medium can be a machine-readable storage device, a machine-readable storage substrate, a random or serial access memory device, or a combination of one or more of them. Alternatively or in addition, the program instructions can be encoded on an artificially-generated propagated signal, e.g., a machine-generated electrical, optical, or electromagnetic signal, that is generated to encode information for transmission to suitable receiver apparatus for execution by a data processing apparatus.

The term “data processing apparatus” refers to data processing hardware and encompasses all kinds of apparatus, devices, and machines for processing data, including by way of example a programmable processor, a computer, or multiple processors or computers. The apparatus can also be, or further include, special purpose logic circuitry, e.g., an FPGA (field programmable gate array) or an ASIC (application-specific integrated circuit). The apparatus can optionally include, in addition to hardware, code that creates an execution environment for computer programs, e.g., code that constitutes processor firmware, a protocol stack, a database management system, an operating system, or a combination of one or more of them.

A computer program (which may also be referred to or described as a program, software, a software application, an app, a module, a software module, a script, or code) can be written in any form of programming language, including compiled or interpreted languages, or declarative or procedural languages, and it can be deployed in any form, including as a stand-alone program or as a module, component, subroutine, or other unit suitable for use in a computing environment. A program may, but need not, correspond to a file in a file system. A program can be stored in a portion of a file that holds other programs or data, e.g., one or more scripts stored in a markup language document, in a single file dedicated to the program in question, or in multiple coordinated files, e.g., files that store one or more modules, sub-programs, or portions of code. A computer program can be deployed to be executed on one computer or on multiple computers that are located at one site or distributed across multiple sites and interconnected by a data communication network. For a system of one or more computers to be configured to perform particular operations or actions means that the system has installed on it software, firmware, hardware, or a combination of them that in operation cause the system to perform the operations or actions. For one or more computer programs to be configured to perform particular operations or actions means that the one or more programs include instructions that, when executed by data processing apparatus, cause the apparatus to perform the operations or actions.

As used in this specification, an “engine,” or “software engine,” refers to a software implemented input/output system that provides an output that is different from the input. An engine can be an encoded block of functionality, such as a library, a platform, a software development kit (“SDK”), or an object. Each engine can be implemented on any appropriate type of computing device, e.g., servers, mobile phones, tablet computers, notebook computers, music players, e-book readers, laptop or desktop computers, PDAs, smart phones, or other stationary or portable devices, that includes one or more processors and computer readable media. Additionally, two or more of the engines may be implemented on the same computing device, or on different computing devices.

The processes and logic flows described in this specification can be performed by one or more programmable computers executing one or more computer programs to perform functions by operating on input data and generating output. The processes and logic flows can also be performed by special purpose logic circuitry, e.g., an FPGA or an ASIC, or by a combination of special purpose logic circuitry and one or more programmed computers.

Computers suitable for the execution of a computer program can be based on general or special purpose microprocessors or both, or any other kind of central processing unit. Generally, a central processing unit will receive instructions and data from a read-only memory or a random access memory or both. The essential elements of a computer are a central processing unit for performing or executing instructions and one or more memory devices for storing instructions and data. The central processing unit and the memory can be supplemented by, or incorporated in, special purpose logic circuitry. Generally, a computer will also include, or be operatively coupled to receive data from or transfer data to, or both, one or more mass storage devices for storing data, e.g., magnetic, magneto-optical disks, or optical disks. However, a computer need not have such devices. Moreover, a computer can be embedded in another device, e.g., a mobile telephone, a personal digital assistant (PDA), a mobile audio or video player, a game console, a Global Positioning System (GPS) receiver, or a portable storage device, e.g., a universal serial bus (USB) flash drive, to name just a few.

Computer-readable media suitable for storing computer program instructions and data include all forms of non-volatile memory, media and memory devices, including by way of example semiconductor memory devices, e.g., EPROM, EEPROM, and flash memory devices; magnetic disks, e.g., internal hard disks or removable disks; magneto-optical disks; and CD-ROM and DVD-ROM disks.

To provide for interaction with a user, embodiments of the subject matter described in this specification can be implemented on a computer having a display device, e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor, for displaying information to the user and a keyboard and pointing device, e.g., a mouse, trackball, or a presence sensitive display or other surface by which the user can provide input to the computer. Other kinds of devices can be used to provide for interaction with a user as well; for example, feedback provided to the user can be any form of sensory feedback, e.g., visual feedback, auditory feedback, or tactile feedback; and input from the user can be received in any form, including acoustic, speech, or tactile input. In addition, a computer can interact with a user by sending documents to and receiving documents from a device that is used by the user; for example, by sending web pages to a web browser on a user’s device in response to requests received from the web browser. Also, a computer can interact with a user by sending text messages or other forms of message to a personal device, e.g., a smartphone, running a messaging application, and receiving responsive messages from the user in return.

Embodiments of the subject matter described in this specification can be implemented in a computing system that includes a back-end component, e.g., as a data server, or that includes a middleware component, e.g., an application server, or that includes a front-end component, e.g., a client computer having a graphical user interface, a web browser, or an app through which a user can interact with an implementation of the subject matter described in this specification, or any combination of one or more such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication, e.g., a communication network. Examples of communication networks include a local area network (LAN) and a wide area network (WAN), e.g., the Internet.

The computing system can include clients and servers. A client and server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. In some embodiments, a server transmits data, e.g., an HTML page, to a user device, e.g., for purposes of displaying data to and receiving user input from a user interacting with the device, which acts as a client. Data generated at the user device, e.g., a result of the user interaction, can be received at the server from the device.

In addition to the embodiments described above, the following embodiments are also innovative:

Embodiment 1 is a robot comprising:

a body and one or more physically moveable components;

one or more sensor subsystems that are configured to generate sensor inputs;

one or more processors; and

one or more storage devices storing instructions that are operable, when executed by the one or more processors, to cause the robot to perform operations comprising:

receiving a plurality of sensor inputs from the one or more sensor subsystems;

generating, by a transportation mode classifier from the plurality of sensor inputs, a predicted transportation mode of the robot;

providing the predicted transportation mode to an emotion state engine that is configured to maintain an internal emotion state of the robot;

updating, by the emotion state engine, the internal emotion state of the robot based on the predicted transportation mode of the robot;

providing the updated emotion state of the robot to a behavior selection engine;

generating, by the behavior selection engine, a behavior for the robot to perform based on the updated emotion state; and

performing, by the robot, the behavior generated by the behavior selection engine for the predicted transportation mode.

Embodiment 2 is the robot of embodiment 1, wherein the predicted transportation mode represents that the robot is in motion due to the forces of an external entity and identifies one of an enumerated class of circumstances under which the motion occurs.

Embodiment 3 is the robot of any one of embodiments 1-2, wherein the predicted transportation mode represents that the robot is being moved in a particular way while being held or carried by a user.

Embodiment 4 is the robot of embodiment 3, wherein the transportation mode represents that the robot is moving with a user who is walking, running, or climbing stairs.

Embodiment 5 is the robot of any one of embodiments 1-4, wherein the predicted transportation mode represents that the robot is being moved by a particular apparatus.

Embodiment 6 is the robot of embodiment 5, wherein the predicted transportation mode represents that the robot is being moved by a bicycle, a car, a bus, a boat, a train, or an aircraft.

Embodiment 7 is the robot of any one of embodiments 1-6, wherein in response to a particular stimulus, the robot is configured to generate and perform a first behavior for a first predicted transportation mode and to generate and perform a different second behavior for a different second predicted transportation mode.

Embodiment 8 is the robot of any one of embodiments 1-7, wherein the transportation mode classifier is trained using training data generated from logs generated by integrated inertial measurement units of one or more robots being transported in each of a plurality of different transportation modes.

Embodiment 9 is the robot of any one of embodiments 1-8, wherein generating the predicted transportation mode of the robot comprises:

computing a plurality of feature values for each of a plurality of transportation mode features; and

providing the plurality of feature values as input to the transportation mode classifier.

Embodiment 10 is the robot of embodiment 9, wherein the transportation mode features include one or more of a mean accelerometer magnitude, a standard deviation of the accelerometer magnitude, a minimum accelerometer magnitude over the sequence duration, a maximum accelerometer magnitude over the sequence duration, an autocorrelation of the accelerometer magnitude, a mean gyroscope magnitude, a standard deviation of the gyroscope magnitude, a minimum gyroscope magnitude over the sequence duration, a maximum gyroscope magnitude over the sequence duration, or an autocorrelation of the gyroscope magnitude.

Embodiment 11 is a method comprising performing, by a robot, the operations of any one of claims 1-10.

Embodiment 12 is one or more computer storage media encoded with computer program instructions that when executed by a robot causes the robot to perform the operations of any one of claims 1-10.

While this specification contains many specific implementation details, these should not be construed as limitations on the scope of any invention or on the scope of what may be claimed, but rather as descriptions of features that may be specific to particular embodiments of particular inventions. Certain features that are described in this specification in the context of separate embodiments can also be implemented in combination in a single embodiment. Conversely, various features that are described in the context of a single embodiment can also be implemented in multiple embodiments separately or in any suitable subcombination. Moreover, although features may be described above as acting in certain combinations and even initially be claimed as such, one or more features from a claimed combination can in some cases be excised from the combination, and the claimed combination may be directed to a subcombination or variation of a subcombination.

Similarly, while operations are depicted in the drawings in a particular order, this should not be understood as requiring that such operations be performed in the particular order shown or in sequential order, or that all illustrated operations be performed, to achieve desirable results. In certain circumstances, multitasking and parallel processing may be advantageous. Moreover, the separation of various system modules and components in the embodiments described above should not be understood as requiring such separation in all embodiments, and it should be understood that the described program components and systems can generally be integrated together in a single software product or packaged into multiple software products.

Particular embodiments of the subject matter have been described. Other embodiments are within the scope of the following claims. For example, the actions recited in the claims can be performed in a different order and still achieve desirable results. As one example, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In some cases, multitasking and parallel processing may be advantageous.

What is claimed is: