
Title:
SMART DIALOGUE SYSTEM AND METHOD OF INTEGRATING ENRICHED SEMANTICS FROM PERSONAL AND CONTEXTUAL LEARNING
Document Type and Number:
WIPO Patent Application WO/2019/214799
Kind Code:
A1
Abstract:
A smart dialogue system is improved with a context and machine learning system that enables enhanced semantics for query understanding. A spoken dialogue system interacts with a user via voice. The context and machine learning system is configured to collect telemetry and interaction data from a user and a vehicle and to learn a user profile comprising at least one of the user's preference, interest, or habit based on the collected data. The context and machine learning system is further configured to identify a behavior pattern of the user from the user profile, identify a current context associated with a current situation, determine a recommended action based on the behavior pattern and the current context, and send the recommended action to a target system for executing the recommended action. The recommended action may be generated in response to a voice request from the user or autonomously by the system without a request from the user.

Inventors:
CHIN ALVIN (US)
TIAN JILEI (US)
Application Number:
PCT/EP2018/061667
Publication Date:
November 14, 2019
Filing Date:
May 07, 2018
Assignee:
BAYERISCHE MOTOREN WERKE AG (DE)
International Classes:
B60R16/037; G10L15/22; G10L15/18; G10L15/30
Foreign References:
US20160098992A1 (2016-04-07)
US20160163319A1 (2016-06-09)
US20140067201A1 (2014-03-06)
US20140108307A1 (2014-04-17)
US20140136214A1 (2014-05-15)
US20140142948A1 (2014-05-22)
US20150178034A1 (2015-06-25)
Other References:
None
Claims:
Claims

What is claimed is:

1. A smart dialogue system comprising:

a spoken dialogue system configured to interact with a user via voice; and

a context and machine learning system configured to collect contextual information from a user and a vehicle, generate a user profile comprising at least one of user’s preference, interest, or habit based on the contextual information, identify a behavior rule of the user from the user profile, identify a current context associated with a current situation, determine a recommended action based on the behavior rule and the current context, and send the recommended action to a target system for executing the recommended action.

2. The smart dialogue system of claim 1, wherein the spoken dialogue system is configured to send a confirmation request to the user for the recommended action and receive a confirmation from the user regarding the recommended action, wherein the recommended action is sent if the confirmation is received from the user.

3. The smart dialogue system of claim 1, wherein the spoken dialogue system is configured to receive a voice request from the user, convert the voice request to a text query, parse the text query to identify semantics of the voice request, and send the semantics of the voice request to the context and machine learning system, and the context and machine learning system is configured to determine the recommended action based on the semantics of the voice request.

4. The smart dialogue system of claim 1, wherein the target system is remote and the recommended action is sent via a network.

5. The smart dialogue system of claim 1, wherein the user profile is generated by machine learning on data collected from the user and the vehicle.

6. The smart dialogue system of claim 1, wherein the behavior rule comprises an action and a condition for the action.

7. The smart dialogue system of claim 1, wherein the contextual information includes at least one of phone data, vehicle data, user voice data, or user habit data.

8. A method of integrating enriched semantics from personal and contextual learning in a smart dialogue system, comprising:

collecting contextual information from a user and a vehicle;

generating a user profile comprising at least one of user's preference, interest, or habit based on the contextual information;

identifying a behavior rule of the user from the user profile;

identifying a current context associated with a current situation;

determining a recommended action based on the behavior rule and the current context; and

sending the recommended action to a target system for executing the recommended action.

9. The method of claim 8, further comprising:

sending a voice response to the user for the recommended action; and

receiving a confirmation from the user regarding the recommended action,

wherein the recommended action is sent if the confirmation is received from the user.

10. The method of claim 8, further comprising:

receiving a voice request from the user;

converting the voice request to a text query; and

parsing the text query to identify semantics of the voice request,

wherein the recommended action is determined based on the semantics of the voice request.

11. The method of claim 8, wherein the target system is remote and the recommended action is sent via a network.

12. The method of claim 8, wherein the user profile is generated by machine learning on data collected from the user and the vehicle.

13. The method of claim 8, wherein the behavior rule comprises an action and a condition for the action.

14. The method of claim 8, wherein the contextual information includes at least one of phone data, vehicle data, user voice data, or user habit data.

15. The method of claim 8, wherein the smart dialogue system is implemented in a mobile phone.

16. A device for integrating enriched semantics from personal and contextual learning in a smart dialogue system, comprising:

a memory for storing codes;

a processor for executing the codes, wherein the codes, if executed, are configured to: collect contextual information from a user and a vehicle;

generate a user profile comprising at least one of user’s preference, interest, or habit based on the contextual information;

identify a behavior rule of the user from the user profile;

identify a current context associated with a current situation;

determine a recommended action based on the behavior rule and the current context; and

send the recommended action to a target system for executing the recommended action.

17. The device of claim 16, wherein the codes, if executed, are configured to:

send a voice response to the user for the recommended action; and

receive a confirmation from the user regarding the recommended action,

wherein the recommended action is sent if the confirmation is received from the user.

18. The device of claim 16, wherein the codes, if executed, are configured to:

receive a voice request from the user;

convert the voice request to a text query; and

parse the text query to identify semantics of the voice request,

wherein the recommended action is determined based on the semantics of the voice request.

19. The device of claim 16, wherein the user profile is generated by machine learning on data collected from the user and the vehicle.

20. A non-transitory computer-readable storage medium including machine readable instructions, when executed, to implement a method of claim 8.

Description:
Smart dialogue system and method of integrating enriched semantics from personal and contextual learning

Field

Examples relate to a smart dialogue system, more particularly a smart dialogue system improved with a context and machine learning system for enabling enhanced semantics for query understanding.

Background

Voice is a useful interaction means in situations where a user of a device needs to pay attention to something else other than the device (e.g. when driving a car). Many companies are getting into intelligent personal voice assistants, like Apple Siri, Google Assistant, Amazon Alexa, and Microsoft Cortana, because voice is a natural user interaction means. However, current voice interactive systems are primitive: they stick to basic, rule-based questions, and users have to follow those rules when talking to a device.

Voice first appeared in interactive systems with a phone, but it has now been extended to Internet-of-Things (IoT) devices such as cars, homes, etc. The first company to exploit voice as a means for interaction without a mobile device in consumer IoT is Amazon, with Amazon Echo using Alexa as a voice interface and recognition engine to search for answers to questions from Amazon's technical database and on the internet.

Current voice interaction is still intention-based, simple, and direct, and it lacks actionable information for a machine to execute, which is not intuitive. Without integration with a personal and contextual system, neither the machine nor the user knows of something interesting and relevant, so voice interaction systems are missing the context needed to provide proactive recommendations. Currently there is no method of understanding vehicle-specific requests and learning actions such as "Would you like to heat up the car before your next trip?", and no method of providing responses that translate into performing a remote action in the vehicle.

Summary

Examples are provided for a smart dialogue system for an intelligent personal assistant system that performs specific actions (e.g. vehicle-specific actions) and/or suggests actions by enriching the human dialogue semantics based on matching of context and inferred behavior learned from the history of the user's context and data coming from the user's activities (e.g. driving activities).

The smart dialogue system includes a spoken dialogue system and a contextual and personal system leveraging machine learning and context modeling on the data. The spoken dialogue system interacts with a user via voice. The context and machine learning system collects contextual information from a user and a vehicle and learns a user profile. The user profile includes at least one of the user's preference, interest, or habit and is generated based on the contextual information. The context and machine learning system identifies a behavior rule of the user from the user profile, and also identifies a current context associated with a current situation based on collected data that may be received from various sensors or from the user. The context and machine learning system then determines a recommended action based on the behavior rule and the current context and sends the recommended action to a target system for executing the recommended action.

The spoken dialogue system may be configured to send a confirmation request to the user for the recommended action and receive a confirmation from the user regarding the recommended action, wherein the recommended action is sent if the confirmation is received from the user. Additionally or alternatively, the spoken dialogue system may be configured to receive a voice request from the user, convert the voice request to a text query, parse the text query to identify semantics of the voice request, and send the semantics of the voice request to the context and machine learning system, and the context and machine learning system is configured to determine the recommended action based on the semantics of the voice request. The target system may be remote and the recommended action may be sent via a network. The user profile may be generated by machine learning on data collected from the user and the vehicle. The behavior rule may comprise an action and a condition for the action. The contextual information may include at least one of or a combination of phone data, vehicle data, user voice data, or user habit data.

In accordance with another aspect, a method of integrating enriched semantics from personal and contextual learning in a smart dialogue system is provided. The method includes collecting contextual information from a user and a vehicle, generating a user profile comprising at least one of the user's preference, interest, or habit based on the contextual information, identifying a behavior rule of the user from the user profile, identifying a current context associated with a current situation, determining a recommended action based on the behavior rule and the current context, and sending the recommended action to a target system for executing the recommended action.

The method may further include sending a voice response to the user for the recommended action, and receiving a confirmation from the user regarding the recommended action, wherein the recommended action is sent if the confirmation is received from the user. The method may further include receiving a voice request from the user, converting the voice request to a text query, and parsing the text query to identify semantics of the voice request, wherein the recommended action is determined based on the semantics of the voice request. The target system may be remote and the recommended action may be sent via a network. The user profile may be generated by machine learning on data collected from the user and the vehicle. The behavior rule may comprise an action and a condition for the action. The contextual information may include at least one of or a combination of phone data, vehicle data, user voice data, or user habit data. The smart dialogue system may be implemented in a mobile phone.

In accordance with still another aspect, a device for integrating enriched semantics from personal and contextual learning in a smart dialogue system is provided. The device includes a memory for storing codes and a processor for executing the codes, wherein the codes, if executed, are configured to collect contextual information from a user and a vehicle, generate a user profile comprising at least one of a user’s preference, interest, or habit based on the contextual information, identify a behavior rule of the user from the user profile, identify a current context associated with a current situation, determine a recommended action based on the behavior rule and the current context, and send the recommended action to a target system for executing the recommended action.

The codes, if executed, may be configured to send a voice response to the user for the recommended action, and receive a confirmation from the user regarding the recommended action, wherein the recommended action is sent if the confirmation is received from the user. The codes, if executed, may be configured to receive a voice request from the user, convert the voice request to a text query, and parse the text query to identify semantics of the voice request, wherein the recommended action is determined based on the semantics of the voice request. The user profile may be generated by machine learning on data collected from the user and the vehicle.

In accordance with still another aspect, a non-transitory computer-readable storage medium is provided. The computer-readable storage medium may include machine readable instructions, when executed, to implement any methods disclosed herein.

Brief description of the Figures

Some examples of apparatuses and/or methods will be described in the following by way of example only, and with reference to the accompanying figures, in which

Fig. 1 shows a conventional dialogue system;

Fig. 2 shows an example structure of a smart dialogue system in accordance with one aspect;

Fig. 3 illustrates an example interaction data flow in the smart dialogue system in accordance with one aspect;

Fig. 4 shows an example flow of processing in the enhanced dialog system in accordance with one aspect; and

Fig. 5 is a block diagram of an example system for implementing the smart dialogue system in accordance with the examples disclosed herein.

Detailed Description

Various examples will now be described more fully with reference to the accompanying drawings in which some examples are illustrated. In the figures, the thicknesses of lines, layers and/or regions may be exaggerated for clarity. Accordingly, while further examples are capable of various modifications and alternative forms, some particular examples thereof are shown in the figures and will subsequently be described in detail. However, this detailed description does not limit further examples to the particular forms described. Further examples may cover all modifications, equivalents, and alternatives falling within the scope of the disclosure. Like numbers refer to like or similar elements throughout the description of the figures, which may be implemented identically or in modified form when compared to one another while providing for the same or a similar functionality.

It will be understood that when an element is referred to as being "connected" or "coupled" to another element, the elements may be directly connected or coupled or via one or more intervening elements. If two elements A and B are combined using an "or", this is to be understood to disclose all possible combinations, i.e. only A, only B, as well as A and B. An alternative wording for the same combinations is "at least one of A and B". The same applies for combinations of more than 2 elements.

The terminology used herein for the purpose of describing particular examples is not intended to be limiting for further examples. Whenever a singular form such as "a," "an" and "the" is used and using only a single element is neither explicitly nor implicitly defined as being mandatory, further examples may also use plural elements to implement the same functionality. Likewise, when a functionality is subsequently described as being implemented using multiple elements, further examples may implement the same functionality using a single element or processing entity. It will be further understood that the terms "comprises," "comprising," "includes" and/or "including," when used, specify the presence of the stated features, integers, steps, operations, processes, acts, elements and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, processes, acts, elements, components and/or any group thereof.

Unless otherwise defined, all terms (including technical and scientific terms) are used herein in their ordinary meaning of the art to which the examples belong.

A voice request is provided as an option to a user. However, users often do not initiate any request because they do not know what to say or how to say it. Examples disclosed herein provide a method and system to create a relevant, natural, understandable voice response or suggestion that a user can easily acknowledge to cause an action (e.g. in the vehicle) based on the current and/or future context. Examples also provide an adaptable system that collects context data from a user's everyday life and processes it to automatically learn the user profile and identify behavior patterns or rules. When the context is matched and a behavior pattern/rule is identified, an appropriate response is delivered as a voice suggestion in a way that is natural and not disruptive to the user, allowing for semantics-rich dialogue interaction that eventually results in an action being taken.

Hereafter, examples of an enhanced smart dialogue system will be explained with reference to a specific implementation related to a vehicle. It should be noted that the examples disclosed hereinafter are not limited to implementations related to a vehicle but are applicable to any device or equipment other than a vehicle, such as a mobile phone, a home automation device, IoT devices, etc. Hereafter, the term "vehicle" includes any type of equipment for transportation including, but not limited to, a car, a bus, a truck, a sport utility vehicle, a recreational vehicle, a boat, a motorcycle, or the like.

Conventional systems for performing operations in a car, for example with regard to navigation, finding points of interest, optimizing travel routes for fastest time and lowest cost, personalizing settings in a car, or the like, are still cumbersome and difficult to use and also disrupt the driver's attention. People naturally use voice to make requests, but current systems require specific utterances, the search space is limited, and the queries are simple. Current systems cannot support complex or personal contextual queries like "Can you heat up my car?" The examples disclosed herein provide relevant contextual responses to queries that are personalized and actionable (e.g. in the vehicle). A user may not know what to request. In the examples, the system is intelligent enough to ask the user after learning the user's preferences, interests, and/or habits, thus making it proactive rather than reactive.

The examples disclosed herein may leverage user profiling and real-time context recognition obtained by machine learning (machine intelligence) to enrich and improve the semantic understanding in human spoken dialogue (human interaction). Machine learning may learn a user's habit, interest, preference, or the like from the user's data, and then form a user profile.

The examples provide a high-performance platform to enable a rich semantic dialogue system leveraging scalable and real-time processing of rich context information and machine learning over the user's big data. The examples may be applied to an intelligent personal assistant. In the examples, context recognition may be used to identify the situation from the real-time data stream.

Fig. 1 shows a conventional dialogue system that is simple and not intelligent. In the conventional system, a voice request 112 from a user 110 is captured by a spoken dialogue system 120 and the spoken dialogue system 120 makes a voice response 114 to the user 110. The conventional system is just based on voice commands that trigger external services without any personal learning of the user’s behavior or understanding of the semantics of the user’s queries.

Fig. 2 shows an example structure of a smart dialogue system 200 in accordance with one aspect. The smart dialogue system 200 includes a spoken dialogue system 210 and a context and machine learning system 220. The spoken dialogue system 210 is enhanced with the context and machine learning system 220. In order to improve the conventional dialogue systems, the spoken dialogue system 210 is improved with a user profile and context processed by the context and machine learning system 220.

The spoken dialogue system 210 includes a channel to client 212 to interact with a user. The spoken dialogue system 210 may include a speaker and a microphone (not shown) to interact with a user via voice. Dialogue is one of the most preferred forms of human interaction with a device (e.g. when driving a vehicle). Voice interaction is based on semantic processing of the text data with a voice user interface (UI). Alternatively, or additionally, the channel to client 212 may include an interface to receive/transmit a query/response from/to the user or network, such as speech, email, text message (e.g. short message service (SMS) or multimedia message service (MMS)), image, video, text QnA, chatting, web browser, calendar, messenger, etc.

A user makes a voice request to the spoken dialogue system 210. The voice request is processed by the cognitive service 214. The cognitive service 214 includes automatic speech recognition (ASR) for converting the voice request to a text query. The cognitive service 214 also includes text-to-speech (TTS) for converting a text response, generated in response to the voice request or autonomously, into a voice response. The cognitive service 214 may use third party services 216 (e.g. Microsoft Cortana, Nuance, etc.) to perform the ASR and/or the TTS. The cognitive service 214 may include a language understanding module that can parse the text query to identify semantics of the voice request, such as intent, entities, or the like. The spoken dialog system 210 may also include a dialog manager 218. The dialog manager 218 may handle the dialog flow, e.g. dialog state management, and decide the next step of processing. The dialog manager 218 is a central hub in the spoken dialog system capable of integrating all other enabling services, such as speech recognition, speech synthesis, natural language semantic parsing, sentiment analysis, navigation, map, traffic, weather, and other service APIs.
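
Purely as an illustration of this pipeline, a minimal Python sketch is given below. The function and class names (recognize_speech, parse_semantics, synthesize_speech, DialogManager) are hypothetical placeholders and not part of the disclosed system; the ASR and TTS steps stand in for third-party services such as those mentioned above.

# Hypothetical sketch of the spoken dialogue pipeline (ASR -> language understanding ->
# dialog manager -> TTS). All names are illustrative placeholders; real ASR/TTS would be
# delegated to third-party services.
from dataclasses import dataclass, field
from typing import Dict

@dataclass
class Semantics:
    intent: str
    entities: Dict[str, str] = field(default_factory=dict)
    context: str = ""

def recognize_speech(audio: bytes) -> str:
    """Stand-in for automatic speech recognition (ASR)."""
    return "heat up my car before my next trip"

def parse_semantics(text_query: str) -> Semantics:
    """Stand-in for the language understanding step (intent/entity/context extraction)."""
    return Semantics(intent="heat the car", entities={"object": "the car"}, context="the next trip")

def synthesize_speech(text_response: str) -> bytes:
    """Stand-in for text-to-speech (TTS)."""
    return text_response.encode("utf-8")

class DialogManager:
    """Central hub that routes the parsed query to the context and machine learning system."""
    def handle(self, audio: bytes) -> bytes:
        text_query = recognize_speech(audio)
        semantics = parse_semantics(text_query)
        # In the full system the semantics would be sent to the context and ML system here.
        response_text = f"Understood request: {semantics.intent} ({semantics.context})."
        return synthesize_speech(response_text)

print(DialogManager().handle(b"<raw audio>"))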

The context and machine learning system 220 may handle personalized and semantic-enhanced contextual voice queries. The context and machine learning system 220 may receive the text query from the spoken dialogue system 210 via an interface 222 (e.g. an application programming interface or a push service) and identify the dialogue components (e.g. intent, entities, context, action, etc.).

The context and machine learning system 220 may include a machine learning offline processing module 224. The machine learning offline processing module 224 collects data from the user and/or a vehicle and stores historical user and vehicle data in the data lake 226 (i.e. big data storage). The machine learning offline processing module 224 takes the historical user and vehicle data from the data lake 226 and uses machine learning to build a user profile of user preferences, interests, habits, etc.
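
As a minimal sketch only, the following shows how such an offline module might aggregate historical records from the data lake into a simple user profile. The disclosure does not prescribe a particular algorithm; the field names and the simple statistics below (e.g. averaging the preferred cabin temperature) are illustrative assumptions.

# Illustrative only: derive a simple user profile (preferences/habits) from historical records.
from statistics import mean
from typing import Dict, List

# Hypothetical historical records pulled from the data lake (field names are assumptions).
history: List[Dict] = [
    {"external_temp": -4, "heater_on": True, "cabin_temp_setting": 24, "route_home": "A"},
    {"external_temp": -7, "heater_on": True, "cabin_temp_setting": 23, "route_home": "A"},
    {"external_temp": 12, "heater_on": False, "cabin_temp_setting": None, "route_home": "B"},
]

def learn_user_profile(records: List[Dict]) -> Dict:
    """Aggregate raw user and vehicle data into simple preferences and habits."""
    heated = [r for r in records if r["heater_on"]]
    routes = [r["route_home"] for r in records]
    return {
        "preferred_cabin_temp": round(mean(r["cabin_temp_setting"] for r in heated)) if heated else None,
        "heats_below_external_temp": max(r["external_temp"] for r in heated) if heated else None,
        "usual_route_home": max(set(routes), key=routes.count),
    }

print(learn_user_profile(history))
# e.g. {'preferred_cabin_temp': 24, 'heats_below_external_temp': -4, 'usual_route_home': 'A'}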

The context and machine learning system 220 may also include a context streaming module 228 to recognize the context of the current situation, such as the environment, the user's current situation, or the like. The context streaming module 228 may receive data from various sensors in the vehicle, the user's device(s), the cloud, or the like, or directly from the user (e.g. current temperature, remaining time to the next appointment, remaining fuel amount, driving speed, Global Positioning System (GPS) locations, vehicle setting, or any other relevant data), and may recognize the context of the current situation in real time.

The semantics-related context-aware services 230 may take the current context data along with the user profile, identify the behavior rule/pattern of the user, and generate a recommended action, which is computed as a personalized contextual response or autonomous action for the user. The spoken dialogue system 210 is enriched by the context and machine learning system 220. After it is determined that the semantics of the user voice request cannot be handled by the spoken dialog system 210 alone, the request is processed into its individual components (e.g. intent, entities, context, action) by the context and machine learning system 220, and then, with the identification of the user profile in the machine learning offline processing module 224 and the current context obtained from the context streaming module 228, a voice response or autonomous action is created, which is spoken back to the user. As ASR and TTS are used to convert between speech and text, the processing may focus on the data in a text format throughout the following description.

Fig. 3 illustrates an example interaction data flow in the smart dialogue system in accordance with one aspect. An incoming text query 302 is processed by the text analysis and semantic parsing module 304. The incoming text query 302 may be obtained by converting the user's voice request by the spoken dialogue system or may be received as a text query. The text may be from onboard or offboard applications in the vehicle app, a web robot (e.g. an Internet bot), a chatting application, an email, a question and answer (QnA) service, or audio or video messaging or communication services such as Skype, WeChat, etc. The text analysis and semantic parsing module 304 may be included in the context and machine learning system 220 or in the spoken dialogue system 210. The text analysis and semantic parsing module 304 analyzes and parses the text query to identify the semantics of the user's request, such as intent, entities, context, action, or the like.

A context processing and storage module 306 in the context and machine learning system 220 collects contextual information from a user, a vehicle, etc., and stores the collected data as historical user and vehicle data. The contextual information may be collected from vehicle sensors, software, or services, and may be provided by the user. A user profiling module 308 in the context and machine learning system 220 may use machine learning on the historical user and vehicle data collected and stored by the context processing and storage module 306 in order to learn user preferences, interests, habits, associated actions, and the like.
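
A minimal sketch of one possible shape for such a stored record is given below. The field grouping mirrors the phone, vehicle, and voice/habit data categories mentioned in this description, but all field names and values are illustrative assumptions.

# Illustrative only: one possible record shape for the collected contextual information.
from dataclasses import dataclass, asdict
from datetime import datetime

@dataclass
class ContextRecord:
    timestamp: datetime
    # Phone data (assumed fields).
    minutes_to_next_appointment: int
    gps: tuple
    # Vehicle data (assumed fields).
    external_temp_c: float
    internal_temp_c: float
    fuel_level_pct: float
    heater_setting: int
    # User voice / habit data (assumed fields).
    last_voice_request: str

record = ContextRecord(
    timestamp=datetime(2018, 5, 7, 17, 45),
    minutes_to_next_appointment=30,
    gps=(48.177, 11.559),
    external_temp_c=-5.0,
    internal_temp_c=15.0,
    fuel_level_pct=42.0,
    heater_setting=5,
    last_voice_request="heat up my car before my next trip",
)
# Records of this kind would be appended to the historical user and vehicle data store.
print(asdict(record))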

Enriched semantics may be used to form an intelligent response to the user's request. Based on the intent, entities, context, action, or the like, a semantics-related context-aware service module 310 in the context and machine learning system 220 may find an appropriate user preference, habit, interest, and action to create a system response in the form of a notification or suggestion. The system may not perform an action until the current context matches the user's behavior context.

Once a recommended action 312 is determined by the semantics-related context-aware service module 310, the text analysis and semantic parsing module 304 takes the recommended action 312 and creates an outgoing text 314 that will be spoken back to the user. Alternatively, the recommended action may be sent to the user as a text message.

Example voice interactions for a driver-initiated case and a system-initiated case are explained below.

In a driver-initiated case, the system translates the user's voice request into a workflow with the identified intent, entities, context, actions, etc. A user profile is generated based on the collected user and vehicle data, and user behavior patterns/rules are learned from the user profile. Entities and context from the user voice request are searched within the user profile, and actions are determined from the user behavior patterns. When a current context matches the context extracted from the voice request, a recommended action is translated into a voice response or suggestion back to the user.

An example dialogue between a driver and a car is shown below.

From the dialogue "heat up my car before my next trip," the following semantics may be obtained:

Intent: heating the car;

Entity: the car;

Context: the next trip.

From the context and machine learning system, the above semantics may be enriched as follows:

Heating => learned preferred setting and temperature;

Next trip => predicted destination (e.g. arrival time and location).

From the dialogue "remind me to buy milk on the way home," the following semantics may be obtained:

Intent: reminder;

Entity (item name): milk;

Context: on the way home.

From the context and machine learning system, the above semantics may be enriched as follows:

Reminder => push notification;

Milk => preferred shop for milk;

On the way home => learned personal route.
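
Purely to illustrate the enrichment step shown in the two dialogues above, the sketch below maps parsed semantics onto learned profile entries and predicted context. The dictionary layout, the function name (enrich), and the lookup values are illustrative assumptions only.

# Illustrative only: enrich parsed semantics with learned profile entries and predicted context.
def enrich(semantics: dict, profile: dict, prediction: dict) -> dict:
    """Map raw intent/entity/context onto learned, actionable values."""
    enriched = dict(semantics)
    if semantics["intent"] == "heating the car":
        enriched["settings"] = profile["heat_profile"]      # learned preferred setting and temperature
        enriched["trip"] = prediction["next_trip"]          # predicted destination and arrival time
    elif semantics["intent"] == "reminder":
        enriched["delivery"] = "push_notification"
        enriched["shop"] = profile["preferred_shops"].get(semantics["entity"])
        enriched["route"] = profile["learned_route_home"]   # learned personal route
    return enriched

profile = {
    "heat_profile": {"desired_temp": 24, "heat_control": 5},
    "preferred_shops": {"milk": "grocery on the learned route"},
    "learned_route_home": ["office", "main street", "home"],
}
prediction = {"next_trip": {"destination": "home", "arrival": "18:15"}}

print(enrich({"intent": "reminder", "entity": "milk", "context": "on the way home"}, profile, prediction))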

In case of a system-initiated case, there is no voice input from the user, but just matching of the current context of the situation to one of the user’s behavior patterns/rules. In this case, the system outputs the voice recommendation or suggestion first.

An example dialogue between a user and a car is shown below.

In this example, the system knows the current context that the temperature is -5 °C and the user’s next trip is going home. From the previous behavior(s), when the temperature is cold, the driver always warms up the car. Therefore, the system predicts that the driver might want to pre-heat the car before going back home, which is spoken to the driver as a voice output.

Fig. 4 shows an example flow of processing in the enhanced dialog system in accordance with one aspect. In Fig. 4, the spoken dialogue system may be the conventional component, and a context and machine learning system is provided for the enhanced dialogue system with enriched semantics. A user may interact with the system via the spoken dialogue system 210. The case in which a user initiates a voice request to perform an action, request information, or the like is explained hereafter. In the driver-initiated case, a user makes a voice request 402 to ask a question or perform an action (e.g. "Heat my car up before my next trip"), or the like (402). This step may be omitted where the system initiates the interaction. The voice request 402 is received by the spoken dialogue system 210 and converted to a text query as explained above. The text query is sent to the context and machine learning system 220 (404).

The context and machine learning system 220 checks to see if there is a rule that may be applied to the voice request (406). If it is determined that there is no rule applicable to the voice request, the spoken dialogue system may execute the request. If there is an applicable rule, the text query may be processed by the text analysis and semantic processing module to get the intent, entities, context, action, etc. (408). In the above example, the intent is "heat up my car," the entity is "my car," the context is "before my next trip," and the action is "heat up."

The context processing and storage module collects user and vehicle data, such as phone data, car data, voice data, personal habit data, or the like, and records them as historical user and vehicle data (410). For example, the context processing and storage module may record that a driver turns on the heater at a specific outside temperature with a specific heater setting, refills the gas tank at a specific level, takes a certain route to go back home, or any other relevant data specific to the driver's behavior.

The user profile learning module uses machine learning on the historical user and vehicle data to form a user profile (412). Learned habits are then identified based on the user profile (414). Learned habits may be defined as an IF-THEN rule pattern using a pattern template with a set of conditions. Appropriate data for each condition is collected and thresholds are learned from machine learning. The action may be defined as a set of key-value pairs, where the keys are defined and the values are learned from what the user actually does. For example, a "heat habit" may be identified as follows: IF [a difference between internal and external temperature in a vehicle > n, external temp < m, heat_control > p], THEN the "heat" action = [desired_temp = 24, defrost = true, driver_seat_warm = true, heat_control = 5, ventilation = on], where n, m, and p are threshold values that are learned from machine learning. Given the action, the learned habits are matched to the action with an appropriate personal habit (habit profile) from the user profile to create a behavior rule. For example, if the action = "heat", then it may be set as [desired_temp = 24, defrost = true, driver_seat_warm = true, heat_control = 5, ventilation = on], which may be referred to as the "heat profile."
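
As a minimal sketch only, such a learned "heat habit" and its action key-value pairs could be represented as follows. The thresholds n, m, and p would be learned by machine learning; the concrete numbers and field names below are illustrative assumptions, not values of the disclosure.

# Illustrative only: a learned IF-THEN "heat habit" with its action key-value pairs.
# The thresholds n, m and p would be learned by machine learning; the numbers below are examples.
heat_habit = {
    "conditions": {                      # IF-part of the rule
        "temp_difference_gt": 10,        # n: internal minus external temperature
        "external_temp_lt": 0,           # m: external temperature threshold
        "heat_control_gt": 3,            # p: heater control threshold
    },
    "action": {                          # THEN-part: the learned "heat profile" (key-value pairs)
        "desired_temp": 24,
        "defrost": True,
        "driver_seat_warm": True,
        "heat_control": 5,
        "ventilation": "on",
    },
}
print(heat_habit["action"])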

Based on the semantics of the user’s request (e.g. intent, entities) and the identified user’s behavior rule, the action recommender may determine a recommended action (416).

The context and machine learning system 220 receives data from various sensors, devices, the user, or the like, and determines the current context of the situation, e.g. the environment, the current situation of the user, or the like. The context and machine learning system 220 may determine whether the behavior context (the context of the identified user behavior pattern) and the current context (the current situation) match (418). The action may be implemented if the behavior context and the current context match. For example, if the current context = [external_temp = -5, internal_temp = 15, TTL_before_next_trip = 15 min], and if this current context matches the user behavior context, the heat profile may be set.
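
A minimal sketch of this matching step is given below; it repeats a trimmed version of the habit rule sketched above so the snippet is self-contained. The field names and the time-to-live condition are illustrative assumptions.

# Illustrative only: check whether the current context matches the behavior context of the rule.
heat_habit = {
    "conditions": {"temp_difference_gt": 10, "external_temp_lt": 0},
    "action": {"desired_temp": 24, "defrost": True, "driver_seat_warm": True,
               "heat_control": 5, "ventilation": "on"},
}

def context_matches(rule: dict, ctx: dict) -> bool:
    """True if all IF-conditions of the rule hold for the current context (assumed fields)."""
    c = rule["conditions"]
    return (ctx["internal_temp"] - ctx["external_temp"] > c["temp_difference_gt"]
            and ctx["external_temp"] < c["external_temp_lt"]
            and ctx["ttl_before_next_trip_min"] <= 15)

current_context = {"external_temp": -5, "internal_temp": 15, "ttl_before_next_trip_min": 15}
if context_matches(heat_habit, current_context):
    print("Set heat profile:", heat_habit["action"])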

The recommended action is translated using a language and dialogue generation module in order to output a voice response back to the user (e.g. "Set heat profile") (420). In order to form the voice response, a voice response template may be filled in. An example voice response template is shown below; the <> placeholders are substituted with the actual values.
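
Purely as a hypothetical illustration of how such a template could be filled in, the sketch below substitutes angle-bracket placeholders with learned values. The template string and placeholder names are assumptions for illustration and are not the template of the disclosure.

# Hypothetical illustration only: fill a voice response template by substituting <> placeholders.
def fill_template(template: str, values: dict) -> str:
    """Substitute <placeholder> fields with their actual values."""
    text = template
    for key, value in values.items():
        text = text.replace(f"<{key}>", str(value))
    return text

template = "Would you like me to <action> your car to <desired_temp> degrees before your next trip?"
print(fill_template(template, {"action": "heat up", "desired_temp": 24}))
# Would you like me to heat up your car to 24 degrees before your next trip?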

The spoken dialogue system 210 may then make a voice response to the user (432) to provide an answer to the voice request 402 or to suggest an action to perform. The user may respond with an action confirmation (434) to the spoken dialogue system 210, confirming the voice response 432 so that the action is executed. The action confirmation may be a simple word like "Yes" or "No." An example voice response and action confirmation are shown below.

If the action confirmation is "Yes", then the action may be sent to a vehicle remote system (422). The vehicle remote system 423 takes the remote command 422 from the user, e.g. via a phone app, and sends the instruction to the vehicle 440 to execute, e.g. lock the door, heat the seat, etc. The vehicle remote system 423 may be included in the smart dialogue system or, alternatively, in the vehicle 440. The vehicle remote system 423 may convert the recommended action to a remote request (424). For example, if the action = "heat", then the heat profile settings, e.g. [desired_temp = 24, defrost = true, driver_seat_warm = true, heat_control = 5, ventilation = on], may be taken and packaged into a remote request. The remote request may then be sent to the vehicle 440 (e.g. via a network) to execute the action (426). The vehicle_id may be taken from the system and the remote request may be sent to the vehicle 440 using the vehicle_id.
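
A minimal sketch of this packaging step is given below. The payload layout, the vehicle_id value, and the send function are assumptions for illustration only; no particular network protocol is implied.

# Illustrative only: package a confirmed action into a remote request addressed by vehicle_id.
import json

def build_remote_request(vehicle_id: str, action: str, settings: dict) -> str:
    """Wrap the action and its learned settings (e.g. the heat profile) into a request payload."""
    return json.dumps({"vehicle_id": vehicle_id, "action": action, "settings": settings})

def send_to_vehicle(payload: str) -> None:
    """Stand-in for transmitting the remote request to the vehicle over a network."""
    print("sending:", payload)

heat_profile = {"desired_temp": 24, "defrost": True, "driver_seat_warm": True,
                "heat_control": 5, "ventilation": "on"}
send_to_vehicle(build_remote_request(vehicle_id="example-vehicle-id", action="heat", settings=heat_profile))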

A system-initiated case is explained hereafter. The context and machine learning system may continuously or periodically learn the user profile and learned habits, and the behavior rules of the user are identified as explained above (412, 414). When the behavior context (of the learned user behavior pattern) matches the current context of the situation, an action may be recommended (416). For example, the context and machine learning system 220 may determine that the next trip to home is coming up in 15 minutes and the temperature is -5 °C. Based on the learned behavior pattern of the driver, the system knows that the driver heats up the car when the temperature is less than 0 °C. Therefore, the system may recommend pre-heating the car. The system then makes an (unsolicited) voice response or recommendation to the user (432). The user may confirm the action delivered by the voice response/recommendation (434). The system may respond to the action confirmation as well. An example voice response and action confirmation for the system-initiated case are shown below.

If the action is confirmed, the action is converted into a remote request which is executed in the vehicle (422, 424, 426) as explained above. For example, the car is pre-heated to the settings specified in the learned habit (e.g. [desired_temp = 24, defrost = true, driver_seat_warm = true, heat_control = 5, ventilation = on]).

The examples provide a system to combine a human-oriented dialogue system and a data-oriented context and machine learning system for enabling enhanced semantics for query understanding and an improved smart user experience. The examples provide a simplified, unambiguous, and natural conversational interface between a user and a vehicle. With the examples, personalized actions can be automated in the vehicle based on the learned behavioral habits and context. In the examples, instead of having to poll for initiation of conversation from a user's voice query, conversation may be initiated by the system by making a proactive personalized recommendation to the user. The example systems support personal intelligence.

Fig. 5 is a block diagram of an example system (or device) 500 configured to integrate enriched semantics from personal and contextual learning in a smart dialogue system in accordance with the examples disclosed herein. The device 500 may be a mobile device. The device 500 includes a processor 510, a memory/storage 520, a wireless communication module 530 including a baseband module and a radio front end module (not shown), a location module 540, a display 550, a display driver 560, sensors 570, a speaker 570, a microphone 580, etc. In some aspects, the processor 510 may include, for example, one or more central processing unit (CPU) cores and one or more cache memories. The wireless communication module 530 may support wireless communication of the device 500 in accordance with any wireless communication protocols, such as Third Generation (3G), Fourth Generation (4G), Fifth Generation (5G), WiFi, Bluetooth, or any other wireless communication standards. Additionally, the device may also include a wired communication module. The memory/storage 520 may store codes or data such as user profile data, etc. The sensors 570 are included for sensing various activities of the user. For example, the sensors 570 may include an accelerometer, a gyroscope, etc. The location module 540 may detect the location, such as the Global Positioning System (GPS) location, of the device 500.

The memory/storage 520 (i.e. a machine-readable storage medium) stores codes to be executed by the processor 510. The codes, if executed, are configured to collect contextual information from a user and a vehicle, generate a user profile comprising at least one of the user's preference, interest, or habit based on the contextual information, identify a behavior rule of the user from the user profile, identify a current context associated with a current situation, determine a recommended action based on the behavior rule and the current context, and send the recommended action to a target system for executing the recommended action.

The codes, if executed, may also be configured to send a voice response to the user for the recommended action and receive a confirmation from the user regarding the recommended action, wherein the recommended action is sent if the confirmation is received from the user. The codes, if executed, may also be configured to receive a voice request from the user, convert the voice request to a text query, and parse the text query to identify semantics of the voice request, wherein the recommended action is determined based on the semantics of the voice request.

Another example is a computer program having a program code for performing at least one of the methods described herein, when the computer program is executed on a computer, a processor, or a programmable hardware component. Another example is a machine-readable storage including machine readable instructions, when executed, to implement a method or realize an apparatus as described herein. A further example is a machine-readable medium including code, when executed, to cause a machine to perform any of the methods described herein.

The aspects and features mentioned and described together with one or more of the previously detailed examples and figures, may as well be combined with one or more of the other examples in order to replace a like feature of the other example or in order to additionally introduce the feature to the other example.

Examples may further be or relate to a computer program having a program code for performing one or more of the above methods, when the computer program is executed on a computer or processor. Steps, operations or processes of various above-described methods may be performed by programmed computers or processors. Examples may also cover program storage devices such as digital data storage media, which are machine, processor or computer readable and encode machine-executable, processor-executable or computer-executable programs of instructions. The instructions perform or cause performing some or all of the acts of the above- described methods. The program storage devices may comprise or be, for instance, digital memories, magnetic storage media such as magnetic disks and magnetic tapes, hard drives, or optically readable digital data storage media. Further examples may also cover computers, processors or control units programmed to perform the acts of the above-described methods or (field) programmable logic arrays ((F)PLAs) or (field) programmable gate arrays ((F)PGAs), programmed to perform the acts of the above-described methods.

The description and drawings merely illustrate the principles of the disclosure. Furthermore, all examples recited herein are principally intended expressly to be only for pedagogical purposes to aid the reader in understanding the principles of the disclosure and the concepts contributed by the inventor(s) to furthering the art. All statements herein reciting principles, aspects, and examples of the disclosure, as well as specific examples thereof, are intended to encompass equivalents thereof.

A functional block denoted as "means for ..." performing a certain function may refer to a circuit that is configured to perform a certain function. Hence, a "means for s.th." may be implemented as a "means configured to or suited for s.th.", such as a device or a circuit configured to or suited for the respective task.

Functions of various elements shown in the figures, including any functional blocks labeled as "means", "means for providing a sensor signal", "means for generating a transmit signal", etc., may be implemented in the form of dedicated hardware, such as "a signal provider", "a signal processing unit", "a processor", "a controller", etc., as well as hardware capable of executing software in association with appropriate software. When provided by a processor, the functions may be provided by a single dedicated processor, by a single shared processor, or by a plurality of individual processors, some of which or all of which may be shared. However, the term "processor" or "controller" is by far not limited to hardware exclusively capable of executing software but may include digital signal processor (DSP) hardware, network processor, application specific integrated circuit (ASIC), field programmable gate array (FPGA), read only memory (ROM) for storing software, random access memory (RAM), and non-volatile storage. Other hardware, conventional and/or custom, may also be included.

A block diagram may, for instance, illustrate a high-level circuit diagram implementing the principles of the disclosure. Similarly, a flow chart, a flow diagram, a state transition diagram, a pseudo code, and the like may represent various processes, operations or steps, which may, for instance, be substantially represented in computer readable medium and so executed by a computer or processor, whether or not such computer or processor is explicitly shown. Methods disclosed in the specification or in the claims may be implemented by a device having means for performing each of the respective acts of these methods.

It is to be understood that the disclosure of multiple acts, processes, operations, steps or functions disclosed in the specification or claims may not be construed as being within the specific order, unless explicitly or implicitly stated otherwise, for instance for technical reasons. Therefore, the disclosure of multiple acts or functions will not limit these to a particular order unless such acts or functions are not interchangeable for technical reasons. Furthermore, in some examples a single act, function, process, operation or step may include or may be broken into multiple sub-acts, -functions, -processes, -operations or -steps, respectively. Such sub-acts may be included in and part of the disclosure of this single act unless explicitly excluded.

Furthermore, the following claims are hereby incorporated into the detailed description, where each claim may stand on its own as a separate example. While each claim may stand on its own as a separate example, it is to be noted that - although a dependent claim may refer in the claims to a specific combination with one or more other claims - other examples may also include a combination of the dependent claim with the subject matter of each other dependent or independent claim. Such combinations are explicitly proposed herein unless it is stated that a specific combination is not intended. Furthermore, it is intended to include also features of a claim to any other independent claim even if this claim is not directly made dependent to the independent claim.