Title:
DETECTING UNRELATED UTTERANCES IN A CHATBOT SYSTEM
Document Type and Number:
WIPO Patent Application WO/2021/050891
Kind Code:
A1
Abstract:
Techniques are described to determine whether an input utterance is unrelated to a set of skill bots associated with a master bot. In some embodiments, a system described herein includes a training system and a master bot. The training system trains a classifier of the master bot. The training includes accessing training utterances associated with the skill bots and generating training feature vectors from the training utterances. The training further includes generating multiple set representations of the training feature vectors, where each set representation corresponds to a subset of the training feature vectors, and configuring the classifier with the set representations. The master bot accesses an input utterance and generates an input feature vector. The master bot uses the classifier to compare the input feature vector to the multiple set representations so as to determine whether the input feature vector falls outside the set representations and, thus, cannot be handled by the skill bots.

Inventors:
PAN CRYSTAL C (US)
SINGARAJU GAUTAM (US)
VISHNOI VISHAL (US)
GADDE SRINIVASA PHANI KUMAR (US)
Application Number:
PCT/US2020/050429
Publication Date:
March 18, 2021
Filing Date:
September 11, 2020
Assignee:
ORACLE INT CORP (US)
International Classes:
G06F40/216; G06F40/284; G06F40/30; G06F40/35
Foreign References:
US20190108836A12019-04-11
Other References:
KIM, JOO-KYUNG ET AL: "Joint Learning of Domain Classification and Out-of-Domain Detection with Dynamic Class Weighting for Satisficing False Acceptance Rates", ARXIV, 29 June 2018 (2018-06-29), XP055760560, Retrieved from the Internet [retrieved on 20201216]
"Communications in computer and information science", vol. 709, 2017, SPRINGER, DE, ISSN: 1865-0929, article HARSHA S. GOWDA ET AL: "Semi-supervised Text Categorization Using Recursive K-means Clustering : First International Conference, RTIP2R 2016, Bidar, India, December 16-17, 2016, Revised Selected Papers", pages: 217 - 227, XP055761399, DOI: 10.1007/978-981-10-4859-3_20
LI QI ET AL: "Distributed open-domain conversational understanding framework with domain independent extractors", 2014 IEEE SPOKEN LANGUAGE TECHNOLOGY WORKSHOP (SLT), IEEE, 7 December 2014 (2014-12-07), pages 566 - 571, XP032756913, DOI: 10.1109/SLT.2014.7078636
MIKIO NAKANO ET AL: "A Two-Stage Domain Selection Framework for Extensible Multi-Domain Spoken Dialogue Systems", PROCEEDINGS OF THE SIGDIAL 2011: THE 12TH ANNUAL MEETING OF THE SPECIAL INTEREST GROUP ON DISCOURSE AND DIALOGUE, 2011, pages 18 - 29, XP055760014, Retrieved from the Internet [retrieved on 20201215]
Attorney, Agent or Firm:
WRIGHT, Alicia L. et al. (US)
Claims:
CLAIMS

What is claimed is:

1. A system comprising: a training system configured to train a classifier model, wherein training the classifier model comprises: accessing training utterances associated with skill bots, the training utterances comprising respective training utterances associated with each skill bot of the skill bots, wherein each skill bot of the skill bots is configured to provide a dialog with a user; generating training feature vectors from the training utterances, the training feature vectors comprising respective training feature vectors associated with each skill bot of the skill bots; generating multiple set representations of the training feature vectors, wherein each set representation of the multiple set representations corresponds to a subset of the training feature vectors; and configuring the classifier model to compare input feature vectors to the multiple set representations of the training feature vectors; and a master bot configured to perform operations comprising: accessing an input utterance as a user input; generating an input feature vector from the input utterance; comparing, using the classifier model, the input feature vector to the multiple set representations of the training feature vectors; outputting an indication that the user input cannot be handled by the skill bots, based on the input feature vector falling outside the multiple set representations.

2. The system of claim 1, wherein: generating the multiple set representations of the training feature vectors comprises generating clusters to which the training feature vectors are assigned; and comparing the input feature vector to the multiple set representations comprises determining that the input feature vector does not fall inside boundaries of the clusters.

3. The system of claim 2, wherein generating the clusters comprises: determining respective centroid locations for initial clusters in a feature space; assigning each training feature vector to an initial cluster, in the initial clusters, having a respective centroid location to which the training feature vector is closest; determining boundaries of the initial clusters, each initial cluster of the initial clusters including corresponding assigned training feature vectors; and responsive to determining that a stopping condition has not yet been satisfied, updating the initial clusters, wherein updating the initial clusters comprises: determining an increased count of clusters such that the increased count of the clusters is greater than an initial count of the initial clusters; determining respective centroid locations for the clusters in the feature space; assigning each training feature vector to a cluster, in the clusters, having a respective centroid location to which the training feature vector is closest; and determining the boundaries of the clusters, each cluster of the clusters including corresponding assigned training feature vectors.

4. The system of claim 2, wherein the operations of the master bot further comprise: accessing a second input utterance as a second user input; generating a second input feature vector from the second input utterance; determining that the second input feature vector falls inside a boundary of a cluster of the clusters; and forwarding the second input utterance to a skill bot associated with the cluster for processing, based on the second input feature vector falling inside the boundary of the cluster.

5. The system of claim 4, further comprising the skill bot, wherein the skill bot is configured to process the input utterance to perform an action responsive to the user input.

6. The system of claim 1, wherein generating the multiple set representations of the training feature vectors comprises: dividing the training utterances into conversation categories; and generating composite feature vectors corresponding to the conversation categories, wherein generating the composite feature vectors comprises, for each conversation category of the conversation categories, generating a respective composite feature vector as an aggregate of respective training feature vectors of the training utterances in the conversation category.

7. The system of claim 6, wherein, for each conversation category of the conversation categories, generating the composite feature vector as an aggregate of the respective training feature vectors of the training utterances in the conversation category comprises averaging the respective training feature vectors of the training utterances in the conversation category.

8. The system of claim 6, wherein the conversation categories are defined based on intents for which the skill bots are configured, such that each conversation category corresponds to a respective skill bot intent and comprises training utterances representative of the respective skill bot intent.

9. The system of claim 6, wherein each conversation category of the conversation categories corresponds to a respective skill bot of the skill bots and comprises training utterances representative of the respective skill bot.

10. The system of claim 6, wherein comparing the input feature vector to the multiple set representations of the training feature vectors comprises determining that the input feature vector is not sufficiently similar to any of the composite feature vectors.

11. The system of any of claims 1 to 10, wherein the operations of the master bot further comprise: accessing a second input utterance as a second user input; generating a second input feature vector from the second input utterance; determining that the second input feature vector is sufficiently similar to a composite feature vector of the composite feature vectors; and forwarding the second input utterance to a skill bot associated with the composite feature vector for processing, based on the second input feature vector being sufficiently similar to the composite feature vector.

12. The system of claim 11, further comprising the skill bot, wherein the skill bot is configured to process the input utterance to perform an action responsive to the user input.

13. A method comprising: accessing, by a computer system, training utterances associated with skill bots, the training utterances comprising a respective subset of training utterances for each skill bot of the skill bots, wherein each skill bot of the skill bots is configured to provide a dialog with a user; generating training feature vectors from the training utterances, the training feature vectors comprising a respective training feature vector for each training utterance of the training utterances; determining centroid locations for clusters in a feature space; assigning each training feature vector, of the training feature vectors, to a respective cluster, of the clusters, having a respective centroid location to which the training feature vector is closest from among the clusters; repeatedly modifying the clusters until a stopping condition is met, wherein modifying the clusters comprises: increasing a count of the clusters to an updated count; determining new centroid locations for the clusters in a quantity equal to the updated count; and reassigning the training feature vectors to the clusters based on closeness to the new centroid locations; determining boundaries of the clusters, the boundaries comprising a respective boundary for each cluster of the clusters; accessing an input utterance; converting the input utterance to an input feature vector; determining that the input feature vector falls outside the boundaries of the clusters by comparing the input feature vector to the boundaries of the clusters; and outputting an indication that the input utterance cannot be handled by the skill bots, based on the input feature vector falling outside the clusters in the feature space.

14. The method of claim 13, further comprising: accessing a second input utterance as a second user input; generating a second input feature vector from the second input utterance; determining that the second input feature vector falls inside one or more clusters of the clusters; and forwarding the second input utterance to a skill bot associated with the one or more clusters for processing, based on the second input feature vector falling inside the one or more clusters.

15. The method of claim 14, wherein the one or more clusters include respective training utterances of the skill bot and respective training utterances of a second skill bot, and the method further comprising: selecting the skill bot, for processing the input utterance, from between the skill bot and the second skill bot based on respective confidence scores computed for the skill bot and the second skill bot.

16. The method of claim 14, wherein the one or more clusters include respective training utterances of the skill bot and respective training utterances of a second skill bot, and the method further comprising: selecting the skill bot, for processing the input utterance, from between the skill bot and the second skill bot based on application of a k-nearest neighbors technique to the respective training utterances of the skill bot and the respective training utterances of the second skill bot.

17. A method comprising: accessing, by a computer system, training utterances associated with skill bots, the training utterances comprising a respective subset of training utterances for each skill bot of the skill bots, wherein each skill bot of the skill bots is configured to provide a dialog with a user; generating training feature vectors from the training utterances, the training feature vectors comprising a respective training feature vector for each training utterance of the training utterances; dividing the training utterances into conversation categories; and generating composite feature vectors corresponding to the conversation categories, wherein generating the composite feature vectors comprises, for each conversation category of the conversation categories, generating a respective composite feature vector as an aggregate of respective training feature vectors of the training utterances in the conversation category; accessing an input utterance; converting the input utterance to an input feature vector; determining that the input feature vector is not sufficiently similar to the composite feature vectors by comparing the input feature vector to the composite feature vectors; and outputting an indication that the input utterance cannot be handled by the skill bots, based on the input feature vector not being sufficiently similar to the composite feature vectors.

18. The method of claim 17, further comprising: accessing a second input utterance; converting the second input utterance to a second input feature vector; determining that the second input feature vector is sufficiently similar to a composite feature vector of the composite feature vectors by comparing the second input feature vector to the composite feature vectors; and forwarding the second input utterance to a skill bot associated with the composite feature vector for processing, based on the second input feature vector being sufficiently similar to the composite feature vector.

19. The method of claim 18, wherein: the conversation categories are defined based on intents that the skill bots are configured to handle, such that each conversation category corresponds to a respective one or more skill bot intents and comprises training utterances representative of the respective one or more skill bot intents; and the composite feature vector corresponds to a skill bot intent that the skill bot is configured to handle.

20. The method of claim 19, wherein determining that the second input feature vector is sufficiently similar to the composite feature vector by comparing the second input feature vector to the composite feature vectors comprises: determining that the second input feature vector is sufficiently similar to one or more additional composite feature vectors corresponding to one or more additional skill bot intents of the skill bots; performing a k-nearest neighbors analysis comprising: identifying nearby composite feature vectors in a predefined quantity, wherein the nearby composite feature vectors are closest to the input feature vector; determining that a majority of the nearby composite feature vectors correspond to skill bot intents that the skill bot is configured to handle; and selecting the skill bot based on the majority of the nearby composite feature vectors corresponding to skill bot intents that the skill bot is configured to handle.

21. A computer configured to execute the method recited in any of claims 13 to 20.

22. A method comprising: accessing an input utterance as a user input; generating an input feature vector from the input utterance; comparing, using a classifier model, the input feature vector to multiple set representations of the training feature vectors; outputting an indication that the user input cannot be handled by skill bots, based on the input feature vector falling outside the multiple set representations, wherein each skill bot of the skill bots is configured to provide a dialog with a user.

23. The method of claim 22, further comprising training the classifier model, wherein training the classifier model comprises: accessing training utterances associated with the skill bots, the training utterances comprising respective training utterances associated with each skill bot of the skill bots; generating training feature vectors from the training utterances, the training feature vectors comprising respective training feature vectors associated with each skill bot of the skill bots; generating the multiple set representations of the training feature vectors, wherein each set representation of the multiple set representations corresponds to a subset of the training feature vectors; and configuring the classifier model to compare input feature vectors to the multiple set representations of the training feature vectors.

24. The method of claim 23, wherein: generating the multiple set representations of the training feature vectors comprises generating clusters to which the training feature vectors are assigned; and comparing the input feature vector to the multiple set representations comprises determining that the input feature vector does not fall inside boundaries of the clusters.

25. The method of claim 24, wherein generating the clusters comprises: determining respective centroid locations for initial clusters in a feature space; assigning each training feature vector to an initial cluster, in the initial clusters, having a respective centroid location to which the training feature vector is closest; determining boundaries of the initial clusters, each initial cluster of the initial clusters including corresponding assigned training feature vectors; and responsive to determining that a stopping condition has not yet been satisfied, updating the initial clusters, wherein updating the initial clusters comprises: determining an increased count of clusters such that the increased count of the clusters is greater than an initial count of the initial clusters; determining respective centroid locations for the clusters in the feature space; assigning each training feature vector to a cluster, in the clusters, having a respective centroid location to which the training feature vector is closest; and determining the boundaries of the clusters, each cluster of the clusters including corresponding assigned training feature vectors.

26. The method of claim 25, further comprising: accessing a second input utterance as a second user input; generating a second input feature vector from the second input utterance; determining that the second input feature vector falls inside a boundary of a cluster of the clusters; and forwarding the second input utterance to a skill bot associated with the cluster for processing, based on the second input feature vector falling inside the boundary of the cluster.

27. The method of claim 26, further comprising configuring the skill bot to process the input utterance to perform an action responsive to the user input.

28. The method of claim 23, wherein generating the multiple set representations of the training feature vectors comprises: dividing the training utterances into conversation categories; and generating composite feature vectors corresponding to the conversation categories, wherein generating the composite feature vectors comprises, for each conversation category of the conversation categories, generating a respective composite feature vector as an aggregate of respective training feature vectors of the training utterances in the conversation category.

29. The method of claim 28, wherein, for each conversation category of the conversation categories, generating the composite feature vector as an aggregate of the respective training feature vectors of the training utterances in the conversation category comprises averaging the respective training feature vectors of the training utterances in the conversation category.

30. The method of claim 28, wherein the conversation categories are defined based on intents for which the skill bots are configured, such that each conversation category corresponds to a respective skill bot intent and comprises training utterances representative of the respective skill bot intent.

31. The method of claim 28, wherein each conversation category of the conversation categories corresponds to a respective skill bot of the skill bots and comprises training utterances representative of the respective skill bot.

32. The method of claim 28, wherein comparing the input feature vector to the multiple set representations of the training feature vectors comprises determining that the input feature vector is not sufficiently similar to any of the composite feature vectors.

33. The method of claim 28, wherein the operations of the master bot further comprise: accessing a second input utterance as a second user input; generating a second input feature vector from the second input utterance; determining that the second input feature vector is sufficiently similar to a composite feature vector of the composite feature vectors; and forwarding the second input utterance to a skill bot associated with the composite feature vector for processing, based on the second input feature vector being sufficiently similar to the composite feature vector.

34. The method of claim 33, further comprising configuring the skill bot to process the input utterance to perform an action responsive to the user input.

35. A dialog system comprising means for performing a method of any of claims 22-34.
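
The k-nearest neighbors selection recited in claim 20 can be illustrated with a minimal Python sketch. Everything concrete here is an assumption made for illustration only: the function name `select_skill_bot`, the use of Euclidean distance, the value of `k`, and the toy two-dimensional composite feature vectors. The claims do not fix any of these choices.

```python
import math
from collections import Counter

def select_skill_bot(input_vector, composites, k=3):
    """Pick a skill bot by majority vote among the k nearest composite vectors.

    composites: list of (skill_bot, intent, composite_vector) tuples (hypothetical layout).
    Returns the winning skill bot, or None when no bot holds a strict majority.
    """
    # Rank composite feature vectors by Euclidean distance to the input (assumed metric).
    ranked = sorted(composites, key=lambda item: math.dist(input_vector, item[2]))
    votes = Counter(bot for bot, _intent, _vec in ranked[:k])
    bot, count = votes.most_common(1)[0]
    return bot if count > k / 2 else None

# Toy composite feature vectors, one per skill bot intent (illustrative values).
composites = [
    ("pizza_bot", "order", [1.0, 0.0]),
    ("pizza_bot", "cancel", [0.9, 0.2]),
    ("pizza_bot", "track", [0.8, 0.1]),
    ("bank_bot", "balance", [0.0, 1.0]),
    ("bank_bot", "transfer", [0.2, 0.9]),
]

print(select_skill_bot([0.7, 0.3], composites))
print(select_skill_bot([0.1, 0.95], composites))
```

Returning `None` when there is no strict majority is a design choice of this sketch; a real system might instead fall back to confidence scores, as claim 15 suggests for the cluster-based variant.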

Description:
DETECTING UNRELATED UTTERANCES IN A CHATBOT SYSTEM

CROSS-REFERENCE TO RELATED APPLICATIONS

[0001] The present application claims the benefit and priority under 35 U.S.C. 119(e) of U.S. Provisional Application No. 62/899,700, filed on September 12, 2019, titled “Detecting Unrelated Utterances in a Chatbot System,” the contents of which are herein incorporated by reference for all purposes.

BACKGROUND

[0002] Chatbots are artificial-intelligence-based software applications or devices that provide an interface for conversations with human users. Chatbots can be programmed to perform various tasks in response to user input provided during a conversation. The user input can be supplied in various forms including, for example, audio input and text input. Thus, natural language understanding, speech-to-text, and other linguistic processing techniques may be employed as part of the processing performed by a chatbot. In some computing environments, multiple chatbots are available to converse with a user, with each chatbot handling a different set of tasks.

SUMMARY

[0003] Techniques described herein are for determining that an input utterance from a user is not related to any skill bot, also referred to as a chatbot, in a set of one or more skill bots that are available to a master bot. In some embodiments, the master bot can evaluate the input utterance and either determine that the input utterance is unrelated to the skill bots or route the input utterance to an appropriate skill bot.

[0004] In some embodiments, a system described herein includes a training system and a master bot. The training system is configured to train a classifier model. Training the classifier model includes accessing training utterances associated with skill bots, where the training utterances comprise respective training utterances associated with each skill bot of the skill bots. Each skill bot is configured to provide a dialog with a user. The training further includes generating training feature vectors from the training utterances. The training feature vectors include respective training feature vectors associated with each skill bot. The training further includes generating multiple set representations of the training feature vectors, where each set representation of the multiple set representations corresponds to a subset of the training feature vectors, and configuring the classifier model to compare input feature vectors to the multiple set representations. The master bot is configured to access an input utterance as a user input and to generate an input feature vector from the input utterance. The master bot is further configured to use the classifier model to compare the input feature vector to the multiple set representations of the training feature vectors and to output an indication that the user input cannot be handled by the skill bots based on the input feature vector falling outside the multiple set representations.

[0005] In additional or alternative embodiments, a method described herein includes accessing, by a computer system, training utterances associated with skill bots, where the training utterances include a respective subset of training utterances for each skill bot of the skill bots. Each skill bot is configured to provide a dialog with a user. The method further includes generating training feature vectors from the training utterances, where the training feature vectors include a respective training feature vector for each training utterance. The method further includes determining centroid locations for clusters in a feature space and assigning each training feature vector to a respective cluster having a respective centroid location to which the training feature vector is closest from among the clusters. The method further includes repeatedly modifying the clusters until a stopping condition is met. Modifying the clusters includes increasing a count of the clusters to an updated count, determining new centroid locations for the clusters in a quantity equal to the updated count, and reassigning the training feature vectors to the clusters based on closeness to the new centroid locations. The method further includes determining boundaries of the clusters, where the boundaries include a respective boundary for each cluster of the clusters. The method further includes accessing an input utterance, converting the input utterance to an input feature vector, and determining that the input feature vector falls outside the boundaries of the clusters by comparing the input feature vector to the boundaries of the clusters. Additionally, the method includes outputting an indication that the input utterance cannot be handled by the skill bots, based on the input feature vector falling outside the clusters in the feature space.

[0006] In still additional or alternative embodiments, a method described herein includes accessing, by a computer system, training utterances associated with skill bots. The training utterances include a respective subset of training utterances for each skill bot of the skill bots. Each skill bot is configured to provide a dialog with a user. The method further includes generating training feature vectors from the training utterances, where the training feature vectors include a respective training feature vector for each training utterance of the training utterances, and dividing the training utterances into conversation categories. The method further includes generating composite feature vectors corresponding to the conversation categories. Generating the composite feature vectors includes, for each conversation category of the conversation categories, generating a respective composite feature vector as an aggregate of respective training feature vectors of the training utterances in the conversation category. The method further includes accessing an input utterance, converting the input utterance to an input feature vector, and determining that the input feature vector is not sufficiently similar to the composite feature vectors by comparing the input feature vector to the composite feature vectors. Additionally, the method includes outputting an indication that the input utterance cannot be handled by the skill bots, based on the input feature vector not being sufficiently similar to the composite feature vectors.

[0007] In still additional or alternative embodiments, a system described here includes a master bot. The master bot is configured to perform operations including: accessing an input utterance as a user input; generating an input feature vector from the input utterance; comparing, using a classifier model, the input feature vector to the multiple set representations of the training feature vectors; outputting an indication that the user input cannot be handled by skill bots, based on the input feature vector falling outside the multiple set representations. Each skill bot of the skill bots is configured to provide a dialog with a user.
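
The master bot operations in the paragraph above can be sketched end to end as follows. This is a toy illustration under stated assumptions, not the described implementation: the bag-of-words `featurize` function, the per-bot mean vector used as the set representation, cosine similarity as the comparison, the `threshold` value, and the bot names and training utterances are all hypothetical.

```python
import math
from collections import Counter

# Hypothetical training utterances for two skill bots.
TRAINING = {
    "pizza_bot": ["order a large pizza", "add extra cheese to my pizza"],
    "bank_bot": ["check my account balance", "transfer money to savings account"],
}

# Vocabulary built from the training utterances (assumed featurization scheme).
VOCAB = sorted({w for utts in TRAINING.values() for u in utts for w in u.split()})

def featurize(utterance):
    """Turn an utterance into a bag-of-words count vector over VOCAB."""
    counts = Counter(utterance.lower().split())
    return [float(counts[w]) for w in VOCAB]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

# One set representation per skill bot: the mean of its training feature vectors.
REPRESENTATIONS = {
    bot: [sum(col) / len(utts) for col in zip(*(featurize(u) for u in utts))]
    for bot, utts in TRAINING.items()
}

def master_bot(utterance, threshold=0.3):
    """Route to the best-matching skill bot, or report the utterance as unrelated."""
    vector = featurize(utterance)
    bot, score = max(
        ((b, cosine(vector, rep)) for b, rep in REPRESENTATIONS.items()),
        key=lambda pair: pair[1],
    )
    return bot if score >= threshold else "UNRELATED"

for text in ["order a pizza with cheese", "what is my balance", "tell me a joke"]:
    print(text, "->", master_bot(text))
```

A production system would likely use learned sentence embeddings rather than word counts; the sketch only shows how the comparison against set representations yields either a routing decision or an "unrelated" indication.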

[0008] In still additional or alternative embodiments, a method described here includes accessing an input utterance. The method further includes converting the input utterance to an input feature vector. The method further includes determining that the input feature vector falls outside boundaries of clusters in a feature space by comparing the input feature vector to the boundaries of the clusters. The method further includes outputting an indication that the input utterance cannot be handled by the skill bots, based on the input feature vector falling outside the clusters in the feature space. Each skill bot of the skill bots is configured to provide a dialog with a user.
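
One way to picture the boundary comparison above is the following minimal sketch. It assumes, purely for illustration, that each cluster boundary is a hypersphere centered on the cluster centroid with a radius equal to the distance of the farthest member; the text here does not specify how boundaries are defined, and the helper names are hypothetical.

```python
import math

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def cluster_boundary(members):
    """Return (centroid, radius) for one cluster of training feature vectors."""
    centroid = [sum(col) / len(members) for col in zip(*members)]
    # Assumed boundary: the sphere that just encloses every member.
    radius = max(euclidean(v, centroid) for v in members)
    return centroid, radius

def falls_outside(vector, boundaries):
    """True when the vector lies outside every cluster boundary."""
    return all(euclidean(vector, centroid) > radius for centroid, radius in boundaries)

# Toy clusters of two-dimensional feature vectors (illustrative values).
clusters = [
    [[0.0, 0.0], [0.0, 2.0], [2.0, 0.0], [2.0, 2.0]],
    [[10.0, 10.0], [12.0, 10.0]],
]
boundaries = [cluster_boundary(members) for members in clusters]

print(falls_outside([1.0, 1.5], boundaries))   # lies within the first cluster's sphere
print(falls_outside([5.0, 5.0], boundaries))   # lies between the clusters
```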

[0009] In still additional or alternative embodiments, a method described here includes accessing an input utterance. The method further includes converting the input utterance to an input feature vector. The method further includes determining that the input feature vector is not sufficiently similar to composite feature vectors by comparing the input feature vector to the composite feature vectors. The composite feature vectors correspond to conversation categories. The method further includes outputting an indication that the input utterance cannot be handled by skill bots, based on the input feature vector not being sufficiently similar to the composite feature vectors. Each skill bot of the skill bots is configured to provide a dialog with a user.
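
The "not sufficiently similar" test above could be realized in many ways. The sketch below assumes cosine similarity and a fixed threshold of 0.8; both the measure and the threshold are illustrative assumptions, as are the category names and vector values.

```python
import math

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def is_unrelated(input_vector, composites, threshold=0.8):
    """True when the input is below the similarity threshold for every category."""
    return all(
        cosine_similarity(input_vector, composite) < threshold
        for composite in composites.values()
    )

# Hypothetical composite feature vectors, one per conversation category.
composites = {
    "order_pizza": [0.5, 0.5, 0.0],
    "check_balance": [0.0, 0.0, 3.0],
}

print(is_unrelated([1.0, 1.0, 0.0], composites))  # parallel to order_pizza
print(is_unrelated([1.0, 0.0, 1.0], composites))  # partially overlaps both categories
```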

[0010] In still additional or alternative embodiments, a method described here includes accessing an input utterance as a user input. The method further includes generating an input feature vector from the input utterance. The method further includes comparing, using a classifier model, the input feature vector to multiple set representations of the training feature vectors. The method further includes outputting an indication that the user input cannot be handled by skill bots, based on the input feature vector falling outside the multiple set representations. Each skill bot of the skill bots is configured to provide a dialog with a user.

[0011] In still additional or alternative embodiments, a method described here is used for training a classifier model. The method includes accessing training utterances associated with skill bots, the training utterances including respective training utterances associated with each skill bot of the skill bots. Each skill bot of the skill bots is configured to provide a dialog with a user. The method further includes generating training feature vectors from the training utterances, the training feature vectors including respective training feature vectors associated with each skill bot of the skill bots. The method further includes generating multiple set representations of the training feature vectors. Each set representation of the multiple set representations corresponds to a subset of the training feature vectors. The method further includes configuring the classifier model to compare input feature vectors to the multiple set representations of the training feature vectors.

[0012] In still additional or alternative embodiments, a method described here is used for generating clusters that can be used for determining whether an input utterance can be handled by skill bots. The method includes accessing, by a computer system, training utterances associated with skill bots, the training utterances including a respective subset of training utterances for each skill bot of the skill bots. Each skill bot of the skill bots is configured to provide a dialog with a user. The method further includes generating training feature vectors from the training utterances, the training feature vectors including a respective training feature vector for each training utterance of the training utterances. The method further includes determining centroid locations for clusters in a feature space. The method further includes assigning each training feature vector, of the training feature vectors, to a respective cluster, of the clusters, having a respective centroid location to which the training feature vector is closest from among the clusters; and repeatedly modifying the clusters until a stopping condition is met. Modifying the clusters includes increasing a count of the clusters to an updated count; determining new centroid locations for the clusters in a quantity equal to the updated count; and reassigning the training feature vectors to the clusters based on closeness to the new centroid locations. The method further includes determining boundaries of the clusters, the boundaries including a respective boundary for each cluster of the clusters.
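
The cluster-generation loop above (assign to nearest centroid, then increase the cluster count until a stopping condition is met) can be sketched as follows. Several details are assumptions introduced only to make the sketch runnable: k-means as the way centroids are determined, farthest-point seeding for determinism, a hypersphere radius as the boundary, and "every radius is at most `max_radius`" as the stopping condition. The text does not commit to any of these.

```python
import math

def dist(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def mean(vectors):
    return [sum(col) / len(vectors) for col in zip(*vectors)]

def init_centroids(vectors, k):
    """Farthest-point seeding (an assumption; keeps the demo deterministic)."""
    centroids = [vectors[0]]
    while len(centroids) < k:
        centroids.append(max(vectors, key=lambda v: min(dist(v, c) for c in centroids)))
    return centroids

def assign(vectors, centroids):
    """Assign each vector to the cluster whose centroid is closest."""
    groups = [[] for _ in centroids]
    for v in vectors:
        nearest = min(range(len(centroids)), key=lambda i: dist(v, centroids[i]))
        groups[nearest].append(v)
    return groups

def k_means(vectors, k, iterations=20):
    centroids = init_centroids(vectors, k)
    for _ in range(iterations):
        groups = assign(vectors, centroids)
        centroids = [mean(g) if g else c for g, c in zip(groups, centroids)]
    return centroids, assign(vectors, centroids)

def grow_clusters(vectors, initial_k=1, step=1, max_radius=1.5):
    """Increase the cluster count until every cluster is tight enough."""
    k = initial_k
    while True:
        centroids, groups = k_means(vectors, k)
        # Boundary of each non-empty cluster: (centroid, radius of farthest member).
        bounds = [(c, max(dist(v, c) for v in g)) for c, g in zip(centroids, groups) if g]
        # Assumed stopping condition: all radii small, or the count cannot grow further.
        if all(r <= max_radius for _, r in bounds) or k + step > len(vectors):
            return bounds
        k += step

# Three well-separated groups of toy two-dimensional feature vectors.
vectors = [[0, 0], [1, 0], [0, 1], [10, 0], [11, 0], [10, 1], [0, 10], [1, 10], [0, 11]]
bounds = grow_clusters(vectors)
print(len(bounds), "clusters")
```

Starting from a single cluster and growing the count is one reading of the loop; the initial count, step size, and stopping condition are all tunable in this sketch.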

[0013] In still additional or alternative embodiments, a method described here is used for generating composite feature vectors that can be used for determining whether an input utterance can be handled by skill bots. The method includes accessing, by a computer system, training utterances associated with skill bots, the training utterances including a respective subset of training utterances for each skill bot of the skill bots. Each skill bot of the skill bots is configured to provide a dialog with a user. The method further includes generating training feature vectors from the training utterances, the training feature vectors including a respective training feature vector for each training utterance of the training utterances. The method further includes dividing the training utterances into conversation categories. The method further includes generating composite feature vectors corresponding to the conversation categories. Generating the composite feature vectors includes, for each conversation category of the conversation categories, generating a respective composite feature vector as an aggregate of respective training feature vectors of the training utterances in the conversation category.

[0014] In still additional or alternative embodiments, a system described here includes means for performing any of the above-mentioned methods.
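As a rough illustration of the aggregation step, the sketch below builds one composite feature vector per conversation category. The text only says "aggregate"; using the arithmetic mean is an assumption made here, and the function name, category labels, and toy vectors are hypothetical.

```python
from collections import defaultdict


def composite_vectors(feature_vectors, categories):
    """Average the training feature vectors of each conversation category.

    feature_vectors[i] is the vector of training utterance i and
    categories[i] is the conversation category that utterance belongs to.
    """
    grouped = defaultdict(list)
    for vector, category in zip(feature_vectors, categories):
        grouped[category].append(vector)
    return {
        category: [sum(col) / len(vectors) for col in zip(*vectors)]
        for category, vectors in grouped.items()
    }


# Hypothetical intent-based categories with toy two-dimensional vectors.
vectors = [[1.0, 0.0], [3.0, 0.0], [0.0, 2.0]]
labels = ["order_pizza", "order_pizza", "check_balance"]
composites = composite_vectors(vectors, labels)
```

The same function works whether the categories are intents or whole skill bots, since only the labels change.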

[0015] The foregoing, together with other features and embodiments, will become more apparent upon referring to the following specification, claims, and accompanying drawings.

BRIEF DESCRIPTION OF THE DRAWINGS

[0016] FIG. 1 is a block diagram of an environment including a master bot in communication with various skill bots, also referred to as chatbots, according to some embodiments described herein.

[0017] FIG. 2 is a flow diagram of a method of configuring and using a master bot to direct input utterances to skill bots and, when applicable, to determine that certain input utterances are unrelated to available skill bots, according to some embodiments described herein.

[0018] FIG. 3 is a block diagram of a master bot configured to determine whether an input utterance is unrelated to the available skill bots, according to certain embodiments described herein.

[0019] FIG. 4 is a diagram of a skill bot, according to certain embodiments described herein.

[0020] FIG. 5 is a flow diagram of a method of initializing a classifier model of a master bot to determine whether input utterances are unrelated to the available skill bots, according to some embodiments described herein.

[0021] FIG. 6 illustrates generation of training feature vectors from training utterances used to train the classifier model, according to some embodiments described herein.

[0022] FIG. 7 is a flow diagram of an example of a method of using the classifier model of the master bot to determine whether an input utterance is unrelated to any available skill bot associated with the master bot, according to some embodiments described herein.

[0023] FIG. 8 is a flow diagram of another example of a method of using the classifier model of the master bot to determine whether the input utterance is unrelated to any available skill bot associated with the master bot, according to some embodiments described herein.

[0024] FIG. 9 illustrates an example of a feature space that includes points representing feature vectors of example utterances, according to some embodiments described herein.

[0025] FIG. 10 illustrates an example of the feature space of FIG. 9 having a class boundary between intent classes of the feature vectors of the example utterances, according to some embodiments described herein.

[0026] FIG. 11 illustrates an example of the feature space of FIG. 9 having class boundaries separating feature vectors associated with common intents into respective clusters, according to some embodiments described herein.

[0027] FIG. 12 illustrates an example of the feature space of FIG. 9 having overlapping class boundaries separating feature vectors associated with common intents into respective clusters, according to some embodiments described herein.

[0028] FIG. 13 illustrates another example of the feature space of FIG. 9 having class boundaries separating feature vectors into clusters, according to some embodiments described herein.

[0029] FIG. 14 is a flow diagram of a method of initializing the classifier model of the master bot to utilize clusters to determine whether input utterances are unrelated to available skill bots, according to some embodiments described herein.

[0030] FIG. 15 illustrates an example of execution of certain aspects of the method of FIG. 14, according to some embodiments described herein.

[0031] FIG. 16 is a flow diagram of a method of using the classifier model of the master bot to determine whether the input utterance is unrelated to any available skill bot associated with the master bot, according to some embodiments described herein.

[0032] FIG. 17 illustrates an example of executing this method in a case where an input feature vector falls outside all the cluster boundaries, according to some embodiments described herein.

[0033] FIG. 18 is a flow diagram of a method of initializing the classifier model of the master bot to utilize composite feature vectors to determine whether input utterances are unrelated to available skill bots, according to some embodiments described herein.

[0034] FIG. 19 illustrates generation of composite feature vectors using intent-based conversation categories, according to some embodiments described herein.

[0035] FIG. 20 illustrates the generation of composite feature vectors using bot-based conversation categories, according to some embodiments described herein.

[0036] FIG. 21 is a flow diagram of a method of using the classifier model of the master bot to determine whether the input utterance is unrelated to any available skill bot associated with the master bot, according to some embodiments described herein.

[0037] FIG. 22 is a flow diagram of an example of a method of selecting a skill bot to handle the input utterance, according to some embodiments described herein.

[0038] FIG. 23 is a diagram of another example of a method of selecting a skill bot to handle the input utterance according to some embodiments described herein.

[0039] FIG. 24 is a diagram of a distributed system for implementing some embodiments described herein.

[0040] FIG. 25 is a diagram of a cloud-based system environment in which various chatbot-related services may be offered as cloud services, according to some embodiments described herein.

[0041] FIG. 26 is a diagram of an example of a computer system that may be used to implement some embodiments described herein.

DETAILED DESCRIPTION

[0042] In the following description, for the purposes of explanation, specific details are set forth in order to provide a thorough understanding of certain embodiments. However, it will be apparent that various embodiments may be practiced without these specific details. The figures and description are not intended to be restrictive. The word “exemplary” is used herein to mean “serving as an example, instance, or illustration.” Any embodiment or design described herein as “exemplary” is not necessarily to be construed as preferred or advantageous over other embodiments or designs.

[0043] Various embodiments are described herein, including methods, systems, non-transitory computer-readable storage media storing programs, code, or instructions executable by one or more processors, and the like.

[0044] As described above, certain environments include multiple chatbots, also referred to herein as skill bots, such that each chatbot is specialized to handle a respective set of tasks, or skills. In that case, it would be advantageous to automatically direct user input from a user to the chatbot that is most suited for handling that user input and, further, to quickly identify when a user input is unrelated to the chatbots and thus cannot be handled by any of the available chatbots. Some embodiments described herein enable a computing system, such as a master bot, to perform preliminary processing on user input to determine whether the user input is unrelated to the chatbots (i.e., cannot be handled by the chatbots). As a result, computing resources are not wasted by a chatbot in attempting to process an unrelated user input. Thus, certain embodiments described herein conserve computing resources of chatbots by providing early detection of unrelated user inputs so as to prevent chatbots in an environment from processing such user inputs. Further, certain embodiments conserve network resources by preventing the master bot from transmitting such user inputs to one or more chatbots that are incapable of handling them.

[0045] In an environment that includes multiple chatbots, a master bot might include a classifier that determines which chatbot should process an input utterance. For instance, the classifier could be implemented as a neural network, or some other ML model, that outputs a set of probabilities, each probability associated with a chatbot and indicating a confidence level that the chatbot can handle the input utterance. This type of master bot selects the chatbot with the highest confidence level and forwards the input utterance to that chatbot. However, the master bot might provide falsely high confidence scores for input utterances that are unrelated because the confidence is essentially divided among the available chatbots without consideration for whether the chatbots are configured to handle the input utterance at all. This could result in the chatbot processing an input utterance that the chatbot is not equipped to handle. The chatbot may eventually provide an indication that it cannot handle the input utterance or a request to the user for clarification; however, this would occur after the chatbot has expended resources to process the input utterance.
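The effect described above can be reproduced with a small numeric sketch. It assumes the classifier ends in a softmax layer, which the text does not state; the raw scores and the three-chatbot setup are invented for illustration. Because the probabilities must sum to one, no probability mass is reserved for "none of the chatbots", so the winner can look confident even when every raw score is weak.

```python
import math


def softmax(scores):
    """Convert raw scores into probabilities that sum to 1."""
    highest = max(scores)
    exps = [math.exp(s - highest) for s in scores]  # shift for numerical stability
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical raw scores for three chatbots on an unrelated utterance.
# Every score is low in absolute terms, yet one is less low than the others.
unrelated_scores = [-2.0, -5.0, -6.0]
probabilities = softmax(unrelated_scores)
top_confidence = max(probabilities)  # roughly 0.94 despite the weak scores
```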

[0046] Alternatively, a developer could train the master bot in an attempt to help the master bot recognize utterances that cannot be handled by any available chatbot. For instance, the master bot could be trained with labeled training data that includes related utterances (i.e., that can be handled by the chatbots) as well as unrelated utterances (i.e., that cannot be handled by the chatbots), to teach the master bot to recognize unrelated utterances. However, it is unlikely that the scope of training data could be sufficient to enable the master bot to recognize all unrelated utterances. An input utterance that is not similar to any of the unrelated utterances used during training might still end up being forwarded to a chatbot for processing. As a result, the computational resources for training the master bot would be increased, and the master bot would still fail at recognizing a broad swath of unrelated utterances.

[0047] Certain embodiments described herein address the drawbacks of the techniques described above and may be used instead of or in conjunction with such techniques. In certain embodiments, a determination is made, using a trained machine-learning (ML) model, referred to herein as a classifier model, as to whether an input utterance (i.e., a natural language phrase potentially in textual form) provided as user input is related to any skill bot in a set of available skill bots. This determination can be performed, for instance, by a master bot that utilizes the classifier model. The master bot generates an input feature vector from the input utterance. In one example, the classifier model of the master bot compares the input feature vector to a set of clusters of training feature vectors, which are feature vectors of training data, to determine whether the input feature vector falls into any of the clusters. If the input feature vector falls outside all the clusters, then the master bot decides that the input utterance is unrelated to the skill bots. In another example, the classifier model of the master bot compares the input feature vector to a set of composite feature vectors, each composite feature vector representing one or multiple training feature vectors belonging to a respective category. If the input feature vector is not sufficiently similar to any of the composite feature vectors, then the master bot decides that the input utterance is unrelated to the skill bots. If the input utterance is determined to be unrelated to any bot, then the input utterance may be deemed a “none class” and is not routed to any bot for handling. Instead, processing of the input utterance may end, or the master bot may prompt the user for clarification as to what the user intended.
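The two comparisons described above might look like the following at inference time. This is a sketch under assumptions: cluster boundaries are modeled as circles (a centroid plus a radius), similarity to composite vectors is measured with cosine similarity, and the `min_similarity` threshold of 0.8 is arbitrary. None of these specifics comes from the text, and all names and numbers are hypothetical.

```python
import math


def outside_all_clusters(vector, centroids, radii):
    """True when the vector lies beyond every (assumed circular) cluster boundary."""
    return all(math.dist(vector, c) > r for c, r in zip(centroids, radii))


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / (norm_a * norm_b)


def dissimilar_to_all_composites(vector, composites, min_similarity=0.8):
    """True when no composite feature vector is sufficiently similar."""
    return all(cosine_similarity(vector, c) < min_similarity
               for c in composites.values())


# Cluster-based check with two toy clusters.
centroids, radii = [[0.0, 0.0], [5.0, 5.0]], [1.0, 1.0]
inside = outside_all_clusters([0.2, 0.3], centroids, radii)      # False: related
none_class = outside_all_clusters([2.5, 2.5], centroids, radii)  # True: unrelated

# Composite-based check with two toy categories.
composites = {"order_pizza": [1.0, 0.0], "check_balance": [0.0, 1.0]}
related = dissimilar_to_all_composites([0.9, 0.1], composites)    # False: related
unrelated = dissimilar_to_all_composites([1.0, 1.0], composites)  # True: unrelated
```

An utterance flagged by either check would be treated as a "none class" input and not forwarded to any skill bot.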

[0048] This improved routing may prevent computational resources from being committed to handle a query that ultimately would not be handled effectively by the system. This may improve the overall responsiveness of the system. Additionally, user experience may be improved as the user may be guided into rephrasing or clarifying their query in a manner that is more readily handled by an appropriately trained specialist chatbot or set of chatbots. In addition to processing and network resources being saved, the average time for a user query to be handled may thus be reduced by this improved routing through the bot network.

[0049] If the input utterance is determined to be related to at least one bot, then the input utterance may undergo intent classification in which the intent that most closely matches the input utterance is determined in order to start a conversation flow associated with that intent. For example, each intent of a skill bot may have associated with it a state machine that defines various conversation states for a conversation with the user. Intent classification can be performed at the individual bot level. For instance, each skill bot registered with a master bot may have its own classifier (e.g., an ML-based classifier) that is trained on predefined utterances associated with that particular bot. The input utterance may be input to the intent classifier of the bot that most closely relates to the input utterance for determining which of the bot’s intents best matches the utterance.
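To make the idea of a per-intent state machine concrete, here is a minimal sketch of a conversation flow for a hypothetical pizza-ordering intent. The state names, prompts, and the linear dictionary-based representation are all assumptions for illustration; the text does not specify how conversation states are encoded.

```python
# Each state has a prompt and the name of the state that follows it.
ORDER_PIZZA_FLOW = {
    "ask_size": {"prompt": "What size would you like?", "next": "ask_toppings"},
    "ask_toppings": {"prompt": "Which toppings?", "next": "confirm"},
    "confirm": {"prompt": "Your pizza has been ordered.", "next": None},
}


def run_flow(flow, start_state, user_replies):
    """Walk the states in order, pairing each prompt with the user's reply.

    Returns a list of (state, prompt, reply) tuples; reply is None once the
    supplied replies run out (for example, on the final confirmation).
    """
    replies = iter(user_replies)
    transcript = []
    state = start_state
    while state is not None:
        transcript.append((state, flow[state]["prompt"], next(replies, None)))
        state = flow[state]["next"]
    return transcript


transcript = run_flow(ORDER_PIZZA_FLOW, "ask_size", ["large", "mushrooms"])
```

Once intent classification selects the intent, the skill bot would enter the start state of that intent's flow and advance as the user answers.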

[0050] This may result in an improved user experience as the user’s query may be rapidly routed to a selected skill bot or selected set of skill bots that may be better equipped to handle the user’s query than a non-specialist generic bot or may handle the user’s query more rapidly than a non-specialist generic bot. The improved routing may additionally or alternatively result in fewer computational resources being used as the selected skill bot or selected set of skill bots may consume fewer processing resources when handling a user query than a non-specialist generic bot.

Overview of

[0051] FIG. 1 is a block diagram of an environment including a master bot 114 in communication with various skill bots 116, also referred to as chatbots, according to some embodiments described herein. The environment includes a digital assistant builder platform (DABP) 102 that enables developers to program and deploy digital assistants (DAs) 106, or chatbot systems. A DA 106 includes or has access to a master bot 114, which includes or has access to one or more skill bots 116, where each skill bot 116 is configured to provide one or more skills, or tasks, to users. In some embodiments, the master bot 114 and the skill bots 116 run on the DA 106 itself; alternatively, however, only the master bot 114 runs on the DA 106 and communicates with skill bots 116 running elsewhere (e.g., on another computing device).

[0052] In some embodiments, the DABP 102 can be used to program one or more DAs 106. For example, as shown in FIG. 1, a developer can use DABP 102 to create and deploy a digital assistant 106 for users to access. For example, the DABP 102 can be used by a bank to create one or more digital assistants for use by the bank’s customers. The same DABP 102 platform can be used by multiple enterprises to create digital assistants 106. As another example, an owner of a restaurant (e.g., a pizza shop) may use DABP 102 to create and deploy a digital assistant that enables customers of the restaurant to order food (e.g., order pizza). Additionally or alternatively, for example, the DABP 102 can be used by a developer to deploy one or more skill bots 116 such that the one or more skill bots 116 become accessible to a master bot 114 of an existing digital assistant 106.

[0053] Additionally or alternatively, in some embodiments, as described further below, the DABP 102 is configured to train a master bot 114 of a digital assistant to enable the master bot 114 to recognize when an input utterance is unrelated to any of the available skill bots 116.

[0054] For purposes of this disclosure, a “digital assistant” is an entity that helps users of the digital assistant accomplish various tasks through natural language conversations. A digital assistant can be implemented using software only (e.g., the digital assistant is a digital entity implemented using programs, code, or instructions executable by one or more processors), using hardware, or using a combination of hardware and software. A digital assistant can be embodied or implemented in various physical systems or devices that can include generic or specialized hardware, such as in a computer, a mobile phone, a watch, an appliance, a vehicle, and the like. A digital assistant is also sometimes referred to as a chatbot system. Accordingly, for purposes of this disclosure, the terms digital assistant and chatbot system are interchangeable.

[0055] In some embodiments, a digital assistant 106 can be used to perform various tasks via natural language-based conversations between the digital assistant and its users 108. As part of a conversation, a user may provide a user input 110 to the digital assistant 106, which may provide a response 112 to the user input 110 and may optionally perform one or more additional tasks related to the user input 110 (e.g., if the user input includes an instruction to perform a task). A conversation, or dialog, can include one or more user inputs 110 and responses 112. Via a conversation, a user can request one or more tasks to be performed by the digital assistant 106, and the digital assistant 106 is configured, in response, to perform the user-requested tasks and respond with appropriate responses to the user.

[0056] A user input 110 can be in a natural language form, referred to as an utterance. An utterance can be in text form, such as when a user types a phrase, such as a sentence, a question, a text fragment, or even a single word, and provides the text as input to digital assistant 106. In some embodiments, a user utterance can be in audio input or speech form, such as when a user speaks something that is provided as input to digital assistant 106, which may include or have access to a microphone for capturing such speech. An utterance is typically in a language spoken by the user. When an utterance is in the form of speech input, the speech input may be converted to a textual utterance in the same language, and the digital assistant 106 may process the text utterance as user input. Various speech-to-text processing techniques may be used to convert a speech input to a text utterance. In some embodiments, the speech-to-text conversion is performed by digital assistant 106 itself, but various implementations are within the scope of this disclosure. For purposes of this disclosure, it is assumed that input utterances (i.e., utterances provided as user input) are text utterances that have been provided directly by a user 108 of digital assistant 106 or are the result of conversion of input speech utterances to text form. This, however, is not intended to be limiting or restrictive in any manner.

[0057] An utterance can be a fragment, a sentence, multiple sentences, one or more words, one or more questions, combinations of the aforementioned types, or the like. In some embodiments, the digital assistant 106, including its master bot 114 and skill bots 116, is configured to apply natural language understanding (NLU) techniques to an utterance to understand the meaning of the user input. As part of the NLU processing for an utterance, the digital assistant 106 may perform processing to understand the meaning of the utterance, which involves identifying one or more intents and one or more entities corresponding to the utterance. Upon understanding the meaning of an utterance, the digital assistant 106, including its master bot 114 and skill bots 116, may perform one or more actions or operations responsive to the understood meaning or intents.

[0058] For example, a user input may request a pizza to be ordered by providing an input utterance such as “I want to order a pizza.” Upon receiving such an utterance, the digital assistant 106 determines the meaning of the utterance and takes appropriate actions. The appropriate actions may involve, for example, responding to the user with questions requesting user input on the type of pizza the user desires to order, the size of the pizza, any toppings for the pizza, or the like. The responses 112 provided by the digital assistant 106 may also be in natural language form and typically in the same language as the input utterance. As part of generating these responses 112, the digital assistant 106 may perform natural language generation (NLG). For the user ordering a pizza, via the conversation between the user and the digital assistant 106, the digital assistant may guide the user to provide all the requisite information for the pizza order and then, at the end of the conversation, cause the pizza to be ordered. The digital assistant 106 may end the conversation by outputting information to the user indicating that the pizza has been ordered.

[0059] At a conceptual level, the digital assistant 106, along with its master bot 114 and associated skill bots 116, performs various processing in response to an utterance received from a user. In some embodiments, this processing involves a series or pipeline of processing steps including, for example, understanding the meaning of the input utterance (using NLU), determining an action to be performed in response to the utterance, causing the action to be performed when appropriate, generating a response to be output to the user responsive to the utterance, and outputting the response to the user. The NLU processing can include parsing the received input utterance to understand the structure and meaning of the utterance, and refining and reforming the utterance to develop a form that is more easily parsed and understood (e.g., a logical form). Generating a response may include using NLG techniques. Thus, the natural language processing performed by a digital assistant 106 can include a combination of NLU and NLG processing.

[0060] The NLU processing performed by the digital assistant 106 can include various NLU processing such as sentence parsing (e.g., tokenizing, lemmatizing, identifying part-of-speech tags, identifying named entities, generating dependency trees to represent a sentence structure, splitting a sentence of the utterance into clauses, analyzing individual clauses, resolving anaphoras, performing chunking, or the like). In certain embodiments, the NLU processing or portions thereof is performed by the digital assistant 106 itself. Additionally or alternatively, the digital assistant 106 may use other resources to perform portions of the NLU processing. For example, the syntax and structure of an input utterance may be identified by processing the input utterance using a parser, a part-of-speech tagger, or a named entity recognizer separate from the digital assistant 106.

[0061] While the various examples provided in this disclosure show utterances in the English language, this is meant only as an example. In certain embodiments, the digital assistant 106 is additionally, or alternatively, capable of handling utterances in languages other than English. The digital assistant 106 may provide subsystems (e.g., components implementing NLU functionality) that are configured for performing processing for different languages. These subsystems may be implemented as pluggable units that can be called using service calls from an NLU core server, which may run on the digital assistant 106. This makes the NLU processing flexible and extensible for each language, including allowing different orders of processing. A language pack may be provided for individual languages, where a language pack can register a list of subsystems that can be served from the NLU core server.
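The pluggable arrangement of language packs can be pictured with a small registry sketch. Everything here is hypothetical: the class name `NluCoreServer`, its methods, and the two toy subsystems are invented for illustration, and in-process function calls stand in for the service calls mentioned above.

```python
class NluCoreServer:
    """Toy registry mapping a language code to an ordered list of subsystems."""

    def __init__(self):
        self._language_packs = {}

    def register_language_pack(self, language, subsystems):
        """Register the subsystems for a language in the order they should run."""
        self._language_packs[language] = list(subsystems)

    def process(self, language, utterance):
        """Run the utterance through each registered subsystem in turn."""
        data = {"text": utterance}
        for subsystem in self._language_packs[language]:
            data = subsystem(data)
        return data


def lowercase(data):
    """Toy normalization subsystem."""
    return {**data, "text": data["text"].lower()}


def tokenize(data):
    """Toy whitespace tokenizer subsystem."""
    return {**data, "tokens": data["text"].split()}


server = NluCoreServer()
server.register_language_pack("en", [lowercase, tokenize])
result = server.process("en", "Order A Pizza")
```

Because each pack supplies its own ordered list, a different language could register different subsystems or run them in a different order without changes to the core server.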

[0062] In some embodiments, the digital assistant 106 can be made available or accessible to its users 108 through a variety of different channels, such as via certain applications, via social media platforms, via various messaging services and applications (e.g., an instant messaging application), or other applications or channels. A single digital assistant can have several channels configured for it so that it can be run on and be accessed by different services simultaneously. Additionally or alternatively, the digital assistant 106 may be implemented on a device that is local to a user and, thus, may be a personal digital assistant for use by the user or other nearby users.

[0063] The digital assistant 106 may be associated with one or more skills. In certain embodiments, these skills are implemented through individual chatbots, referred to as skill bots 116, each of which is configured to interact with users to fulfill specific types of tasks, such as tracking inventory, submitting timecards, creating expense reports, ordering food, checking a bank account, making reservations, buying a widget, or other tasks. For example, for the embodiment depicted in FIG. 1, the digital assistant 106 includes, or otherwise has access to, three skill bots 116, each implementing a particular skill or set of skills. However, various quantities of skill bots 116 could be supported by the digital assistant 106. Each skill bot 116 may be implemented as hardware, software, or a combination of both.

[0064] Each skill associated with a digital assistant 106, and thus implemented as a skill bot 116, is configured to help a user of the digital assistant 106 complete a task through a conversation with the user. The conversation can include a combination of text or audio user inputs 110 provided by the user and responses 112 provided by the skill bots 116 by way of the digital assistant 106. These responses 112 may be in the form of text or audio messages to the user or may be provided using simple user interface elements (e.g., select lists) that are presented to the user for the user to make selections.

[0065] There are various ways in which a skill bot 116 can be associated with or added to a digital assistant. In some instances, a skill bot 116 can be developed by an enterprise and then added to a digital assistant 106 using the DABP 102, e.g., through a user interface provided by the DABP 102 for registering the skill bot 116 with the digital assistant 106. In other instances, a skill bot 116 can be developed and created using the DABP 102 and then added to a digital assistant 106 using the DABP 102. In yet other instances, the DABP 102 provides an online digital store (referred to as a “skills store”) that offers multiple skills directed to a wide range of tasks. The skills offered through the skills store may also expose various cloud services. To add a skill to a digital assistant 106 using the DABP 102, a developer can access the skills store via the DABP 102, select a desired skill, and indicate that the selected skill is to be added to the digital assistant 106. A skill from the skills store can be added to a digital assistant as is or in a modified form. When the skill, in the form of a skill bot 116, is added to the digital assistant 106, the DABP 102 may configure the master bot 114 of the digital assistant 106 to communicate with the skill bot 116. Additionally, the DABP 102 may configure the master bot 114 with skill bot data enabling the master bot 114 to recognize utterances that the skill bot 116 is capable of handling and, thus, updating data used by the master bot 114 to determine whether an input utterance is unrelated to any of the skill bots 116 of the digital assistant 106. Activities to configure the master bot 114 in this manner are described in more detail below.

[0066] Skill bots 116, and thus skills, useable with a digital assistant 106 can vary widely. For example, for a digital assistant 106 developed for an enterprise, the master bot 114 of the digital assistant may interface with skill bots 116 with specific functionalities, such as a CRM bot for performing functions related to customer relationship management (CRM), an ERP bot for performing functions related to enterprise resource planning (ERP), an HCM bot for performing functions related to human capital management (HCM), or others. Various other skills could also be available to a digital assistant and may depend on the intended use of the digital assistant 106.

[0067] Various architectures may be used to implement a digital assistant 106. In certain embodiments, the digital assistant 106 may be implemented using a master-child paradigm or architecture. According to this paradigm, a digital assistant 106 acts as the master bot 114, either by including the master bot 114 or by accessing the master bot 114, and interacts with one or more child bots that are skill bots 116. The skill bots 116 may or may not run directly on the digital assistant 106. In the example depicted in FIG. 1, the digital assistant 106 includes (i.e., accesses and uses) a master bot 114 and three skill bots 116; however, other quantities of skill bots 116 can be used, and the quantity can vary over time as skills are added or removed from the digital assistant 106.

[0068] A digital assistant 106 implemented according to the master-child architecture enables users of the digital assistant 106 to interact with multiple skill bots 116, and thus to utilize multiple skills that may be implemented separately, through a unified user interface, namely via the master bot 114. In some embodiments, when a user engages with the digital assistant 106, the user input is received by the master bot 114. The master bot 114 then performs preliminary processing to determine the meaning of the input utterance acting as the user input, where the input utterance can be, for instance, the user input itself or a textual version of the user input. The master bot 114 determines whether the input utterance is unrelated to the skill bots 116 available to it, which may be the case, for instance, if the input utterance requires a skill outside those of the skill bots 116. If the input utterance is unrelated to the skill bots 116, then the master bot 114 may return to the user an indication that the input utterance is unrelated to the skill bots 116; for instance, the digital assistant 106 can ask the user for clarification or can report that the input utterance is not understood. However, if the master bot 114 identifies a suitable skill bot 116, the master bot 114 may route the input utterance, and thus the ongoing conversation, to that skill bot 116. This enables a user to converse with the digital assistant 106 having multiple skill bots 116 through a common interface.
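The control flow just described (either report that the utterance is unrelated or hand it to a suitable skill bot) can be sketched as below. The keyword-overlap scoring is only a stand-in so the example runs on its own; it is not the classifier described in this disclosure. The bot names, keyword sets, and reply text are hypothetical.

```python
# Stand-in "knowledge" of each skill bot: a few keywords per bot (hypothetical).
SKILL_KEYWORDS = {
    "pizza_bot": {"pizza", "order", "toppings"},
    "bank_bot": {"balance", "account", "transfer"},
}


def match_scores(utterance):
    """Count keyword overlaps per skill bot (toy replacement for a classifier)."""
    words = set(utterance.lower().split())
    return {bot: len(words & keywords) for bot, keywords in SKILL_KEYWORDS.items()}


def route(utterance):
    """Return the chosen skill bot, or a clarification request if none fits."""
    scores = match_scores(utterance)
    best_bot = max(scores, key=scores.get)
    if scores[best_bot] == 0:
        # Unrelated to every available skill bot: do not forward the utterance.
        return {"skill_bot": None,
                "reply": "Sorry, I did not understand that. Could you rephrase?"}
    return {"skill_bot": best_bot, "reply": None}


routed = route("I want to order a pizza")
rejected = route("What is the weather today")
```

The point of the sketch is the branch itself: an unrelated utterance produces a reply from the master bot and never reaches a skill bot.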

[0069] Although the embodiment in FIG. 1 shows digital assistant 106 including a master bot 114 and skill bots 116, this is not intended to be limiting. In some embodiments, the digital assistant 106 can include various other components, such as other systems or subsystems, that provide functionalities of die digital assistant 106. These systems and subs stems may be implemented only ia software (e.g., as co s store oa a computer- readahle medium and executable by one or mors processors), in hardware, or in an implementation feat uses a combination of software and hardware.

[0070] In certain embodiments, the master bot 114 is configured to be aware of the available skill bots 116. For instance, the master bot 114 may have access to metadata that identifies the various available skill bots 116, and for each skill bot 116, the capabilities of the skill bot 116 including the tasks that can be performed by the skill bot 116. Upon receiving a user request in the form of an input utterance, the master bot 114 is configured to, from the multiple available skill bots 116, identify or predict a specific skill bot 116 that can best serve or handle the user request or, in the alternative, determine that the input utterance is unrelated to any of the skill bots 116. If it is determined that a skill bot 116 can handle the input utterance, the master bot 114 may route the input utterance, or at least a portion of the input utterance, to that skill bot 116 for further handling. Control thus flows from the master bot 114 to the skill bots 116.

[0071] In some embodiments, the DABP 102 provides an infrastructure and various services and features that enable a developer user of DABP 102 to create a digital assistant 106 including one or more skill bots 116. In some instances, a skill bot 116 can be created by cloning an existing skill bot 116, for example, by cloning a skill bot 116 provided in a skills store. As previously indicated, the DABP 102 can provide a skills store that offers multiple skill bots 116 for performing various tasks. A user of the DABP 102 can clone a skill bot 116 from the skills store, and as needed, modifications or customizations may be made to the cloned skill bot 116. In some other instances, a developer user of the DABP 102 creates a skill bot 116 from scratch, such as by using tools and services offered by DABP 102.

[0072] In certain embodiments, at a high level, creating or customizing a skill bot 116 involves the following activities:

(1) Configuring settings for a new skill bot

(2) Configuring one or more intents for the skill bot

(3) Configuring one or more entities for one or more intents

(4) Training the skill bot

(5) Creating a dialog flow for the skill bot

(6) Adding custom components to the skill bot as needed

(7) Testing and deploying the skill bot

Each of the above activities is briefly described below.

[0073] (1) Configuring settings for a new skill bot 116: Various settings may be configured for the skill bot 116. For example, a skill bot developer can specify one or more invocation names for the skill bot 116 being created. These invocation names, which serve as identifiers for the skill bot 116, can then be used by users of a digital assistant 106 to explicitly invoke the skill bot 116. For example, a user can include an invocation name in the user’s input utterance to explicitly invoke the corresponding skill bot 116.

[0074] (2) Configuring one or more intents and associated example utterances for the skill bot 116: The skill bot 116 designer specifies one or more intents, also referred to as chatbot intents, for a skill bot 116 being created. The skill bot 116 is then trained based upon these specified intents. These intents represent categories or classes that the skill bot 116 is trained to infer for input utterances. Upon receiving an utterance, a trained skill bot 116 infers (i.e., determines) an intent for the utterance, where the inferred intent is selected from the predefined set of intents used to train the skill bot 116. The skill bot 116 then takes an appropriate action responsive to an utterance based upon the intent inferred for that utterance. In some instances, the intents for a skill bot 116 represent tasks that the skill bot 116 can perform for users of the digital assistant. Each intent is given an intent identifier or intent name. For example, for a skill bot 116 trained for a bank, the intents specified for the skill bot 116 may include “CheckBalance,” “TransferMoney,” “DepositCheck,” or the like.

[0075] For each intent defined for a skill bot 116, the skill bot 116 designer may also provide one or more example utterances that are representative of and illustrate the intent. These example utterances are meant to represent utterances that a user may input to the skill bot 116 for that intent. For example, for the CheckBalance intent, example utterances may include “What’s my savings account balance?”, “How much is in my checking account?”, “How much money do I have in my account,” and the like. Accordingly, various permutations of typical user utterances may be specified as example utterances for an intent.

[0076] The intents and their associated example utterances, also referred to as training utterances, are used as training data to train the skill bot 116. Various different training techniques may be used. As a result of this training, a predictive model is generated that is configured to take an utterance as input and output an intent inferred for the utterance. In some instances, input utterances provided as user input are input to an intent analysis engine (e.g., a rules-based or ML-based classifier executed by the skill bot 116), which is configured to use the trained model to predict or infer an intent for the input utterance. The skill bot 116 may then take one or more actions based on the inferred intent.
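
By way of non-limiting illustration, the relationship between intents, example utterances, and a trained predictive model might be sketched as follows. The token-overlap scoring, the function names (`tokenize`, `train`, `infer_intent`), and the two TransferMoney utterances are assumptions introduced only for this sketch; the training technique itself is left open above.

```python
from collections import Counter

# Intents mapped to example (training) utterances. The CheckBalance
# utterances come from the banking example; the TransferMoney utterances
# are hypothetical and exist only for this sketch.
TRAINING_DATA = {
    "CheckBalance": [
        "What's my savings account balance?",
        "How much is in my checking account?",
        "How much money do I have in my account",
    ],
    "TransferMoney": [
        "Transfer money to my savings account",
        "Send 50 dollars to John",
    ],
}


def tokenize(text):
    """Lowercase the text and split it on non-alphanumeric characters."""
    cleaned = "".join(c.lower() if c.isalnum() else " " for c in text)
    return cleaned.split()


def train(training_data):
    """Build one token-count profile per intent (a stand-in for a real model)."""
    return {
        intent: Counter(tok for u in utterances for tok in tokenize(u))
        for intent, utterances in training_data.items()
    }


def infer_intent(model, utterance):
    """Return the intent whose profile overlaps most with the utterance."""
    tokens = tokenize(utterance)
    scores = {
        intent: sum(profile[t] for t in tokens)
        for intent, profile in model.items()
    }
    return max(scores, key=scores.get)


model = train(TRAINING_DATA)
print(infer_intent(model, "what is my account balance"))
```

A production skill bot would likely replace the overlap count with a rules-based or ML-based classifier, but the input and output shapes (utterance in, one predefined intent out) would stay the same.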

[0077] (3) Configuring entities for one or more intents of the skill bot 116: In some instances, additional context may be needed to enable the skill bot 116 to properly respond to an input utterance. For example, there may be situations where multiple input utterances resolve to the same intent in a skill bot 116. For instance, the utterances “What’s my savings account balance?” and “How much is in my checking account?” both resolve to the same CheckBalance intent, but these utterances are different requests asking for different things. To clarify such requests, one or more entities can be added to an intent. Using an example of a banking skill bot 116, an entity called Account_Type, which defines values called “checking” and “saving,” may enable the skill bot 116 to parse the user request and respond appropriately. In the above example, while the utterances resolve to the same intent, the value associated with the Account_Type entity is different for the two utterances. This enables the skill bot 116 to perform possibly different actions for the two utterances in spite of them resolving to the same intent. One or more entities can be specified for certain intents configured for the skill bot 116. Entities are thus used to add context to the intent itself. Entities help describe an intent more fully and enable the skill bot 116 to complete a user request.

[0078] In certain embodiments, there are two types of entities: (a) built-in entities, which may be provided by the DABP 102, and (b) custom entities that can be specified by a developer. Built-in entities are generic entities that can be used with a wide variety of skill bots 116. Examples of built-in entities include entities related to time, date, addresses, numbers, email addresses, duration, recurring time periods, currencies, phone numbers, uniform resource locators (URLs), or the like. Custom entities are used for more customized applications. For example, for a banking skill, an Account_Type entity may be defined by the developer to support various banking transactions by checking the user input for keywords like checking, savings, or credit cards.
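
As a hedged sketch of how a keyword-based custom entity such as Account_Type might be resolved, consider the following. The dictionary layout, the function name `extract_entity`, and the use of plain substring matching are assumptions made for illustration and are not taken from the DABP 102.

```python
# Hypothetical definition of a custom entity: each canonical value is
# paired with the keywords that signal it in user input.
ACCOUNT_TYPE = {
    "checking": ["checking"],
    "saving": ["saving", "savings"],
    "credit card": ["credit card", "credit cards"],
}


def extract_entity(utterance, entity_values):
    """Return the first entity value whose keyword appears in the utterance.

    Returns None when no keyword matches, in which case a skill bot
    might prompt the user for the missing value.
    """
    text = utterance.lower()
    for value, keywords in entity_values.items():
        if any(keyword in text for keyword in keywords):
            return value
    return None


# Both utterances resolve to the same CheckBalance intent, but the
# entity value tells them apart.
print(extract_entity("What's my savings account balance?", ACCOUNT_TYPE))
print(extract_entity("How much is in my checking account?", ACCOUNT_TYPE))
```

Substring matching is deliberately naive; it would, for instance, match "checking" used as a verb. It is used here only to show the role an entity plays alongside an intent.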

[0079] (4) Training the skill bot 116: A skill bot 116 is configured to receive user input in the form of utterances, parse or otherwise process the received user input, and identify or select an intent that is relevant to the received user input. In some embodiments, as indicated above, the skill bot 116 has to be trained for this. In certain embodiments, a skill bot 116 is trained based on the intents configured for the skill bot 116 and the example utterances (i.e., training utterances) associated with the intents, so that the skill bot 116 can resolve an input utterance to one of its configured intents. In certain embodiments, the skill bot 116 uses a predictive model that is trained using the training data and allows the skill bot 116 to discern what users say or, in some cases, are trying to say. The DABP 102 may provide various different training techniques that can be used by a developer to train a skill bot 116, including various ML-based training techniques, rules-based training techniques, or combinations thereof. In certain embodiments, a portion (e.g., 80%) of the training data is used to train a skill bot model and another portion (e.g., the remaining 20%) is used to test or verify the model. Once trained, the trained model, also referred to as the trained skill bot 116, can then be used to handle and respond to input utterances. In certain cases, an input utterance may be a question that requires only a single answer and no further conversation. In order to handle such situations, a question-and-answer (Q&A) intent may be defined for a skill bot 116. In some embodiments, Q&A intents are created in a similar manner as other intents, but the dialog flow for Q&A intents can be different from that for regular intents. For example, unlike for other intents, the dialog flow for a Q&A intent may not involve prompts for soliciting additional information (e.g., the value for a particular entity) from the user.
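
The 80%/20% division of training data mentioned above might, for instance, be realized as in the sketch below. The function name `split_training_data`, the shuffling step, and the fixed seed are assumptions introduced for illustration only.

```python
import random


def split_training_data(examples, train_fraction=0.8, seed=0):
    """Shuffle labeled examples and split them into train and test portions.

    examples: a sequence of (utterance, intent) pairs.
    train_fraction: share of the data used for training (0.8 mirrors the
        80% example above; other splits are equally possible).
    seed: fixed seed so that the split is reproducible (an assumption).
    """
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


# Ten hypothetical labeled utterances.
examples = [(f"utterance {i}", "CheckBalance") for i in range(10)]
train_set, test_set = split_training_data(examples)
print(len(train_set), len(test_set))
```

The held-out portion is never shown to the model during training, so accuracy measured on it gives a rough indication of how the trained skill bot may behave on unseen input utterances.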

[0080] (5) Creating a dialog flow for the skill bot 116: A dialog flow specified for a skill bot 116 describes how the skill bot 116 reacts as different intents for the skill bot 116 are resolved responsive to received user input 110. The dialog flow defines operations or actions that a skill bot 116 will take, such as how the skill bot 116 responds to user utterances, how the skill bot 116 prompts users for input, and how the skill bot 116 returns data. A dialog flow is similar to a flowchart that is followed by the skill bot 116. The skill bot 116 designer specifies a dialog flow using a language, such as markdown language. In certain embodiments, a version of YAML called OBotML may be used to specify a dialog flow for a skill bot 116. The dialog flow definition for a skill bot 116 acts as a model for the conversation itself, one that lets the skill bot 116 designer choreograph the interactions between a skill bot 116 and the users that the skill bot 116 services.

[0081] In certain embodiments, the dialog flow definition for a skill bot 116 contains three sections, which are described below:

(a) a context section;

(b) a default transitions section; and

(c) a states section.

[0082] Context section: The developer of the skill bot 116 can define variables that are used in a conversation flow in the context section. Other variables that may be named in the context section include, for instance: variables for error handling, variables for built-in or custom entities, user variables that enable the skill bot 116 to recognize and persist user preferences, or the like.

[0083] Default transitions section: Transitions for a skill bot 116 can be defined in the dialog flow states section or in the default transitions section. The transitions defined in the default transitions section act as a fallback and get triggered when there are no applicable transitions defined within a state or when the conditions required to trigger a state transition cannot be met. The default transitions section can be used to define routing that allows the skill bot 116 to gracefully handle unexpected user actions.

[0084] States section: A dialog flow and its related operations are defined as a sequence of transitory states, which manage the logic within the dialog flow. Each state node within a dialog flow definition names a component that provides the functionality needed at that point in the dialog. States are thus built around the components. A state contains component-specific properties and defines the transitions to other states that get triggered after the component executes.

[0085] Special case scenarios may be handled using the states section. For example, there might be times when it is desirable to provide users the option to temporarily leave a first skill they are engaged with to do something in a second skill within the digital assistant 106. For example, if a user is engaged in a conversation with a shopping skill (e.g., the user has made some selections for purchase), the user may want to jump to a banking skill (e.g., the user may want to ensure that the user has enough money for the purchase) and then return to the shopping skill to complete the user’s order. To address this, the states section in the dialog flow definition of the first skill can be configured to initiate an interaction with the second, different skill in the same digital assistant and then return to the original dialog flow.
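
The three sections of a dialog flow definition, and the fallback role of default transitions, can be sketched as follows. The sketch uses a plain Python dictionary rather than OBotML, and every key, state name, and component name in it (for example, `askAccountType` and `prompt_account_type`) is hypothetical.

```python
# Hypothetical dialog flow definition with the three sections described
# above: context, default transitions, and states.
DIALOG_FLOW = {
    "context": {"variables": {"accountType": None}},
    "defaultTransitions": {"error": "handleError"},
    "states": {
        "askAccountType": {
            "component": "prompt_account_type",
            "transitions": {"success": "showBalance"},
        },
        "showBalance": {"component": "show_balance", "transitions": {}},
        "handleError": {"component": "apologize", "transitions": {}},
    },
}


def next_state(flow, current_state, outcome):
    """Pick the state to transition to after a component executes.

    A transition defined within the current state takes precedence. If
    the state defines no applicable transition, the default transitions
    section acts as the fallback. Returns None when neither applies.
    """
    transitions = flow["states"][current_state]["transitions"]
    if outcome in transitions:
        return transitions[outcome]
    return flow["defaultTransitions"].get(outcome)


print(next_state(DIALOG_FLOW, "askAccountType", "success"))
print(next_state(DIALOG_FLOW, "askAccountType", "error"))
```

Keeping error routing in the default transitions section means that individual states need not repeat it, which is one way a skill bot could gracefully handle unexpected user actions.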

[0086] (6) Adding custom components to the skill bot 116: As described above, states specified in a dialog flow for a skill bot 116 name components that provide the functionality needed corresponding to the states. Components enable a skill bot 116 to perform functions. In certain embodiments, the DABP 102 provides a set of preconfigured components for performing a wide range of functions. A developer can select one or more of these preconfigured components and associate them with states in the dialog flow for a skill bot 116. The developer can also create custom or new components using tools provided by the DABP 102 and can associate the custom components with one or more states in the dialog flow for a skill bot 116.

[0087] (7) Testing and deploying the skill bot 116: The DABP 102 may provide several features that enable the developer to test a skill bot 116 being developed. The skill bot 116 can then be deployed and included in a digital assistant 106.

[0088] While the above describes how to create a skill bot 116, similar techniques may also be used to create a digital assistant 106 or a master bot 114. At the master bot level, or the digital assistant level, built-in system intents may be configured for the digital assistant 106. In some embodiments, these built-in system intents are used to identify general tasks that the master bot 114 can handle without invoking a skill bot 116. Examples of system intents defined for a master bot 114 include: (1) Exit: applies when the user signals the desire to exit the current conversation or context in the digital assistant; (2) Help: applies when the user asks for help or orientation; and (3) UnresolvedIntent: applies to user input that does not match well with the Exit and Help intents. The master bot 114 may store information about the one or more skill bots 116 associated with the digital assistant 106. This information enables the master bot 114 to select a particular skill bot 116 for handling an utterance or, in the alternative, to determine that the utterance is unrelated to any skill bot 116 of the digital assistant 106.

[0089] When a user inputs a phrase or utterance to the digital assistant 106, the master bot 114 is configured to perform processing to determine how to route the utterance and the related conversation. The master bot 114 determines this using a routing model, which can be rules-based, ML-based, or a combination thereof. The master bot 114 uses the routing model to determine whether the conversation corresponding to the utterance is to be routed to a particular skill bot 116 for handling, is to be handled by the digital assistant 106 or master bot 114 itself per a built-in system intent, is to be handled as a different state in a current conversation flow, or is unrelated to any of the skill bots 116 associated with the digital assistant 106.

[0090] In certain embodiments, as part of this processing, the master bot 114 determines if the input utterance explicitly identifies a skill bot 116 using its invocation name. If an invocation name is present in the input utterance, then it is treated as an explicit invocation of the skill bot 116 corresponding to the invocation name. In such a scenario, the master bot 114 may route the input utterance to the explicitly invoked skill bot 116 for further handling. If there is no specific or explicit invocation, in certain embodiments, the master bot 114 evaluates the input utterance and computes confidence scores (e.g., using a logistic regression model) for the system intents and the skill bots 116 associated with the digital assistant 106. The score computed for a skill bot 116 or system intent represents how likely the input utterance is representative of a task that the skill bot 116 is configured to perform or is representative of a system intent. Any system intent or skill bot 116 with an associated computed confidence score exceeding a threshold value may be selected as a candidate for further evaluation. The master bot 114 then selects, from the identified candidates, a particular system intent or a skill bot 116 for further handling of the input utterance. In certain embodiments, after one or more skill bots 116 are identified as candidates, the intents associated with those candidate skill bots 116 are evaluated, such as by using the trained model for each skill bot 116, and confidence scores are determined for each intent. In general, any intent that has a confidence score exceeding a threshold value (e.g., 70%) is treated as a candidate intent. If a particular skill bot 116 is selected, then the input utterance is routed to that skill bot 116 for further processing. If a system intent is selected, then one or more actions are performed by the master bot 114 itself according to the selected system intent.
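
A minimal sketch of this routing logic, under stated assumptions, follows. The function name `route`, the substring test for invocation names, the choice of the highest-scoring candidate, and the reuse of 0.7 as the candidate threshold are all assumptions; the confidence scores are taken as given rather than computed.

```python
def route(utterance, invocation_names, confidence_scores, threshold=0.7):
    """Decide where an input utterance should be routed.

    invocation_names: {skill bot name: invocation name}.
    confidence_scores: {skill bot or system intent name: score in [0, 1]},
        assumed to be computed elsewhere (e.g., by a logistic regression
        model).
    threshold: minimum score for a candidate (0.7 is borrowed from the
        70% example given for intents; it is an assumption here).
    Returns the selected name, or None if nothing qualifies.
    """
    text = utterance.lower()

    # Step 1: an invocation name in the utterance is an explicit invocation.
    for bot, name in invocation_names.items():
        if name.lower() in text:
            return bot

    # Step 2: keep only candidates whose score exceeds the threshold.
    candidates = {
        name: score
        for name, score in confidence_scores.items()
        if score > threshold
    }
    if not candidates:
        return None  # treated in this sketch as an unrelated utterance

    # Step 3: select one candidate; picking the highest score is an assumption.
    return max(candidates, key=candidates.get)


names = {"PizzaSkill": "pizza bot"}  # hypothetical invocation name
print(route("ask pizza bot for a large pizza", names, {}))
print(route("what's my balance", names, {"BankSkill": 0.9, "Help": 0.75}))
```

In a fuller implementation, the candidates surviving the second step would be evaluated further at the intent level, as described above, before a final selection is made.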

[0091] Some embodiments of a master bot 114 described herein are configured not only to direct an input utterance to a suitable skill bot 116 if applicable, but also to determine when an input utterance is unrelated to the available skill bots 116, and unrelated input utterances prompt some indication of unrelatedness. FIG. 2 is a flow diagram of a method 200 of configuring and using a master bot 114 to direct input utterances to skill bots 116 and, when applicable, to determine that certain input utterances are unrelated to available skill bots 116, according to some embodiments described herein. The method 200 of FIG. 2 is a general overview of which various specific instances will be described in detail below.

[0092] The method 200 depicted in FIG. 2, as well as other methods described herein, may be implemented in software (e.g., as code, instructions, or programs) executed by one or more processing units (e.g., processors or processor cores), in hardware, or in combinations thereof. The software may be stored on a non-transitory storage medium, such as on a memory device. This method 200 is intended to be illustrative and non-limiting. Although FIG. 2 depicts various activities occurring in a particular sequence or order, this is not intended to be limiting. In certain embodiments, for instance, the activities may be performed in a different order or one or more activities of the method 200 may be performed in parallel. In certain embodiments, the method 200 may be performed by a training system and by a master bot 114.

[0093] As shown in FIG. 2, at block 205 of the method 200, a classifier model of a master bot 114 is initialized. In some embodiments, the classifier model is configured to determine whether an input utterance is unrelated to any available skill bots 116. Additionally, in some embodiments, the classifier model may also be configured to route the input utterance to a suitable skill bot 116 for processing if the input utterance is deemed relevant to a skill bot 116.

[0094] In some embodiments, initialization of the classifier model may include teaching the classifier model how to recognize unrelated input utterances. To this end, a training system, which may be part of the DABP 102, may have access to training utterances (i.e., example utterances) for each skill bot 116. The training system may generate a respective training feature vector describing and representing each training utterance of the various skill bots 116 that are or will be available to the master bot 114. Each training feature vector is a feature vector of the corresponding training utterance. The training system may then generate various set representations from the training feature vectors. Each set representation may be a representation of a collection of training feature vectors; for instance, such a collection may be a cluster or a composite feature vector as described in detail below. To initialize the classifier model, the training system may configure the classifier model to compare input vectors to the various set representations that represent collections of training feature vectors.

[0095] At block 210, the master bot 114 receives user input 110 in the form of an input utterance. For instance, a user may have typed the input utterance as a user input 110, or the user may have spoken a user input 110, which the digital assistant 106 converted to an input utterance. The input utterance may include a user request to be handled by the digital assistant 106 and, thus, by the master bot 114.

[0096] At block 215, the master bot 114 utilizes the classifier model to determine whether the input utterance can be handled by any skill bot 116 associated with the master bot 114. In some embodiments, the classifier model of the master bot 114 generates an input feature vector describing and representing the input utterance. The classifier model may compare the input feature vector to the set representations of the training feature vectors, and based on this comparison, the classifier model determines whether the input utterance is unrelated to any skill bots 116 available to the master bot 114 (i.e., skill bots 116 whose training utterances are represented in the set representations).
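
Blocks 205 and 215 of the method 200 might be sketched as follows, under several assumptions: each set representation is taken to be a composite feature vector computed as an element-wise mean, one set representation is built per skill bot, the comparison is a Euclidean distance against a fixed cutoff, and the two-dimensional feature vectors are invented for illustration. None of these choices is mandated by the description above.

```python
import math


def composite(vectors):
    """Element-wise mean of a collection of training feature vectors."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def distance(a, b):
    """Euclidean distance between two feature vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def configure_classifier(training_vectors_by_bot):
    """Block 205 (sketch): build one set representation per skill bot."""
    return {
        bot: composite(vectors)
        for bot, vectors in training_vectors_by_bot.items()
    }


def is_unrelated(input_vector, set_representations, max_distance=1.0):
    """Block 215 (sketch): the input utterance is deemed unrelated when its
    feature vector is farther than max_distance from every set
    representation. The cutoff value is hypothetical."""
    return all(
        distance(input_vector, representation) > max_distance
        for representation in set_representations.values()
    )


# Hypothetical 2-D training feature vectors, for illustration only.
TRAINING_VECTORS = {
    "BankSkill": [[0.0, 0.0], [0.2, 0.0], [0.0, 0.2]],
    "PizzaSkill": [[5.0, 5.0], [5.2, 5.0]],
}
set_representations = configure_classifier(TRAINING_VECTORS)
print(is_unrelated([0.1, 0.1], set_representations))
print(is_unrelated([10.0, -10.0], set_representations))
```

An input vector close to the BankSkill representation is treated as related, while one far from both representations is treated as unrelated and could prompt the digital assistant 106 to ask for clarification.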

Examples of Classification at the Assistant or Skill Bot Level

[0097] FIG. 3 is a block diagram of a master bot 114, also referred to as a master bot (MB) system, according to certain embodiments described herein. The master bot 114 can be implemented in software only, hardware only, or a combination of hardware and software. In some embodiments, the master bot 114 includes a preprocessing subsystem 310 and a routing subsystem 320. The master bot 114 depicted in FIG. 3 is merely an example of an arrangement of components in a master bot 114. One of ordinary skill in the art would recognize many possible variations, alternatives, and modifications. For example, in some implementations, the master bot 114 may have more or fewer systems or components than those shown in FIG. 3, may combine two or more subsystems, or may have a different configuration or arrangement of subsystems.

[0098] In some embodiments, the language processing subsystem 310 is configured to process a user input 110 provided by a user. Such processing can include, for instance, using automated speech recognition or some other tool to translate the user input 110 to a textual input utterance 303 if the user input is in audio form or in some other form that is not text.

[0099] In some embodiments, the routing subsystem 320 is configured to determine (a) whether the input utterance 303 is unrelated to any available skill bot 116 and, (b) if the input utterance 303 is related to at least one skill bot 116, which skill bot 116 is most suitable to handle the input utterance 303. In particular, the classifier model 324 of the routing subsystem 320 may be rule-based or ML-based, or a combination thereof, and may be configured to determine whether the input utterance 303 is unrelated to any available skill bots 116, representative of a particular skill bot 116, or representative of a particular intent that has been configured for a particular skill bot 116. For example, as discussed earlier, a skill bot 116 may be configured with one or more chatbot intents. Each chatbot intent can have its own dialog flow and be associated with one or more tasks that the skill bot 116 can perform. Upon determining that the input utterance 303 is representative of a particular skill bot 116 or intent with which the particular skill bot 116 has been configured, the routing subsystem 320 can invoke the particular skill bot 116 and communicate the input utterance 303 as input 335 for the particular skill bot 116. However, if an input utterance 303 is deemed unrelated to any available skill bot 116, then the input utterance 303 is deemed to belong to a none class 316, which is a class of utterances that cannot be handled by the available skill bots. In that case, the digital assistant 106 may indicate to the user that the input utterance 303 cannot be handled.

[0100] In some embodiments, the classifier model 324 can be implemented using a rule-based or ML-based model, or both. For instance, in some embodiments, the classifier model 324 may include a rules-based model trained on training data 354 that includes example utterances, such that the rules-based model is used to determine whether an input utterance is unrelated to any available skill bot 116 (i.e., any skill bot 116 associated with the digital assistant 106). Additionally or alternatively, in some embodiments, the classifier model 324 may include a neural network trained on the training data 354. The training data 354 may include, for each skill bot 116, a corresponding set of example utterances (e.g., two or more example utterances for each intent with which the skill bot 116 is configured). For instance, in the example of FIG. 3, the available skill bots 116 include a first skill bot 116a, a second skill bot 116b, and a third skill bot 116c, and the training data 354 includes respective skill bot data for each such skill bot 116, including first skill bot data 358a corresponding to the first skill bot 116a, second skill bot data 358b corresponding to the second skill bot 116b, and third skill bot data 358c corresponding to the third skill bot 116c. The skill bot data for a skill bot 116 may include training utterances (i.e., example utterances) that are representative of utterances that can be handled by that skill bot 116 in particular; thus, the training data 354 includes training utterances for each of the various skill bots 116. A training system 350, which may but need not be incorporated into the DABP 102, may use the training data 354 to train the classifier model 324 of the master bot 114 to perform these tasks.

[0101] In some embodiments, the training system 350 uses the training data 354 to train the classifier model 324 to determine whether an input utterance is unrelated to any available skill bot 116. Generally, the training system 350 may generate training feature vectors to describe and represent the training utterances in the training data 354. As described in detail below, the training system 350 may generate set representations, with each set representation being a representation of a collection of the training feature vectors. After training and during operation, the classifier model 324 may compare an input utterance 303 to the set representations to determine whether the input utterance 303 is unrelated to any available skill bot 116.

[0102] In some embodiments, the classifier model 324 includes one or more sub-models, each sub-model configured to perform various tasks related to classifying the input utterance 303. Additionally or alternatively to the above, for instance, the classifier model 324 may include an ML model or other type of model configured to determine which skill bot 116 is most suitable for the input utterance in a case where the input utterance 303 is deemed related (i.e., not unrelated) to at least one available skill bot 116. In some embodiments, the classifier model 324 may determine the most suitable skill bot 116 while determining that the input utterance is related to at least one skill bot 116, and in that case, no further determination need be made to identify the most suitable skill bot 116. Alternatively, however, the classifier model 324 may determine (e.g., using a logistic regression model), for each skill bot 116, an associated confidence score indicating the likelihood that the skill bot 116 is most suitable for processing (i.e., is a best match for) the input utterance 303. To this end, a neural network included in the classifier model 324 may be trained by the training system to determine the likelihood that an input utterance 303 is representative of each skill bot 116 or of one of the intents that have been configured for the skill bots 116. For instance, the neural network may determine and output a respective confidence score associated with each skill bot 116, where a confidence score for a skill bot indicates the likelihood that the skill bot 116 can handle the input utterance 303 or is the best available skill bot 116 for handling the input utterance. Given the confidence scores, the routing subsystem 320 may then select the skill bot 116 associated with the highest confidence score and may route the input utterance 303 to that skill bot 116 for processing.
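
One way such per-skill-bot confidence scores could be produced and used is sketched below with a logistic function applied to a weighted sum of input features. The weights, biases, and two-element feature vector are hypothetical values chosen for illustration; in practice they would be learned from the training data 354.

```python
import math


def sigmoid(z):
    """Logistic function mapping any real number into the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def confidence_scores(features, weights_by_bot, bias_by_bot):
    """Compute one logistic-regression-style confidence score per skill bot."""
    return {
        bot: sigmoid(
            sum(w * x for w, x in zip(weights, features)) + bias_by_bot[bot]
        )
        for bot, weights in weights_by_bot.items()
    }


def most_suitable(scores):
    """Select the skill bot associated with the highest confidence score."""
    return max(scores, key=scores.get)


# Hypothetical parameters; a real model would learn these during training.
WEIGHTS = {"BankSkill": [2.0, -1.0], "PizzaSkill": [-1.0, 2.0]}
BIAS = {"BankSkill": 0.0, "PizzaSkill": 0.0}

scores = confidence_scores([1.0, 0.0], WEIGHTS, BIAS)
print(most_suitable(scores))
```

Because each score is computed independently, the scores need not sum to one; the routing subsystem 320 only needs their relative order to select a skill bot 116.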

[0103] FIG. 4 is a diagram of a skill bot 116, also referred to as a skill bot system, according to certain embodiments described herein. An instance of the skill bot 116 shown in FIG. 4 can be used as a skill bot 116 in FIG. 1 and can be implemented in software only, hardware only, or a combination of hardware and software. As shown in FIG. 4, the skill bot 116 can include a bot classifier model 424, which determines an intent of the input utterance 303, and a conversation manager 430, which generates a response 435 based on the intent.

[0104] The bot classifier model 424 can be implemented using a rules-based or ML-based model, or both, and can take as input the input utterance 303 routed to the skill bot 116 by the master bot 114. The bot classifier model 424 may access rules 452 and intents data 454 in a data store 450 on the skill bot 116 or otherwise accessible to the skill bot 116. For instance, the intents data 454 may include example utterances or other data for each intent, and the rules 452 may describe how to use the intents data to determine an intent for the input utterance 303. The bot classifier model 424 may apply those rules to the input utterance 303 and the intents data 454 to determine an intent of the input utterance 303.

[0105] More specifically, in some embodiments, the bot classifier model 424 may operate in a similar manner to that by which the classifier model 324 in FIG. 3 determines a skill bot 116 to handle an input utterance 303. For instance, the bot classifier model 424 may determine (e.g., using a logistic regression model) and may assign a respective confidence level to each chatbot intent for which the skill bot 116 is configured, similar to the manner in which the classifier model 324 may assign confidence levels to skill bots 116, such that a confidence level indicates the likelihood that a respective chatbot intent is the most likely to be applicable to the input utterance 303. The bot classifier model 424 may then select the intent to which the highest confidence level is assigned. In additional or alternative embodiments, the bot classifier model 424 determines an intent for the input utterance 303 by comparing an input feature vector describing the input utterance 303 to either or both of (a) composite feature vectors, including one composite feature vector to represent the training feature vectors for a respective intent, or (b) clusters of training feature vectors, with each cluster representing an intent or multiple intents. The use of feature vectors in this manner will be described in more detail below.

[0106] Upon identifying an intent that the utterance 202 best represents, the bot classifier model 424 may communicate an intent indication 422 (i.e., an indication of the identified intent) to the conversation manager 430. In the embodiment of FIG. 4, the conversation manager 430 is shown as being local to the skill bot 116. However, the conversation manager 430 can be shared with the master bot 114 or across multiple skill bots 116. Accordingly, in some embodiments, the conversation manager 430 is local to a digital assistant or master bot 114.

[0107] In response to receiving the intent indication 422, the conversation manager 430 may determine an appropriate response 435 to the utterance 202. For example, the response 435 could be an action or message specified in a dialog flow definition 455 configured for the skill bot 116 system 400, and the response 435 may be used as a DA response 112 in the embodiment of FIG. 1. For instance, the data store 450 may include various dialog flow definitions, including a respective dialog flow definition for each intent, and the conversation manager 430 may access the dialog flow definition 455 based on identification of the intent corresponding to that dialog flow definition 455. Based on the dialog flow definition 455, the conversation manager 430 may determine a dialog flow state as being the next state to transition to according to the dialog flow definition 455. The conversation manager 430 may determine the response 435 based on further processing of the input utterance 303. For example, if the input utterance 303 is “Check balance in savings,” then the conversation manager 430 may transition to a dialog flow state in which a dialog relating to the user’s savings account is presented to the user. The conversation manager 430 may transition to this state based on the intent indication 422 indicating that the identified intent is a “CheckBalance” intent configured for the skill bot 116 and further based on recognizing that a value of “saving” has been extracted for an “Account_Type” entity.

Using Feature Vectors to Describe Utterances

[0108] As mentioned above, the classifier model 324 may use feature vectors when determining whether an input utterance is unrelated, or related, to any available skill bots 116. For the purposes of this disclosure, a feature vector is a vector, or set of coordinates, that describes the features of an utterance and, thus, describes the utterance itself. A feature vector describing an utterance may be used to represent the utterance in certain circumstances, as described herein.

[0109] The concept of a feature vector is based on the concept of word embeddings. Generally, word embedding is a type of language modeling wherein words are mapped to corresponding vectors. A particular word embedding may map words that are semantically similar to similar regions of a vector space, such that similar words are close together in the vector space and dissimilar words are far apart. A simple example of a word embedding uses a “one hot” encoding, in which each word in a dictionary is mapped to a vector with a quantity of dimensions equal to the size of the dictionary, such that the vector has a value of 1 in a dimension corresponding to the word itself and a value of zero in all other dimensions. For example, the first two words of the sentence “Go intelligent bot service artificial intelligence, Oracle” could be represented using the following “one hot” encoding:

Feature vectors can be used to represent words, sentences, or various types of phrases. Given the above simple example of a word embedding, a corresponding feature vector might map a series of words, such as an utterance, to a feature vector that is an aggregate of the word embeddings of the words in the series. That aggregate may be, for instance, a sum, an average, or a weighted average.
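As a minimal illustrative sketch of the aggregation just described, the following Python code builds a “one hot” encoding over a small dictionary and aggregates the word vectors of an utterance by sum or by average. The function names, the whitespace tokenization, and the lowercase normalization are assumptions made for illustration only and are not part of this disclosure.

```python
from typing import Dict, List


def build_vocabulary(sentences: List[str]) -> Dict[str, int]:
    """Assign each distinct lowercase word its own dimension index."""
    vocab: Dict[str, int] = {}
    for sentence in sentences:
        for word in sentence.lower().split():
            vocab.setdefault(word, len(vocab))
    return vocab


def one_hot(word: str, vocab: Dict[str, int]) -> List[float]:
    """Return a vector with 1 in the word's dimension and 0 elsewhere.

    Words missing from the dictionary map to the all-zero vector
    (an assumption of this sketch).
    """
    vec = [0.0] * len(vocab)
    index = vocab.get(word.lower())
    if index is not None:
        vec[index] = 1.0
    return vec


def feature_vector(utterance: str, vocab: Dict[str, int],
                   aggregate: str = "average") -> List[float]:
    """Aggregate the one-hot vectors of the words in an utterance."""
    words = utterance.lower().split()
    total = [0.0] * len(vocab)
    for word in words:
        for i, value in enumerate(one_hot(word, vocab)):
            total[i] += value
    if aggregate == "sum" or not words:
        return total
    return [value / len(words) for value in total]


vocab = build_vocabulary(["check balance in savings"])
summed = feature_vector("check savings balance", vocab, aggregate="sum")
averaged = feature_vector("check balance", vocab)
```

A weighted average could be obtained in the same way by multiplying each word vector by a per-word weight before summing.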

[0110] For utterances that are different from each other, in that such utterances include words that have very different semantic meanings, the respective feature vectors may differ as well. However, utterances that are semantically similar, and thus include words that are the same across the utterances or are semantically similar across the utterances, may have similar feature vectors (i.e., feature vectors positioned close to each other in the vector space). Each feature vector corresponds to a single point within a vector space, also referred to as a feature space, where that point is the origin of the feature space plus the feature vector. In some embodiments, points for utterances that are similar to each other semantically are located close to each other. Throughout this disclosure, a feature vector and its corresponding point will be referred to interchangeably because the two provide different visuals for the same information.
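To make the notion of closeness concrete, the sketch below treats feature vectors as points and compares them by Euclidean distance. The choice of Euclidean distance, the function name, and the three hand-written vectors (summed one-hot vectors over the hypothetical dictionary check, balance, show, order, pizza) are assumptions for illustration; this disclosure does not prescribe a particular distance measure. Note that a pure one-hot encoding only captures shared words, so it is a simplification of true semantic similarity.

```python
import math
from typing import Dict, Sequence


def nearest_utterance(query: Sequence[float],
                      candidates: Dict[str, Sequence[float]]) -> str:
    """Return the label of the candidate point closest to the query point."""
    return min(candidates, key=lambda label: math.dist(query, candidates[label]))


# Dimensions: check, balance, show, order, pizza (hypothetical dictionary).
check_balance = (1.0, 1.0, 0.0, 0.0, 0.0)
show_balance = (0.0, 1.0, 1.0, 0.0, 0.0)
order_pizza = (0.0, 0.0, 0.0, 1.0, 1.0)

closest = nearest_utterance(
    check_balance,
    {"show balance": show_balance, "order pizza": order_pizza},
)
```

Here the two balance-related utterances share a word and therefore lie closer together than either does to the unrelated utterance.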

Example Method to Initialize a Classifier Model of a Master Bot

[0111] In some embodiments, the classifier model 324 of the master bot 114 utilizes feature vectors of training utterances as a basis for determining whether an input utterance is unrelated, or related, to any available skill bots 116. FIG. 5 is a diagram of a method 500 of initializing a classifier model 324 of a master bot 114 to perform this task, according to some embodiments described herein. For instance, this method 500 or similar may be performed at block 205 of the method 200 for configuring and using a master bot 114 to route input utterances 303. FIG. 5 is a general method 500, for which instances that are more detailed are shown and described with reference to FIG. 14 and FIG. 18.

[0112] The method 500 depicted in FIG. 5, as well as other methods described herein, may be implemented in software (e.g., as code, instructions, or programs) executed by one or more processing units (e.g., processors or processor cores), in hardware, or in combinations thereof. The software may be stored on a non-transitory storage medium, such as on a memory device. This method 500 is intended to be illustrative and non-limiting. Although FIG. 5 depicts various activities occurring in a particular sequence or order, this is not intended to be limiting. In certain embodiments, for instance, the activities may be performed in a different order, or one or more activities of the method 500 may be performed in parallel. In certain embodiments, the method 500 may be performed by a training system 350, which may be part of a DABP 102.

[0113] At block 505, the training system 350 accesses training utterances, also referred to as example utterances, for the various skill bots 116 associated with the master bot 114. For instance, the training utterances may be stored as skill bot data in the training data 354 accessible to the training system 350. In some embodiments, each skill bot 116 available to the master bot 114 may be associated with a subset of training utterances, and each subset of such training utterances may include training utterances for each intent for which the skill bot 116 is configured. Thus, the full set of training utterances may include training utterances for each intent of each skill bot 116, such that every intent of every skill bot 116 is represented.

[0114] At block 510, the training system 350 generates training feature vectors from the training utterances accessed at block 505. As described above, the training utterances may include a subset associated with each skill bot 116 and, further, may include training utterances for each intent of each skill bot 116. Thus, in some embodiments, the training feature vectors may include a respective subset for each skill bot 116 and, further, may include respective feature vectors for each intent of each skill bot.

[0115] FIG. 6 illustrates the generation of training feature vectors 620 from training utterances 615, according to some embodiments described herein. Specifically, FIG. 6 relates to the training utterances 615 of skill bot data 358 in the training data 354, where the skill bot data 358 is associated with a particular skill bot 116. That skill bot 116 is configured to handle input utterances 303 of multiple intents, which include Intent A and Intent B, and thus, the training utterances 615 include training utterances 615 representative of Intent A as well as training utterances 615 representative of Intent B.

[0116] As described above, the training system 350 may generate a training feature vector 620 to describe and represent each respective training utterance 615. Various techniques are known for converting a sequence of words, such as a training utterance 615, into a feature vector, such as a training feature vector 620, and one or more of such techniques may be used. For instance, the training system 350 may, but need not, use a one-hot encoding or some other encoding to encode each training utterance 615 as a corresponding training feature vector 620.

[0117] As also described above, a feature vector can be represented as a point. In some embodiments, each training feature vector 620 can be represented as a point 640 in feature space 630, where the feature space has a number of dimensions equal to the number of features (i.e., the number of dimensions) in the training feature vectors 620. In the example of FIG. 6, the two training feature vectors 620 representative of the same intent, specifically Intent A, are plotted as points 640 that are close together, due to these two training feature vectors 620 being semantically similar. However, it is not required that all training feature vectors 620, or all feature vectors, for a particular intent be represented as points that are close to one another.

[0118] Returning to FIG. 5, at block 515, the training system 350 generates multiple set representations of the training feature vectors 620 generated at block 510, where each set representation represents a set of the training feature vectors 620. As described in detail below, a set representation can be, for example, a cluster of training feature vectors 620 or a composite feature vector that is an aggregate of multiple training feature vectors 620. Essentially, a set representation may be a manner of representing multiple training feature vectors 620 that have been grouped together. Each set of training feature vectors represented in a corresponding set representation may share a common intent, a common group of intents, a common skill bot 116, or a common region of the feature space 630, or alternatively, the training feature vectors represented in a single set representation need not have any commonality other than being based on the training data 354.

[0119] At block 520, the training system 350 may configure the classifier model 324 to compare input utterances 303, provided as user inputs 110, to the various set representations. For instance, the set representations may be stored on a storage device that is accessible to the classifier model 324 of the master bot 114, and the classifier model 324 may be programmed with rules for how to determine whether an input utterance matches, or fails to match, the various set representations. The definition of matching may be dependent on the specific set representations being used, as will be described below in more detail.
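One of the set representations named above, the composite feature vector, can be sketched as follows. This is a minimal illustration assuming that training feature vectors are grouped by a label (for example, an intent name or a skill bot name) and that the aggregate is a plain average; the function name and the sample labels are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Hashable, Iterable, List, Sequence, Tuple


def composite_vectors(
    labeled_vectors: Iterable[Tuple[Hashable, Sequence[float]]],
) -> Dict[Hashable, List[float]]:
    """Average the training feature vectors that share a grouping label."""
    groups: Dict[Hashable, List[Sequence[float]]] = defaultdict(list)
    for label, vector in labeled_vectors:
        groups[label].append(vector)

    composites: Dict[Hashable, List[float]] = {}
    for label, vectors in groups.items():
        dims = len(vectors[0])
        composites[label] = [
            sum(v[d] for v in vectors) / len(vectors) for d in range(dims)
        ]
    return composites


training = [
    ("balance", [1.0, 0.0]),
    ("balance", [3.0, 2.0]),
    ("transactions", [0.0, 4.0]),
]
set_representations = composite_vectors(training)
```

The resulting dictionary is the kind of data that could be stored where the classifier model can read it at block 520.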

[0120] FIG. 7 is a diagram of a method 700 of using a classifier model 324 of a master bot 114 to determine whether an input utterance 303, provided as user input 110, is unrelated to any available skill bot 116 associated with the master bot 114, according to some embodiments described herein. This method 700 or similar may be performed after initialization of the classifier model 324 and, further, for each input utterance 303 received. For instance, this method 700 or similar may be performed at block 215 of the method 200 for configuring and using a master bot 114 to route input utterances 303. FIG. 7 is a general method 700, for which instances that are more detailed are shown and described with reference to FIG. 16 and FIG. 21.

[0121] The method 700 depicted in FIG. 7, as well as other methods described herein, may be implemented in software (e.g., as code, instructions, or programs) executed by one or more processing units (e.g., processors or processor cores), in hardware, or in combinations thereof. The software may be stored on a non-transitory storage medium, such as on a memory device. This method 700 is intended to be illustrative and non-limiting. Although FIG. 7 depicts various activities occurring in a particular sequence or order, this is not intended to be limiting. In certain embodiments, for instance, the activities may be performed in a different order, or one or more activities of the method 700 may be performed in parallel. In certain embodiments, the method 700 may be performed by a master bot 114.

[0122] At block 705 of the method 700, the master bot 114 accesses an input utterance 303 that has been provided as user input 110. For instance, in some embodiments, a user may provide user input 110 in the form of speech input, and the digital assistant 106 may convert that user input 110 into a textual input utterance 303 for use by the master bot 114.

[0123] At block 710, the master bot 114 may generate an input feature vector from the input utterance 303 accessed at block 705. The input feature vector may describe and represent the input utterance 303. Various techniques are known for converting a sequence of words, such as an input utterance, into a feature vector, and one or more of such techniques may be used. For instance, the training system 350 may, but need not, use a one-hot encoding or some other encoding to encode the input utterance as a corresponding input feature vector. However, an embodiment of the master bot 114 uses the same technique as was used to generate training feature vectors 620 from training utterances when training the classifier model 324.

[0124] At decision block 715, the master bot 114 may cause the classifier model 324 to compare the input feature vector generated at block 710 to the set representations determined at block 515 of the method 500 for initializing the classifier model 324, and as such, the master bot 114 may determine whether the input feature vector matches any skill bots 116 available to the master bot 114. The specific technique for comparing and matching may depend on the nature of the set representations. For instance, as will be described below, if the set representations are clusters of training feature vectors 620, the classifier model 324 may compare the input feature vector (i.e., the point representing the input feature vector) to the clusters to determine whether the input feature vector falls inside any of the clusters and thus matches at least one skill bot 116; or if the set representations are composite vectors of training feature vectors, the classifier model 324 may compare the input feature vector to the composite feature vectors to determine whether the input feature vector is sufficiently similar to any such composite feature vector and thus matches at least one skill bot 116. Various implementations are possible and are within the scope of this disclosure.

[0125] If the input feature vector is deemed to match at least one skill bot 116 at decision block 715, then at block 720, the master bot 114 may route the input utterance to a skill bot 116 that the input feature vector is deemed to match. However, if the input feature vector is deemed not to match any skill bot 116 at decision block 715, then at block 725, the master bot 114 may indicate that the utterance cannot be processed by any skill bot 116. This indication may be passed to the digital assistant, which may provide an output to the user that indicates the user input 110 cannot be processed, or handled, by the digital assistant.
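The decision at block 715 and the two outcomes at blocks 720 and 725 can be sketched for the composite-vector case as follows. This is a hedged illustration only: cosine similarity, the threshold value of 0.7, the function names, and the skill bot names are all assumptions, since the disclosure leaves the similarity measure and the definition of "sufficiently similar" open.

```python
import math
from typing import Dict, Optional, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def match_skill_bot(input_vector: Sequence[float],
                    composites: Dict[str, Sequence[float]],
                    threshold: float = 0.7) -> Optional[str]:
    """Return the best-matching skill bot, or None if nothing is similar enough.

    A None result corresponds to indicating that no skill bot can
    process the utterance.
    """
    best_bot: Optional[str] = None
    best_score = threshold
    for bot, composite in composites.items():
        score = cosine_similarity(input_vector, composite)
        if score >= best_score:
            best_bot, best_score = bot, score
    return best_bot


composites = {"finance": [1.0, 0.0, 0.0], "pizza": [0.0, 1.0, 0.0]}
routed = match_skill_bot([0.9, 0.1, 0.0], composites)
unrelated = match_skill_bot([0.0, 0.0, 1.0], composites)
```

When the set representations are clusters instead, the same structure applies with the similarity test replaced by a test of whether the point falls inside a cluster boundary.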

[0126] FIG. 8 is a diagram of another example of a method 800 of using a classifier model 324 of a master bot 114 to determine whether an input utterance 303, provided as user input 110, is unrelated to any available skill bot 116 associated with the master bot 114, according to some embodiments described herein. This method 800 or similar may be performed after initialization of the classifier model 324 and, further, for each input utterance 303 received. For instance, this method 800 or similar may be performed at block 215 of the method 200 for configuring and using a master bot 114 to route input utterances 303. Like the method 700 of FIG. 7, the method 800 of FIG. 8 is a general method 800, of which more detailed instances of certain method blocks are shown and described with reference to FIG. 16 and FIG. 21. In contrast to the method 700 of FIG. 7, however, this method 800 illustrates a preliminary filtering activity, at decision block 810 and block 815, which can be used to ensure that certain input utterances 303 similar to the training utterances 615 are not classified as belonging to the none class 316. In other words, this method 800 includes a filter that filters from none class consideration any input utterances 303 deemed similar enough to the training utterances 615, thus ensuring that such input utterances 303 are routed to skill bots 116.

[0127] The method 800 depicted in FIG. 8, as well as other methods described herein, may be implemented in software (e.g., as code, instructions, or programs) executed by one or more processing units (e.g., processors or processor cores), in hardware, or in combinations thereof. The software may be stored on a non-transitory storage medium, such as on a memory device. This method 800 is intended to be illustrative and non-limiting. Although FIG. 8 depicts various activities occurring in a particular sequence or order, this is not intended to be limiting. In certain embodiments, for instance, the activities may be performed in a different order, or one or more activities of the method 800 may be performed in parallel. In certain embodiments, the method 800 may be performed by a master bot 114.

[0128] At block 805 of the method 800, the master bot 114 accesses an input utterance 303 that has been provided as user input 110. For instance, in some embodiments, a user may provide user input 110 in the form of speech input, and the digital assistant 106 may convert that user input 110 into a textual input utterance 303 for use by the master bot 114.

[0129] At decision block 810, the master bot 114 may cause the classifier model 324 to determine whether all words, or a predetermined percentage of words, in the input utterance 303 accessed at block 805 are found in the training utterances 615. For instance, in some embodiments, the classifier model 324 utilizes a Bloom filter based on the words of the training utterances 615 and applies this Bloom filter to the input utterance 303.
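The Bloom filter check at decision block 810 might be sketched as below. This is an assumption-laden illustration, not the implementation of this disclosure: the class and function names, the bit-array size, the number of hash functions, the use of SHA-256 with a seed prefix, and the whitespace tokenization are all hypothetical choices. A Bloom filter never reports a stored word as absent, but it can occasionally report an unseen word as present.

```python
import hashlib
from typing import Iterable, Iterator


class BloomFilter:
    """Compact probabilistic set: no false negatives, rare false positives."""

    def __init__(self, num_bits: int = 1024, num_hashes: int = 3) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item: str) -> Iterator[int]:
        # Derive several bit positions by hashing the item with different seeds.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item: str) -> bool:
        return all(
            self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item)
        )


def build_filter(training_utterances: Iterable[str]) -> BloomFilter:
    """Add every word of every training utterance to a new filter."""
    bloom = BloomFilter()
    for utterance in training_utterances:
        for word in utterance.lower().split():
            bloom.add(word)
    return bloom


def fraction_known(utterance: str, bloom: BloomFilter) -> float:
    """Fraction of the utterance's words that the filter reports as seen."""
    words = utterance.lower().split()
    if not words:
        return 0.0
    return sum(1 for word in words if word in bloom) / len(words)


bloom = build_filter(["check balance in savings", "show recent transactions"])
known_share = fraction_known("show balance", bloom)
```

Comparing the returned fraction against 1.0, or against a predetermined percentage, yields the decision described above.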

[0130] If the input utterance 303 includes only words that are found in the training utterances 615 at decision block 810, then the master bot 114 may determine that the input utterance is related to at least one skill bot 116 and thus does not belong to the none class 316. In that case, at block 815, the master bot 114 may route the input utterance 303 to the skill bot 116 most closely matching the input utterance 303. For instance, as described above, the classifier model 324 may be configured to assign a confidence score (e.g., using a logistic regression model) to each skill bot 116 for the input utterance. The master bot 114 may select the skill bot 116 having the highest confidence score and may route the input utterance 303 to that skill bot 116.

[0131] However, if at decision block 810 the input utterance 303 includes any words not in the training utterances 615, or includes a percentage of such words greater than a threshold, then the method 800 proceeds to block 820. At block 820, the master bot 114 may generate an input feature vector from the input utterance 303 accessed at block 805. The input feature vector may describe and represent the input utterance 303. Various techniques are known for converting a sequence of words, such as an input utterance, into a feature vector, and one or more of such techniques may be used. For instance, the training system 350 may, but need not, use a one-hot encoding or some other encoding to encode each training utterance 615 as a corresponding training feature vector 620. However, an embodiment of the master bot 114 uses the same technique as was used to generate training feature vectors 620 from training utterances.

[0132] At decision block 825, the master bot 114 may cause the classifier model 324 to compare the input feature vector generated at block 820 to the set representations determined at block 515 of the method 500 for initializing the classifier model 324, and as such, the master bot 114 may determine whether the input feature vector matches any skill bots 116 available to the master bot 114. The specific technique for comparing and matching may depend on the nature of the set representations. For instance, as will be described below, if the set representations are clusters of training feature vectors 620, the classifier model 324 may compare the input feature vector (i.e., the point representing the input feature vector) to the clusters to determine whether the input feature vector falls inside any of the clusters and thus matches at least one skill bot 116; or if the set representations are composite vectors of training feature vectors, the classifier model 324 may compare the input feature vector to the composite feature vectors to determine whether the input feature vector is sufficiently similar to any such composite feature vector and thus matches at least one skill bot 116. Various implementations are possible and are within the scope of this disclosure.

[0133] If the input feature vector is deemed to match at least one skill bot 116 at decision block 825, then at block 830, the master bot 114 may route the input utterance to a skill bot 116 that the input feature vector is deemed to match. However, if the input feature vector is deemed not to match any skill bot 116 at decision block 825, then at block 835, the master bot 114 may indicate that the utterance cannot be processed by any skill bot 116. This indication may be passed to the digital assistant, which may provide an output to the user that indicates the user input 110 cannot be processed, or handled, by the digital assistant.
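The overall two-stage flow of the method 800 can be summarized in a short sketch. Everything here is a simplifying assumption for illustration: a plain Python set stands in for the Bloom filter, the confidence scorer and the feature-vector matcher are passed in as caller-supplied functions, and the names `handle_utterance`, `confidence_scores`, and `match_by_vector` are hypothetical.

```python
from typing import Callable, Dict, Optional, Set


def handle_utterance(
    utterance: str,
    training_words: Set[str],
    confidence_scores: Callable[[str], Dict[str, float]],
    match_by_vector: Callable[[str], Optional[str]],
    min_known_fraction: float = 1.0,
) -> Optional[str]:
    """Return the skill bot to route to, or None for the none class."""
    words = utterance.lower().split()
    known = sum(1 for word in words if word in training_words)

    # Decision block 810: enough words seen in training, so skip the
    # none-class check and route to the highest-confidence skill bot (block 815).
    if words and known / len(words) >= min_known_fraction:
        scores = confidence_scores(utterance)
        return max(scores, key=scores.get)

    # Blocks 820 to 835: fall back to comparing the input feature vector
    # against the set representations; None means no skill bot matches.
    return match_by_vector(utterance)


training_words = {"check", "balance", "in", "savings"}
scorer = lambda utterance: {"finance": 0.9, "pizza": 0.1}
matcher = lambda utterance: None

in_domain = handle_utterance("check balance", training_words, scorer, matcher)
out_of_domain = handle_utterance("order pizza", training_words, scorer, matcher)
```

Passing the two later stages as functions keeps the sketch independent of which set representation is in use.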

Examples of Types of Clusters Useable by the Classifier Model

[0134] As mentioned above, an example type of set representations that can be used by the classifier model 324 are clusters of training feature vectors 620. In general, it can be assumed that an input utterance that can be handled by an available skill bot 116, and is thus related to the available skill bots 116, has some semantic similarity to training utterances 615 for those skill bots 116. Thus, an input utterance 303 that is related to the available skill bots 116 likely can be represented as an input feature vector that is proximate to one or more training utterances 615 in the feature space 630.

[0135] Given the proximity of feature vectors of semantically similar utterances, a boundary can be defined to demarcate feature vectors, plotted as points, having a common intent or to separate feature vectors having different intents. In two-dimensional space, the boundary may be a line or a circle, such that points that fall on one side of such a line belong to a first intent class (i.e., correspond to utterances having a first intent) and points that fall on the other side of the line belong to a second intent class. In three dimensions, the boundary can be represented as a plane or a sphere. More generally, in various dimensions, the boundary can be a hyperplane, a hypersphere, or a hypervolume. The boundary can have various shapes and need not be completely spherical or symmetrical.
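For the two simplest boundary shapes mentioned above, the membership tests can be written in a few lines. The sketch below is illustrative only; the function names and the sample line x + y - 2 = 0 are assumptions, and real boundaries need not be linear or spherical.

```python
import math
from typing import Sequence


def side_of_hyperplane(point: Sequence[float],
                       normal: Sequence[float],
                       offset: float) -> int:
    """Return +1 or -1 for the side of the hyperplane normal . x + offset = 0
    on which the point lies, or 0 if the point lies on the hyperplane."""
    value = sum(n * x for n, x in zip(normal, point)) + offset
    return (value > 0) - (value < 0)


def inside_hypersphere(point: Sequence[float],
                       center: Sequence[float],
                       radius: float) -> bool:
    """Return True if the point lies within or on the hypersphere."""
    return math.dist(point, center) <= radius


# In two dimensions, the hyperplane is the line x + y - 2 = 0.
first_class_side = side_of_hyperplane((0.0, 0.0), (1.0, 1.0), -2.0)
second_class_side = side_of_hyperplane((3.0, 3.0), (1.0, 1.0), -2.0)
in_circle = inside_hypersphere((1.0, 1.0), (0.0, 0.0), 2.0)
```

Both functions work unchanged in any number of dimensions, which matches the generalization from lines and circles to hyperplanes and hyperspheres.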

[0136] FIG. 9 illustrates an example of a feature space 630 that includes points representing feature vectors of example utterances, according to some embodiments described herein. In this example, some of the example utterances belong to a balance class and are representative of a first intent relating to requests for account balance information, and the rest of the example utterances belong to a transactions class and are representative of a second intent relating to requests for information about transactions. In FIG. 9, example utterances in the balance class are labeled with a b, and example utterances in the transactions class are labeled with a t. These intent classes could be defined, for example, for a finance-related skill bot 116.

[0137] For example, the example utterances in the following table may belong to the balance class, in the first column, and the transactions class, in the second column:

[0138] FIG. 10 illustrates an example of the feature space 630 of FIG. 9 having a class boundary 1010 between intent classes of the feature vectors of the example utterances, according to some embodiments described herein. Specifically, as shown in FIG. 10, to separate points (i.e., feature vectors of example utterances) in the balance class from those in the transactions class, a line could be drawn as a class boundary 1010. This class boundary 1010 is a rough approximation of a division between the two intent classes. Creating a more precise boundary 1010 between the intent classes may involve providing circles or other geometric volumes that define clusters of feature vectors, such that each cluster includes only feature vectors within a single corresponding intent class (i.e., having the intent associated with that intent class).

[0139] FIG. 11 illustrates an example of the feature space 630 of FIG. 9 having class boundaries 1010 separating, specifically isolating, feature vectors associated with common intents into respective clusters, according to some embodiments described herein. In this example, as in some embodiments, not all feature vectors of an intent class are within a single cluster, but no cluster includes feature vectors from more than a single intent class. Specifically, in the example shown, a first cluster 1110a defined by a first class boundary 1010 includes only feature vectors in the balance class, and a second cluster 1110b defined by a second class boundary 1010 includes only feature vectors in the transactions class. As described in detail below, some embodiments described herein can form class boundaries 1010 to create clusters such as those shown in FIG. 11.

[0140] FIG. 12 illustrates another example of the feature space 630 of FIG. 9 having class boundaries 1010 separating, specifically isolating, feature vectors associated with common intents into respective clusters, according to some embodiments described herein. In this example, as in some embodiments, not all feature vectors of an intent class are within a single cluster, but no cluster includes feature vectors from more than a single intent class. Specifically, in the example shown, a first cluster 1110c defined by a first class boundary 1010 includes only feature vectors in the balance class, and a second cluster 1110d defined by a second class boundary 1010 includes only feature vectors in the transactions class. In contrast to the example of FIG. 11, however, the class boundaries 1010 and thus the clusters are overlapping. Some embodiments described herein support overlapping clusters as shown in FIG. 12. As described in detail below, some embodiments described herein can form class boundaries 1010 to create clusters such as those shown in FIG. 12.

[0141] FIG. 13 illustrates another example of the feature space 630 of FIG. 9 having class boundaries 1010 separating feature vectors into clusters, according to some embodiments described herein. In this example, as in some embodiments, not all feature vectors of an intent class are within a single cluster, and further, a cluster may represent various intents by including feature vectors with varying intents. Specifically, in the example shown, a first cluster 1110e defined by a first class boundary 1010 includes only feature vectors in the balance class, and a second cluster 1110f defined by a second class boundary 1010 includes some feature vectors in the balance class (i.e., having a balance-related intent) and some feature vectors in the transactions class (i.e., having a transactions-related intent). Some embodiments described herein support clusters with varying intents as shown in FIG. 13. As described in detail below, some embodiments described herein can form class boundaries 1010 to create clusters such as those shown in FIG. 13.
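The cluster arrangements of FIGS. 11 through 13 differ in two properties: whether clusters overlap and whether a cluster holds feature vectors of more than one intent. The sketch below checks both properties under the simplifying assumption that every class boundary is a hypersphere described by a center and a radius; the `Cluster` type, its fields, and the sample values are hypothetical.

```python
import math
from typing import NamedTuple, Sequence, Set


class Cluster(NamedTuple):
    center: Sequence[float]
    radius: float
    intents: Sequence[str]  # intent label of each member feature vector


def clusters_overlap(a: Cluster, b: Cluster) -> bool:
    """Two spherical clusters overlap when their centers are closer
    together than the sum of their radii."""
    return math.dist(a.center, b.center) < a.radius + b.radius


def cluster_intents(cluster: Cluster) -> Set[str]:
    """Distinct intents represented by the cluster's members."""
    return set(cluster.intents)


balance_only = Cluster((0.0, 0.0), 1.0, ["balance", "balance"])
transactions_only = Cluster((1.5, 0.0), 1.0, ["transactions"])
mixed = Cluster((5.0, 5.0), 1.0, ["balance", "transactions"])

overlapping = clusters_overlap(balance_only, transactions_only)
is_mixed = len(cluster_intents(mixed)) > 1
```

In this illustration, the first two clusters resemble the overlapping single-intent clusters of FIG. 12, and the third resembles the mixed-intent cluster of FIG. 13.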

Clustering to Identify Unrelated Input Utterances

[0142] FIG. 14 is a diagram of a method 1400 of initializing a classifier model 324 of a master bot 114 to utilize clusters of training feature vectors to determine whether input utterances are unrelated, or related, to available skill bots 116, according to some embodiments described herein. This method 1400 or similar can be used at block 205 of the above method 200 of configuring and using a master bot 114 to direct input utterances 303 to skill bots 116, and further, the method 1400 of FIG. 14 is a more specific variation of the method 500 of FIG. 5. In some embodiments of this method, as described below, k-means clustering is performed to configure the classifier model 324. Although k-means clustering may enable more accurate formation of clusters as compared to other clustering techniques, such as k-nearest neighbors, various clustering techniques may be used instead of or in addition to k-means clustering.

[0143] The method 1400 depicted in FIG. 14, as well as other methods described herein, may be implemented in software (e.g., as code, instructions, or programs) executed by one or more processing units (e.g., processors or processor cores), in hardware, or in combinations thereof. The software may be stored on a non-transitory storage medium, such as on a memory device. This method 1400 is intended to be illustrative and non-limiting. Although FIG. 14 depicts various activities occurring in a particular sequence or order, this is not intended to be limiting. In certain embodiments, for instance, the activities may be performed in a different order, or one or more activities of the method 1400 may be performed in parallel. In certain embodiments, the method 1400 may be performed by a training system 350, which may be part of a DABP 102.

[0144] At block 1405 of the method 1400, the training system 350 accesses training utterances 615, also referred to as example utterances, for the various skill bots 116 associated with the master bot 114. For instance, the training utterances 615 may be stored as skill bot data in the training data 354 accessible to the training system 350. In some embodiments, each skill bot 116 available to the master bot 114 may be associated with a subset of the training utterances 615, and each such subset of the training utterances 615 may include training utterances 615 for each intent for which the skill bot 116 is configured. Thus, the full set of training utterances 615 may include training utterances 615 for each intent of each skill bot 116, such that every intent of every skill bot 116 is represented.

[0145] FIG. 15 illustrates an example of execution of aspects of the method 1400 of FIG. 14, according to some embodiments described herein. As shown in FIG. 15, the training system 350 may access training data 354, which may include skill bot data related to the skill bots 116 available to the master bot 114 whose classifier model 324 is being trained. In this example, the training data 354 accessed by the training system 350 may include training utterances 615 from first skill bot data 358d, second skill bot data 358e, and third skill bot data 358f, each of which may include training utterances 615 representative of a respective skill bot 116 available to the master bot 114. Further, for a given set of skill bot data, as shown for the first skill bot data 358d in FIG. 15, each training utterance 615 may be associated with an intent that the associated skill bot 116 is configured to evaluate and handle.

[0146] At block 1410 of FIG. 14, the training system 350 generates training feature vectors 620 from the training utterances 615 accessed at block 1405. As described above, the training utterances 615 may include a subset associated with each skill bot 116 and, further, may include training utterances 615 for each intent of each skill bot 116. Thus, in some embodiments, the training feature vectors 620 may include a respective subset for each skill bot 116 and, further, may include respective feature vectors for each intent of each skill bot. As described above, the training system 350 may generate a training feature vector 620 to describe and represent each respective training utterance 615. Various techniques are known for converting a sequence of words, such as a training utterance 615, into a feature vector, such as a training feature vector 620, and one or more of such techniques may be used. For instance, the training system 350 may, but need not, use a one-hot encoding or some other encoding to encode each training utterance 615 as a corresponding training feature vector 620.

[0147] As shown in FIG. 15, for example, each training utterance 615 from the various skill bot data 358 may be converted into a respective training feature vector 620. In some embodiments, the resulting training feature vectors 620 may thus be representative of all the skill bots 116 for which training utterances 615 were provided and of all the intents for which training utterances 615 were provided.

[0148] At block 1415 of FIG. 14, the training system 350 may set (i.e., initialize) a count, which is a quantity of clusters to be generated. In some embodiments, for example, the count may initially be set to the quantity n, where n is the total number of intents across the various skill bots 116 that are available to the master bot 114. However, various other quantities can be used as the initial value of the count.

[0149] In some embodiments, the training system 350 performs one, two, or more rounds of iteratively determining a set of clusters of the training feature vectors 620. Although this method 1400 utilizes two rounds of determining clusters, a single round or a greater number of rounds may be used in some embodiments. At block 1420, the training system 350 begins a first round of determining the clusters. The first round utilizes an iterative loop, in which the training system 350 generates clusters of the training feature vectors 620 and then determines the sufficiency of such clusters. As described below, when the clusters are deemed sufficient, the training system 350 may then terminate the iterations in the first round.

[0150] In some embodiments, block 1425 is the beginning of the iterative loop of the first round for determining clusters. Specifically, at block 1425, the training system 350 may determine respective centroid locations (i.e., a respective location for each centroid) for the various clusters to be generated in this iteration; the quantity of centroid locations is equal to the count determined at block 1415. In some embodiments, in the first iteration of this loop, the training system 350 may choose a set of randomly selected centroid locations, having quantity equal to the count, within a feature space 630 into which the training utterances 615 fall. More specifically, for instance, the training system 350 may determine a bounding box, such as a minimum bounding box, for the points corresponding to the training feature vectors 620. The training system 350 may then randomly select a set of centroid locations within that bounding box, where the number of locations selected is equal to the count determined for the clusters. During iterations other than the first one, the count has increased since the previous iteration, and thus, in some embodiments, only the centroid locations of centroids being newly added are determined randomly. The centroids being carried over from the previous iteration may retain their locations. A respective centroid for a corresponding cluster may be positioned at each centroid location.
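
The centroid placement described above can be sketched as follows. This is an illustrative sketch only, assuming an axis-aligned minimum bounding box and uniform random sampling inside it; the names `bounding_box`, `place_centroids`, and `carried_over` are hypothetical.

```python
import random


def bounding_box(vectors):
    """Return per-dimension (lows, highs) of the minimum axis-aligned bounding box."""
    dims = range(len(vectors[0]))
    lows = [min(v[d] for v in vectors) for d in dims]
    highs = [max(v[d] for v in vectors) for d in dims]
    return lows, highs


def place_centroids(vectors, count, carried_over=(), rng=None):
    """Keep centroids carried over from a previous iteration and add random
    ones inside the bounding box until there are `count` centroids."""
    rng = rng or random.Random()
    lows, highs = bounding_box(vectors)
    centroids = [list(c) for c in carried_over]
    while len(centroids) < count:
        # Assumption: new locations are drawn uniformly within the box.
        centroids.append([rng.uniform(lo, hi) for lo, hi in zip(lows, highs)])
    return centroids
```

In the first iteration `carried_over` would be empty, so every location is random; in later iterations only the newly added centroids are.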

[0151] At block 1430, the training system 350 may determine clusters by assigning each training feature vector 620 to its respective closest centroid from among the various centroids whose locations were determined at block 1425. For instance, for each training feature vector 620, the training system 350 may compute the distance to each centroid location and may assign that training feature vector 620 to the centroid having the smallest such distance. The set of training feature vectors 620 assigned to a common centroid may together form a cluster associated with that centroid; thus, there may be a quantity of clusters equal to the quantity of centroids, which is the value of the count.
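
A minimal sketch of this assignment step, assuming Euclidean distance (the disclosure does not fix a distance metric) and the hypothetical function name `assign_to_clusters`:

```python
import math


def assign_to_clusters(vectors, centroids):
    """Assign each vector to its nearest centroid; returns one member list per centroid."""
    clusters = [[] for _ in centroids]
    for v in vectors:
        # Index of the centroid with the smallest Euclidean distance to v.
        nearest = min(range(len(centroids)), key=lambda i: math.dist(v, centroids[i]))
        clusters[nearest].append(v)
    return clusters
```

A centroid that attracts no vectors ends up with an empty member list, so the number of populated clusters can be smaller than the count.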

[0152] In the example of FIG. 15, only a portion of the feature space 630 is shown, and a first centroid 1510a and a second centroid 1510b are visible within that portion. Of the three training feature vectors 620 shown, the training system 350 determined that two were closest to the first centroid 1510a and one was closest to the second centroid 1510b. In this example, the two training feature vectors 620 closest to the first centroid 1510a make up a first cluster 1110g, and the one training feature vector 620 closest to the second centroid 1510b makes up a second cluster 1110h.

[0153] At block 1435 of FIG. 14, for the clusters determined at block 1430, the training system 350 recomputes the location of each cluster’s centroid. For instance, in some embodiments, the centroid of each cluster is computed to be the average (e.g., the arithmetic mean) of the training feature vectors 620 assigned to that centroid and, thus, assigned to that cluster.
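
The recomputation of centroids as arithmetic means can be sketched as below. The handling of an empty cluster (keeping the previous location) is an assumption made for this sketch, and `recompute_centroids` is a hypothetical name.

```python
def recompute_centroids(clusters, previous_centroids):
    """Move each centroid to the arithmetic mean of the vectors assigned to it."""
    updated = []
    for members, old in zip(clusters, previous_centroids):
        if not members:
            # Assumption: a centroid with no assigned vectors keeps its location.
            updated.append(list(old))
            continue
        dims = len(members[0])
        updated.append([sum(v[d] for v in members) / len(members) for d in range(dims)])
    return updated
```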

[0154] At decision block 1440, the training system 350 may determine whether a stopping condition is satisfied. In some embodiments, the training system 350 may utilize this method 1400 or similar to repeatedly increase the quantity of centroids (i.e., increase the count) and assign the training feature vectors 620 to their closest respective centroids, until a convergence occurs such that no significant improvement in clustering is likely to occur. In some embodiments, the stopping condition defines the level of convergence that is sufficient.

[0155] In some embodiments, the stopping condition may be satisfied if one or both of the following conditions are true: (1) the average cluster cost satisfies a first threshold, such as the average cluster cost being less than 1.5 or some other predetermined value; or (2) the outlier ratio satisfies a second threshold, such as the outlier ratio being less than or equal to 0.25 or some other predetermined value. A cluster cost for a particular cluster may be defined as (a) the sum of the squared distances between the cluster’s centroid, as recomputed in block 1435, and each training feature vector 620 assigned to that centroid (b) divided by the quantity of training feature vectors 620 assigned to that centroid. Thus, the average cluster cost may be the average of the various cluster costs of the various centroids. The outlier ratio may be defined as the total number of outliers among the clusters, divided by the quantity of clusters (i.e., the count). There are various techniques for defining an outlier, and one or more of such techniques may be used by the training system 350. In some embodiments, the stopping condition is met if and only if both (1) the average cluster cost satisfies a first threshold (e.g., lower than 1.5) and (2) the outlier ratio satisfies a second threshold (e.g., no higher than 0.25).
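
The stopping test can be sketched as follows, using the example thresholds of 1.5 and 0.25 from the text. Because the disclosure leaves the outlier definition open, the sketch takes it as a caller-supplied predicate `is_outlier`; that parameter, the function names, and the choice to average the cost over non-empty clusters only are assumptions.

```python
import math


def cluster_cost(members, centroid):
    """Sum of squared distances to the centroid, divided by the number of members."""
    return sum(math.dist(v, centroid) ** 2 for v in members) / len(members)


def stopping_condition_met(clusters, centroids, is_outlier,
                           cost_threshold=1.5, outlier_threshold=0.25,
                           require_both=True):
    """Evaluate the average cluster cost and the outlier ratio against thresholds."""
    # Assumption: empty clusters are left out of the cost average.
    populated = [(m, c) for m, c in zip(clusters, centroids) if m]
    avg_cost = sum(cluster_cost(m, c) for m, c in populated) / len(populated)
    outliers = sum(1 for m, c in populated for v in m if is_outlier(v, c, m))
    outlier_ratio = outliers / len(centroids)  # outliers divided by the count
    cost_ok = avg_cost < cost_threshold
    ratio_ok = outlier_ratio <= outlier_threshold
    return (cost_ok and ratio_ok) if require_both else (cost_ok or ratio_ok)
```

Setting `require_both=False` gives the "one or both" variant, while the default corresponds to the "if and only if both" variant.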

[0156] In general, the average cluster cost tends to decrease as the value of k (i.e., the count) increases, whereas the outlier ratio tends to increase as k increases. In some embodiments, when the above stopping condition, considering both factors, is applied, the largest k value that satisfies the respective thresholds for both the cluster cost and the outlier ratio is the final k value for the classifier model 324.

[0157] If the stopping condition is not met at decision block 1440, then the method 1400 may proceed to block 1445. At block 1445, the training system 350 increases the count for the next round of clustering. In some embodiments, the count may be increased incrementally by an amount that makes it likely the stopping condition will be satisfied after no greater than a certain number of iterations of the loop. For instance, in some embodiments, the count can be increased by a step size that is a function of n and u, where n is the total number of intents across all the available skill bots 116 and u is the total number of training utterances 615 across all the available skill bots 116. This step size will ensure that the stopping condition is eventually met. Specifically, with this step size, starting at a count equal to n, the twentieth iteration will have a count that is no less than u, the quantity of utterances. When the count is no less than u, each utterance potentially has its own cluster, or if that is not the case, it is still likely that the average cluster cost will be less than 1.5 and the outlier ratio will be less than or equal to 0.25, which satisfies an example stopping condition. More generally, the step size may be chosen to ensure that the iterations do not waste computing resources by looping for an unreasonable amount of time. After the count is updated at block 1445, the method 1400 then returns to block 1425 to perform another loop iteration.
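
The outer loop that grows the count until the stopping condition holds can be sketched as below. The step size is left as a plain parameter here, and the cap `max_count` (for example, the number of training utterances) is an assumption added so the sketch always terminates; `cluster_once` and `converged` are hypothetical callables standing in for the clustering and stopping-test steps.

```python
def grow_until_converged(initial_count, step, cluster_once, converged, max_count):
    """Cluster with an increasing count until `converged` accepts the result
    or the count reaches `max_count`. Returns (final_count, final_result)."""
    count = initial_count
    while True:
        result = cluster_once(count)      # assignment plus centroid recomputation
        if converged(result) or count >= max_count:
            return count, result
        count = min(count + step, max_count)  # increase the count for the next iteration
```

For example, with stub callables, `grow_until_converged(3, 2, lambda k: k, lambda r: r >= 7, 100)` visits counts 3, 5, and 7 and stops at 7.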

[0158] However, if the stopping condition is satisfied at decision block 1440, the method 1400 may exit the current iterative loop and may skip ahead to block 1450. At block 1450, the training system 350 begins the second round of clustering, in which the clusters may be even further defined based on the work done in the first round.

[0159] In some embodiments, block 1455 is the beginning of the iterative loop of the second round for determining clusters. Specifically, at block 1455, the training system 350 may determine respective centroid locations (i.e., a respective location for each centroid) for the various clusters to be generated in this iteration; the quantity of centroid locations is equal to the current value of the count. In some embodiments, in the first iteration of this loop, the training system 350 uses the centroid locations as recomputed at block 1435 prior to the end of the first round. In iterations other than the first one, the count has increased since the previous iteration. In that case, the centroids from the previous iteration may retain their centroid locations, and in some embodiments, centroid locations of centroids being newly added due to an increase in the count may be determined randomly. A respective centroid for a corresponding cluster may be positioned at each centroid location.

[0160] At block 1460, the training system 350 may determine clusters by assigning each training feature vector 620 to its respective closest centroid from among the various centroids whose locations were determined at block 1455. For instance, for each training feature vector 620, the training system 350 may compute the distance to each centroid location and may assign that training feature vector 620 to the centroid having the smallest such distance. The set of training feature vectors 620 assigned to a common centroid may together form a cluster associated with that centroid; thus, there may be a quantity of clusters equal to the quantity of centroids, which is the value of the count.

[0161] At block 1465, for the clusters determined at block 1460, the training system 350 recomputes the location of each cluster’s centroid. For instance, in some embodiments, the centroid of each cluster is computed to be the average (e.g., the arithmetic mean) of the training feature vectors 620 assigned to that centroid and, thus, assigned to that cluster.

[0162] At decision block 1470, the training system 350 may determine whether a stopping condition is satisfied. In some embodiments, the training system 350 may utilize this method 1400 or similar to repeatedly increase the quantity of centroids (i.e., increase the count) and assign the training feature vectors 620 to their closest respective centroids, until a convergence occurs such that no significant improvement in clustering is likely to occur. In some embodiments, the stopping condition defines the level of convergence that is sufficient.

[0163] In some embodiments, the stopping condition may be satisfied if one or both of the following conditions are true: (1) the average cluster cost satisfies a first threshold, such as the average cluster cost being less than 1.5 or some other predetermined value; or (2) the outlier ratio satisfies a second threshold, such as the outlier ratio being less than or equal to 0.25 or some other predetermined value. A cluster cost for a particular cluster may be defined as (a) the sum of the squared distances between the cluster’s centroid, as recomputed in block 1465, and each training feature vector 620 assigned to that centroid (b) divided by the quantity of training feature vectors 620 assigned to that centroid. Thus, the average cluster cost may be the average of the various cluster costs of the various centroids. The outlier ratio may be defined as the total number of outliers among the clusters, divided by the quantity of clusters (i.e., the count). There are various techniques for defining an outlier, and one or more of such techniques may be used by the training system 350. In some embodiments, the stopping condition is met if and only if both (1) the average cluster cost satisfies a first threshold (e.g., lower than 1.5) and (2) the outlier ratio satisfies a second threshold (e.g., no higher than 0.25).

[0164] If the stopping condition is not met at decision block 1470, then the method 1400 may proceed to block 1475. At block 1475, the training system 350 increases the count for the next round of clustering. In some embodiments, the count may be increased incrementally by an amount that makes it likely the stopping condition will be satisfied after no greater than a certain number of iterations of the loop. For instance, in some embodiments, the count can be increased by the value of a step size equal to step_2 = max(2, ...). This step size ensures that the stopping condition will eventually be met. Specifically, with this step size, the stopping condition will likely be met by the end of the fifth iteration. More generally, the step size may be chosen to ensure that the iterations do not waste computing resources by looping for an unreasonable amount of time. After the count is updated at block 1475, the method 1400 then returns to block 1455 to perform another loop iteration.

[0165] However, if the stopping condition is satisfied at decision block 1470, the method 1400 may exit the current iterative loop and may skip ahead to block 1480. At block 1480, the training system 350 determines a respective boundary 1010 for each cluster determined above. In some embodiments, the boundary 1010 for a cluster is defined to center on the centroid of the cluster and to include all the training feature vectors 620 assigned to the cluster. In some embodiments, for instance, the boundary 1010 of a cluster is a hypersphere (e.g., a circle or a sphere) having its center at the centroid. In some embodiments, the radius of the boundary 1010 may be a margin value (i.e., a padding amount) plus the larger of (1) the maximum distance from the center to the training feature vector 620, in that cluster, that is farthest from the centroid, or (2) the mean of the respective distances to the centroid from the training feature vectors 620 in the cluster, plus three times the standard deviation of such distances. In other words, the radius may be set to radius = margin + max(max(distances), mean(distances) + 3σ(distances)), where distances is the set of the respective distances from the training feature vectors of the cluster to the centroid of the cluster, and where max(distances) is the maximum of that set, mean(distances) is the mean of that set, and σ(distances) is the standard deviation of that set. Further, the margin value, margin, may be a margin of error and may have a value greater than or equal to zero.
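
The radius formula above can be sketched directly in code. The sketch assumes Euclidean distance and the population standard deviation (the text does not say whether the population or sample form is meant); `boundary_radius` is a hypothetical name.

```python
import math
import statistics


def boundary_radius(members, centroid, margin=0.0):
    """radius = margin + max(max(distances), mean(distances) + 3 * sigma(distances))."""
    distances = [math.dist(v, centroid) for v in members]
    # Assumption: population standard deviation, which is 0.0 for a single member.
    sigma = statistics.pstdev(distances)
    return margin + max(max(distances), statistics.mean(distances) + 3 * sigma)
```

With a single-member cluster whose vector coincides with the centroid, the radius reduces to the margin alone, which is one reason a nonzero margin can matter for small training sets.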

[0166] In some embodiments, the margin may be used to define a boundary 1010 that encompasses greater coverage than otherwise so as to reduce the chances of a relevant input utterance 303 falling outside all the clusters and thus being labeled as a member of the none class 316. In other words, the margin may pad the boundary 1010. For example, the margin may have a value that depends on u, where u is the total number of training utterances 615 being used. The margin can account for a situation in which, potentially because the quantity of training utterances is too low (e.g., twenty to thirty), the training feature vectors 620 do not cover a significant portion of the feature space 630, such that the clusters might otherwise be too small to capture relevant input utterances 303.

[0167] Returning to the example of FIG. 15, the training system 350 determines a respective boundary for each cluster determined based on assignment of the training feature vectors 620. Specifically, in this example, a first boundary 1010a is determined for the first cluster 1110g, which includes two training feature vectors 620, and a second boundary 1010b is determined for the second cluster 1110h, which includes one training feature vector 620. The quantity of training feature vectors may be small, as in this example, or may be numerous, such as hundreds of training feature vectors 620 per cluster. It will be understood that this simplistic example is non-limiting and provided for illustrative purposes only.

[0168] As shown in FIG. 14, at block 1485, the training system 350 may configure the classifier model 324 of the master bot 114 to utilize the boundaries 1010, also referred to as cluster boundaries, determined at block 1480. For instance, the training system 350 may store an indication of the cluster boundaries in a storage device (e.g., on the digital assistant 106) accessible by the classifier model. The classifier model 324 may be configured to compare input utterances to such cluster boundaries 1010, which act as set representations of the training feature vectors 620.

[0169] Various modifications can be made to the above method 1400 and are within the scope of this disclosure. For instance, some examples of the training system 350 perform only a single round of refining the clusters. In that case, the activities of block 1445 to block 1475 may be skipped such that, when the stopping condition is met at decision block 1440, the method 1400 proceeds to block 1480. Some other examples of the training system 350 perform greater than two rounds of refining the clusters. In that case, the activities of block 1445 to block 1475 may be repeated for each additional round after the second one. These and other implementations are within the scope of this disclosure.

[0170] In some embodiments, the k value (i.e., the value of count and thus the number of clusters) determined in the above method 1400 depends on the total number of training utterances 615 as well as the distribution of the training feature vectors 620 throughout the feature space 630. Thus, the k value can vary from one master bot 114 to another, depending on the available skill bots 116 and the training utterances 615 available to represent the intents of those skill bots 116. An optimal k value is a value that strikes a balance such that each cluster is large enough that input utterances 303 that relate to that cluster will fall within the boundary 1010 of the cluster, while utterances that are unrelated fall outside the boundary 1010. If the clusters are too large, then the risk of false matching increases. If the clusters are too small (e.g., consisting of a single training utterance 615), the classifier model 324 may be over-fitted, and the usefulness of such clusters is limited.

[0171] In the above example of the method 1400, the training feature vectors 620 were not divided or grouped based on intent, and thus a cluster may include training feature vectors 620 representative of various skill bots 116 or various intents of one or more skill bots 116. Additionally or alternatively, an embodiment of the training system 350 may ensure that each cluster includes only training feature vectors 620 representative of a single skill bot 116, a single intent, or a single sub-bot, where a sub-bot is associated with a subset of the training utterances representative of a single skill bot 116. Various techniques may be used to limit clusters in this manner. For instance, the training utterances may be separated into groups based on intent, sub-bot, or skill bot 116, and a respective instance of the above method 1400 may be performed on each group. In this manner, a cluster determined during one instance of the method 1400 for a corresponding group (e.g., training utterances 615 representative of a particular skill bot 116) may include only training feature vectors 620 from that corresponding group, which may be limited to training utterances of a single intent, sub-bot, or skill bot 116. Various other implementations are possible and are within the scope of this disclosure.

[0172] FIG. 16 is a diagram of a method 1600 of using a classifier model 324 of a master bot 114 to determine whether an input utterance 303, provided as user input 110, is unrelated to any available skill bot 116 associated with the master bot 114, according to some embodiments described herein. This method 1600 or similar can be used at block 215 of the above method 200 of configuring and using a master bot 114 to direct input utterances 303 to skill bots 116. The method 1600 of FIG. 16 is a more specific variation of the method 700 of FIG. 7, and like that method 700 of FIG. 7, the method 1600 of FIG. 16 can be used with the preliminary filtering activity described with respect to the method 800 of FIG. 8. More specifically, as described below, the classifier model 324 of a master bot 114 may utilize clusters of training feature vectors 620 to determine whether an input utterance 303 belongs to the none class 316 (i.e., is unrelated to the available skill bots 116).

[0173] The method 1600 depicted in FIG. 16, as well as other methods described herein, may be implemented in software (e.g., as code, instructions, or programs) executed by one or more processing units (e.g., processors or processor cores), in hardware, or in combinations thereof. The software may be stored on a non-transitory storage medium, such as on a memory device. This method 1600 is intended to be illustrative and non-limiting. Although FIG. 16 depicts various activities occurring in a particular sequence or order, this is not intended to be limiting. In certain embodiments, for instance, the activities may be performed in a different order, or one or more activities of the method 1600 may be performed in parallel. In certain embodiments, the method 1600 may be performed by a master bot 114 associated with a set of available skill bots 116.

[0174] At block 1605 of the method 1600, the master bot 114 accesses an input utterance 303 that has been provided as user input 110. For instance, in some embodiments, a user may provide user input 110 in the form of speech input, and the digital assistant 106 may convert that user input 110 into a textual input utterance 303 for use by the master bot 114.

[0175] At block 1610, the master bot 114 may generate an input feature vector from the input utterance 303 accessed at block 1605. More specifically, in some embodiments, the master bot 114 causes the classifier model 324 of the master bot 114 to generate the input feature vector from the input utterance 303. The input feature vector may describe and represent the input utterance 303. Various techniques are known for converting a sequence of words, such as an input utterance, into a feature vector, and one or more of such techniques may be used. For instance, the training system 350 may, but need not, use a one-hot encoding or some other encoding to encode the input utterance 303 as a corresponding input feature vector. However, an embodiment of the master bot 114 uses the same technique as was used to generate training feature vectors 620 from training utterances 615 when training the classifier model 324.

[0176] At decision block 1615, the master bot 114 compares the input feature vector generated at block 1610 to the clusters. More specifically, in some embodiments, the master bot 114 causes the classifier model 324 to compare the input feature vector generated at block 1610 to the clusters, specifically to the boundaries of the clusters, determined during training. As discussed above, each cluster may include a set of training feature vectors 620 and may include a boundary 1010 based on the training feature vectors 620 of that cluster; for instance, the boundary 1010 includes all training feature vectors 620 assigned to the cluster and potentially some additional space outside of those training feature vectors 620. Specifically, in some embodiments, the classifier model 324 determines whether the input feature vector (i.e., the point corresponding to the input feature vector) falls inside any boundary 1010 of any cluster of the training feature vectors 620. Various techniques exist in the art for determining whether a point falls inside a boundary, and one or more such techniques may be used to determine whether the input feature vector falls inside any of the boundaries 1010 of the clusters.
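
For hypersphere boundaries, one such technique is a simple distance comparison. The following sketch assumes each boundary is stored as a `(centroid, radius)` pair and that a point exactly on the surface counts as inside; the names `containing_clusters` and `is_unrelated` are hypothetical.

```python
import math


def containing_clusters(input_vector, boundaries):
    """Return indices of all clusters whose hypersphere contains the input vector.

    boundaries: list of (centroid, radius) pairs, one per cluster.
    """
    return [i for i, (centroid, radius) in enumerate(boundaries)
            if math.dist(input_vector, centroid) <= radius]


def is_unrelated(input_vector, boundaries):
    """True when the input vector falls outside every cluster boundary (none class)."""
    return not containing_clusters(input_vector, boundaries)
```

Returning all matching indices, rather than the first one, keeps the information needed later when overlapping clusters involve more than one skill bot.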

[0177] At decision block 1620, the classifier model 324 makes a decision based on comparing the input feature vector to the cluster boundaries. If the input feature vector does not fall inside any cluster boundary, and thus falls outside all the cluster boundaries 1010, the method 1600 proceeds to block 1625.

[0178] FIG. 17 illustrates an example of executing this method 1600 in a case where an input feature vector 1710 falls outside all the cluster boundaries, according to some embodiments described herein. In some embodiments, the master bot 114 provides an input utterance 303 to the classifier model 324, thus causing the classifier model 324 to convert the input utterance 303 to an input feature vector 1710 and to compare the input feature vector to the cluster boundaries 1010. In the example of FIG. 17, five clusters 1110 are shown in the feature space 630; however, a greater or fewer number of clusters 1110 may be used. In this example, the input feature vector 1710 falls outside all the cluster boundaries 1010, and thus, the classifier model 324 outputs to the master bot 114 an indication that the input utterance 303 belongs to the none class 316.

[0179] Returning to FIG. 16, at block 1625, the master bot 114 indicates that the input utterance 303 cannot be processed (i.e., cannot be further processed) based on the classifier model 324 indicating that the input feature vector falls outside all the cluster boundaries 1010. For instance, the digital assistant 106 may respond to the user to request clarification or to report that the user input 110 is not relevant to the skills of the digital assistant.

[0180] However, if the input feature vector falls inside one or more cluster boundaries 1010, then the method 1600 skips ahead to block 1630. At block 1630, the master bot 114 determines a skill bot 116 to handle (i.e., further process and determine a response 435 for) the input utterance 303 by selecting one of the skill bots 116 from among those available. Determining the skill bot 116 can be performed in various ways, and the technique used may depend on the makeup of the one or more cluster boundaries 1010 into which the input feature vector falls.

[0181] To select a skill bot 116 for the input utterance 303, the master bot 114 may consider the various training utterances represented by the training feature vectors 620 that share a cluster with the input feature vector 1710 (i.e., the training feature vectors that are members of a cluster 1110 into whose boundary 1010 the input feature vector 1710 falls). For instance, if the input feature vector 1710 falls into two or more clusters 1110, which are overlapping, then the training utterances 615 having corresponding training feature vectors 620 in any of those two or more clusters 1110 may be considered. Analogously, if the input feature vector 1710 falls into only a single cluster 1110, then the training utterances 615 having corresponding training feature vectors 620 in that cluster 1110 are considered. If all the training utterances 615 being considered are representative of a single skill bot 116, which may be the case, for instance, if the input feature vector 1710 falls into a cluster 1110 that includes training feature vectors 620 associated with only a single skill bot 116, then the master bot 114 may select that skill bot 116 to handle the input utterance 303.
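
The unambiguous case can be sketched as a set computation over the labels of the clusters that contain the input. The data layout (a mapping from cluster id to `(skill_bot, intent)` labels of its members), the function names, and the sample bot names are all hypothetical.

```python
def candidate_skill_bots(cluster_ids, cluster_labels):
    """Collect the skill bots represented in any cluster that contains the input."""
    return {bot for cid in cluster_ids for bot, _intent in cluster_labels[cid]}


def select_skill_bot(cluster_ids, cluster_labels):
    """Return the single candidate skill bot, or None when the choice is ambiguous."""
    bots = candidate_skill_bots(cluster_ids, cluster_labels)
    if len(bots) == 1:
        return next(iter(bots))
    return None  # several candidates (or none): further classification is needed


# Hypothetical labels for the training feature vectors in two clusters.
cluster_labels = {
    0: [("pizza_bot", "order_pizza"), ("pizza_bot", "cancel_order")],
    1: [("pizza_bot", "order_pizza"), ("bank_bot", "check_balance")],
}
```

A result of `None` signals the situation in which a further classification step, such as a confidence score or a nearest-neighbor vote, would be applied.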

[0182] In some embodiments, the classifier model 324 may be able to identify a specific intent of a specific skill bot 116 for the handling of the input utterance. For instance, if the one or more clusters 1110 into which the input feature vector 1710 falls include only training feature vectors 620 of training utterances 615 representative of a single intent of a single skill bot 116, then the classifier model 324 may identify that that particular intent is applicable to the input utterance 303. If the classifier model 324 is able to identify a particular intent of a particular skill bot 116, the master bot 114 may route the input utterance 303 to that skill bot 116 and may indicate the intent to the skill bot 116. As a result, the skill bot 116 may skip performance of its own classification of the input utterance 303 to infer an intent but, rather, may infer the intent indicated by the master bot 114.

[0183] However, if the training utterances 615 being considered are representative of multiple skill bots 116, which may be the case, for instance, if the input feature vector 1710 falls into a cluster 1110 made up of training feature vectors 620 representative of multiple skill bots 116 or if the input feature vector 1710 falls into multiple overlapping clusters 1110, then the master bot 114 may need to further classify the input utterance 303 to select a skill bot 116. Various techniques may be used to further classify the input utterance 303. In some embodiments, the classifier model 324 implements a machine-learning model to compute a confidence score (e.g., using a logistic regression model) with respect to the input utterance 303 for each skill bot 116 having associated training feature vectors 620 in the one or more clusters 1110 into which the input feature vector 1710 falls. For instance, the machine-learning model may be trained with the composite feature vectors. The master bot 114 may then select the skill bot 116 with the highest confidence score to handle the input utterance 303. In contrast to the conventional use of confidence scores to identify a relevant skill bot 116, in some embodiments it has already been determined that the input utterance 303 is related to the skill bots 116; thus, the risk of routing the input utterance 303 to an unrelated skill bot 116 is reduced or eliminated.

[0184] In additional or alternative embodiments, the classifier model 324 may utilize a k-nearest neighbors technique to select a skill bot 116 from among two or more skill bots 116 whose associated training feature vectors 620 share one or more clusters 1110 with the input feature vector 1710. For instance, the classifier model 324 may select a value of k and may identify the k-nearest training feature vectors 620 to the input feature vector 1710 from among the training feature vectors 620 that fall into the one or more clusters 1110 of the input feature vector 1710. The classifier model 324 may identify the skill bot 116 that has the greatest number of associated training feature vectors 620 in that set of k-nearest training feature vectors 620, and the master bot 114 may select that skill bot 116 to handle the input utterance. Various other implementations for selecting a skill bot 116 are possible and are within the scope of this disclosure.
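
A minimal sketch of such a k-nearest neighbors vote, assuming Euclidean distance and a list of `(vector, skill_bot)` pairs already restricted to the clusters containing the input; `knn_select_bot` and the sample labels are hypothetical.

```python
import math
from collections import Counter


def knn_select_bot(input_vector, labeled_vectors, k=3):
    """Pick the skill bot with the most vectors among the k nearest neighbors.

    labeled_vectors: list of (vector, skill_bot) pairs from the matching clusters.
    """
    nearest = sorted(labeled_vectors,
                     key=lambda item: math.dist(input_vector, item[0]))[:k]
    votes = Counter(bot for _vec, bot in nearest)
    # On a tie, most_common keeps the bot first seen among the nearest neighbors.
    return votes.most_common(1)[0][0]
```

For example, with neighbors labeled `pizza_bot`, `pizza_bot`, and `bank_bot` among the three nearest, the vote selects `pizza_bot`.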

[0185] At block 1635, the master bot 114 may forward the input utterance 303 to the skill bot selected at block 1630. The skill bot 116 may then process the input utterance 303 to respond to the user input 110.

Using Composite Vectors to Identify Unrelated Input Utterances

[0186] As discussed above, generally, some embodiments described herein utilize set representations of training feature vectors 620 to determine whether an input utterance 303 is related to a set of available skill bots 116. As also discussed above, the set representations may be clusters 1110. Additionally or alternatively, however, the set representations can be higher-level feature vectors, referred to herein as composite feature vectors.

[0187] FIG. 18 is a diagram of a method 1800 of initializing a classifier model 324 of a master bot 114 to utilize composite feature vectors to determine whether input utterances 303 are unrelated, or related, to available skill bots 116, according to some embodiments described herein. This method 1800 or similar can be used at block 205 of the above method 200 of configuring and using a master bot 114 to direct input utterances 303 to skill bots 116, and further, the method 1800 of FIG. 18 is a more specific variation of the method 500 of FIG. 5.

[0188] The method 1800 depicted in FIG. 18, as well as other methods described herein, may be implemented in software (e.g., as code, instructions, or programs) executed by one or more processing units (e.g., processors or processor cores), in hardware, or in combinations thereof. The software may be stored on a non-transitory storage medium, such as on a memory device. This method 1800 is intended to be illustrative and non-limiting. Although FIG. 18 depicts various activities occurring in a particular sequence or order, this is not intended to be limiting. In certain embodiments, for instance, the activities may be performed in a different order, or one or more activities of the method 1800 may be performed in parallel. In certain embodiments, the method 1800 may be performed by a training system 350, which may be part of a DABP 102.

[0189] At block 1805 of the method 1800, the training system 350 accesses training utterances 615, also referred to as example utterances, for the various skill bots 116 associated with the master bot 114. For instance, the training utterances 615 may be stored as skill bot data in the training data 354 accessible to the training system 350. In some embodiments, each skill bot 116 available to the master bot 114 may be associated with a subset of the training utterances 615, and each such subset of the training utterances 615 may include training utterances 615 for each intent for which the skill bot 116 is configured. Thus, the full set of training utterances 615 may include training utterances 615 for each intent of each skill bot 116, such that every intent of every skill bot 116 is represented.

[0190] At block 1810, the training system 350 generates training feature vectors 620 from the training utterances 615 accessed at block 1805. As described above, the training utterances 615 may include a subset associated with each skill bot 116 and, further, may include training utterances 615 for each intent of each skill bot 116. Thus, in some embodiments, the training feature vectors 620 may include a respective subset for each skill bot 116 and, further, may include respective feature vectors for each intent of each skill bot. As described above, the training system 350 may generate a training feature vector 620 to describe and represent each respective training utterance 615. Various techniques are known for converting a sequence of words, such as a training utterance 615, into a feature vector, such as a training feature vector 620, and one or more of such techniques may be used. For instance, the training system 350 may, but need not, use a one-hot encoding or some other encoding to encode each training utterance 615 as a corresponding training feature vector 620.
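As a non-limiting illustration of block 1810, the following minimal Python sketch encodes utterances as 0/1 feature vectors over a vocabulary built from the training utterances. The whitespace tokenization, the helper names `build_vocabulary` and `encode`, and the sample utterances are assumptions made for illustration only; the disclosure does not prescribe a particular encoding.

```python
def build_vocabulary(utterances):
    """Map each distinct lowercase token to a vector index (hypothetical helper)."""
    vocab = {}
    for text in utterances:
        for token in text.lower().split():
            vocab.setdefault(token, len(vocab))
    return vocab


def encode(utterance, vocab):
    """Return a 0/1 vector with a 1 at the index of every vocabulary token present."""
    vector = [0] * len(vocab)
    for token in utterance.lower().split():
        if token in vocab:  # tokens never seen in training are ignored
            vector[vocab[token]] = 1
    return vector


# Made-up training utterances for two skill bots.
training_utterances = ["order a pizza", "check my balance"]
vocab = build_vocabulary(training_utterances)
training_vectors = [encode(u, vocab) for u in training_utterances]
```

The same `encode` function and vocabulary would then be reused for input utterances, so that training and input feature vectors share one feature space.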

[0191] At block 1815, the training system 350 divides the training feature vectors 620, and thus the corresponding training utterances 615, into conversation categories. The conversation categories can be defined, for example, based on intent, sub-bot, or skill bot 116. In some embodiments, if the conversation categories are based on intent, then each conversation category includes training feature vectors 620 representative of a single corresponding intent for which a skill bot 116 is configured; in that case, the number of conversation categories may equal the number of intents across the various skill bots 116 available to the master bot 114. In some embodiments, if the conversation categories are based on skill bots 116, then each conversation category includes training feature vectors 620 representative of a single corresponding skill bot 116; in that case, the number of conversation categories may equal the number of skill bots 116 available to the master bot 114. In some embodiments, if the conversation categories are based on sub-bots, then each conversation category includes a subset of the training feature vectors 620 representative of a single corresponding skill bot 116; in that case, the number of conversation categories may be no fewer than the number of skill bots 116 available to the master bot 114.

[0192] At block 1820, the training system 350 generates composite feature vectors from the training feature vectors 620, such that the respective training feature vectors 620 in a conversation category are aggregated into a composite feature vector representative of, and corresponding to, that conversation category. In other words, for each conversation category to which training feature vectors 620 were assigned at block 1815, the training system 350 may combine the respective training feature vectors 620 into a composite feature vector for the conversation category. Various techniques can be used for aggregation; for instance, a composite feature vector may be the average (e.g., the arithmetic mean) of the training feature vectors 620 in the respective category. Depending on the basis of the conversation categories, the composite feature vectors may be intent vectors, sub-bot vectors, or bot vectors, according to whether the conversation categories are defined based on intents, sub-bots, or skill bots 116, respectively.
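As a non-limiting illustration of blocks 1815 and 1820, the following minimal Python sketch groups labeled training feature vectors into conversation categories and takes the arithmetic mean of each group. The function name `composite_vectors`, the category labels, and the vector values are hypothetical.

```python
from collections import defaultdict


def composite_vectors(labeled_vectors):
    """Average the vectors of each conversation category.

    labeled_vectors: iterable of (category, vector) pairs, where all
    vectors have the same length.
    """
    groups = defaultdict(list)
    for category, vector in labeled_vectors:
        groups[category].append(vector)
    # zip(*vectors) walks the vectors dimension by dimension.
    return {
        category: [sum(column) / len(vectors) for column in zip(*vectors)]
        for category, vectors in groups.items()
    }


# Made-up training feature vectors labeled by intent.
labeled = [
    ("Intent A", [1, 0, 0]),
    ("Intent A", [1, 1, 0]),
    ("Intent C", [0, 0, 1]),
]
intent_vectors = composite_vectors(labeled)
```

Labeling the same vectors by sub-bot or by skill bot instead of by intent would yield sub-bot vectors or bot vectors from the same function.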

[0193] Although composite feature vectors can be generated as an arithmetic mean of training feature vectors 620, other mathematical functions, including other types of linear combinations, may be used additionally or alternatively to generate composite feature vectors. For example, in some embodiments, a composite feature vector can be a weighted average of training feature vectors 620 in a conversation category. The weighting can be based on various factors, such as priority of certain key words in the corresponding training utterances 615, such that greater weight is given to training utterances 615 with certain key words. In the case of generating sub-bot vectors or bot vectors, the training feature vectors 620 can be aggregated as a weighted average such that training feature vectors 620 corresponding to certain intents are given greater weight than other training feature vectors 620. For instance, training feature vectors 620 corresponding to an intent having a greater number of representative training utterances 615 may be given greater weight in the aggregate compared to training feature vectors 620 corresponding to an intent having a fewer number of representative training utterances 615. Various implementations are possible and are within the scope of this disclosure.
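As a non-limiting illustration of a weighted average, the following minimal Python sketch normalizes by the sum of the weights. The function name `weighted_composite` and the specific weights are assumptions; how weights are derived (e.g., from key words or from intent size) is left open by the description above.

```python
def weighted_composite(vectors, weights):
    """Weighted average of equal-length vectors; weights need not sum to 1."""
    if not vectors or len(vectors) != len(weights):
        raise ValueError("need at least one vector and one weight per vector")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    return [
        sum(w * x for w, x in zip(weights, column)) / total
        for column in zip(*vectors)
    ]


vectors = [[1, 0], [0, 1], [0, 1]]
# Hypothetical weighting: the first utterance contains a priority key word.
weights = [2.0, 1.0, 1.0]
composite = weighted_composite(vectors, weights)
```

With equal weights the function reduces to the arithmetic mean; here the doubled weight pulls the composite toward the first vector.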

[0194] FIG. 19 and FIG. 20 illustrate the concept of composite feature vectors. Specifically, FIG. 19 illustrates the generation of composite feature vectors using intent-based conversation categories, according to some embodiments described herein. As a result, the composite feature vectors shown in FIG. 19 are at the intent level and are thus intent vectors. In the example of FIG. 19, at least two skill bots 116 are available to the master bot 114. A first skill bot 116 is associated with first skill bot data 358g, which includes training utterances 615 representative of a first intent, Intent A, and other training utterances 615 representative of a second intent, Intent B. The first skill bot 116 is thus configured to handle input utterances associated with Intent A or Intent B. A second skill bot 116 is associated with second skill bot data 358h, which includes training utterances 615 representative of a third intent, Intent C. The second skill bot 116 is thus configured to handle input utterances associated with Intent C.

[0195] In the example of FIG. 19, the training system 350 converts all the training utterances 615 for the skill bots 116 to respective feature vectors 620. The training system 350 groups the training feature vectors into intent-based conversation categories. As such, the training feature vectors 620 corresponding to the training utterances 615 for Intent A form a first conversation category, the training feature vectors 620 corresponding to the training utterances 615 for Intent B form a second conversation category, and the training feature vectors 620 corresponding to the training utterances 615 for Intent C form a third conversation category. In this example, as described above, the training feature vectors 620 of a given intent-based conversation category are aggregated (e.g., averaged) into a composite feature vector for the conversation category and, thus, for the associated intent. Specifically, the training system 350 aggregates the training feature vectors 620 corresponding to the training utterances 615 for Intent A into a first composite feature vector 1910a, the training system 350 aggregates the training feature vectors 620 corresponding to the training utterances 615 for Intent B into a second composite feature vector 1910b, and the training system 350 aggregates the training feature vectors 620 corresponding to the training utterances 615 for Intent C into a third composite feature vector 1910c. Thus, in this example, each composite feature vector is representative of a respective intent of a skill bot 116.

[0196] FIG. 20 illustrates the generation of composite feature vectors using skill-bot-based, also referred to as bot-based, conversation categories, according to some embodiments described herein. As a result, the composite feature vectors shown in FIG. 20 are at the skill bot level and are thus bot vectors. In the example of FIG. 20, at least two skill bots 116 are available to the master bot 114. A first skill bot 116 is associated with first skill bot data 358g, which includes training utterances 615 representative of a first intent, Intent A, and other training utterances 615 representative of a second intent, Intent B. The first skill bot 116 is thus configured to handle input utterances associated with Intent A or Intent B. A second skill bot 116 is associated with second skill bot data 358h, which includes training utterances 615 representative of a third intent, Intent C. The second skill bot 116 is thus configured to handle input utterances associated with Intent C.

[0197] In the example of FIG. 20, the training system 350 converts all the training utterances 615 for the skill bots 116 to respective feature vectors 620. The training system 350 groups the training feature vectors into skill-bot-based conversation categories. As such, the training feature vectors 620 corresponding to the training utterances 615 for Intent A along with the training feature vectors 620 corresponding to the training utterances 615 for Intent B form a first conversation category, and the training feature vectors 620 corresponding to the training utterances 615 for Intent C form a second conversation category. In this example, as described above, the training feature vectors 620 of a given skill-bot-based conversation category are aggregated (e.g., averaged) into a composite feature vector for the conversation category and, thus, for the associated skill bot 116. Specifically, the training system 350 aggregates the training feature vectors 620 corresponding to the training utterances 615 for Intent A and the training feature vectors 620 corresponding to the training utterances 615 for Intent B into a first composite feature vector 1910d, and the training system 350 aggregates the training feature vectors 620 corresponding to the training utterances 615 for Intent C into a second composite feature vector 1910e. Thus, in this example, each composite feature vector is representative of a respective skill bot 116.
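As a non-limiting illustration of the difference between FIG. 19 and FIG. 20, the following minimal Python sketch averages one set of training feature vectors twice: once grouped by intent and once grouped by skill bot. The record layout, the helper name `mean_by`, and the two-dimensional values are hypothetical and chosen only to keep the arithmetic easy to follow.

```python
from collections import defaultdict


def mean_by(records, key):
    """Average the 'vector' field of all records sharing the same record[key]."""
    groups = defaultdict(list)
    for record in records:
        groups[record[key]].append(record["vector"])
    return {
        label: [sum(col) / len(vecs) for col in zip(*vecs)]
        for label, vecs in groups.items()
    }


# Toy training feature vectors: a first skill bot with Intents A and B,
# and a second skill bot with Intent C (made-up values).
records = [
    {"bot": "first", "intent": "A", "vector": [1.0, 0.0]},
    {"bot": "first", "intent": "A", "vector": [1.0, 0.0]},
    {"bot": "first", "intent": "B", "vector": [0.0, 1.0]},
    {"bot": "second", "intent": "C", "vector": [4.0, 4.0]},
]
intent_vectors = mean_by(records, "intent")  # three intent vectors
bot_vectors = mean_by(records, "bot")        # two bot vectors
```

In this sketch the bot vector of the first skill bot is the mean of all three of its training feature vectors, so Intent A, which has more utterances, contributes more to it than Intent B.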

[0198] In some embodiments, the training system 350 is not limited to generating only one type of composite feature vector. For instance, the training system 350 may generate intent vectors, sub-bot vectors, and skill bot vectors, or the training system 350 may generate some other combination of these or other types of composite feature vectors.

[0199] Returning to FIG. 18, at block 1825, the training system 350 may configure the classifier model 324 of the master bot 114 to utilize the composite feature vectors determined at block 1820. For instance, the training system 350 may store an indication of the composite feature vectors in a storage device (e.g., on the digital assistant 106) accessible by the classifier model 324. The classifier model 324 may be configured to compare input utterances to such composite feature vectors, which act as set representations of the training feature vectors 620.

[0200] FIG. 21 is a diagram of a method 2100 of using a classifier model 324 of a master bot 114 to determine whether an input utterance 303, provided as user input 110, is unrelated to any available skill bot 116 associated with the master bot 114, according to some embodiments described herein. This method 2100 or similar can be used at block 215 of the above method 200 of configuring and using a master bot 114 to direct input utterances 303 to skill bots 116. The method 2100 of FIG. 21 is a more specific variation of the method 700 of FIG. 7, and like that method 700 of FIG. 7, the method 2100 of FIG. 21 can be used with the preliminary filtering activity described with respect to the method 800 of FIG. 8. More specifically, as described below, the classifier model 324 of a master bot 114 may utilize composite feature vectors to determine whether an input utterance 303 belongs to the none class 316 (i.e., is unrelated to the available skill bots 116).

[0201] The method 2100 depicted in FIG. 21, as well as other methods described herein, may be implemented in software (e.g., as code, instructions, or programs) executed by one or more processing units (e.g., processors or processor cores), in hardware, or in combinations thereof. The software may be stored on a non-transitory storage medium, such as on a memory device. This method 2100 is intended to be illustrative and non-limiting. Although FIG. 21 depicts various activities occurring in a particular sequence or order, this is not intended to be limiting. In certain embodiments, for instance, the activities may be performed in a different order, or one or more activities of the method 2100 may be performed in parallel. In certain embodiments, the method 2100 may be performed by a master bot 114 associated with a set of available skill bots 116.

[0202] At block 2105 of the method 2100, the master bot 114 accesses an input utterance 303 that has been provided as user input 110. For instance, in some embodiments, a user may provide user input 110 in the form of speech input, and the digital assistant 106 may convert that user input 110 into a textual input utterance 303 for use by the master bot 114.

[0203] At block 2110, the master bot 114 may generate an input feature vector 1710 from the input utterance 303 accessed at block 2105. More specifically, in some embodiments, the master bot 114 causes the classifier model 324 of the master bot 114 to generate the input feature vector 1710 from the input utterance 303. The input feature vector 1710 may describe and represent the input utterance 303. Various techniques are known for converting a sequence of words, such as an input utterance 303, into a feature vector, and one or more of such techniques may be used. For instance, the training system 350 may, but need not, use a one-hot encoding or some other encoding to encode the input utterance 303 as a corresponding input feature vector 1710. However, an embodiment of the master bot 114 uses the same technique as was used to generate training feature vectors 620 from training utterances 615 when training the classifier model 324.

[0204] At block 2115, the master bot 114 compares the input feature vector 1710 generated at block 2110 to the composite feature vectors. More specifically, in some embodiments, the master bot 114 causes the classifier model 324 to compare the input feature vector 1710 determined at block 2110 to the composite feature vectors, which may be determined as described above with respect to FIG. 18. In comparing the input feature vector 1710 to the composite feature vectors, the classifier model 324 may determine a similarity, or a distance, between the input feature vector 1710 and each composite feature vector previously constructed for the classifier model 324.

[0205] The similarity, or distance, between the input feature vector 1710 and another feature vector, specifically a composite feature vector, can be calculated in various ways. For example, to determine the similarity, the classifier model 324 may compute the absolute value of the arithmetic difference between the composite feature vector and the input feature vector 1710 (i.e., the Euclidean distance) by subtracting one from the other and taking the absolute value. For another example, the classifier model 324 may multiply the input feature vector 1710 with the composite feature vector; for instance, if a one-hot encoding is used for the input feature vector 1710 and for the composite feature vector, each vector entry (i.e., each dimension) will have a value of 1 or 0, where a value of 1 may represent the presence of a certain feature. If the input feature vector 1710 and the composite feature vector have mostly the same features, then the result of the vector-vector multiplication would be a vector having approximately the same number of 1 values as either of the two vectors being multiplied. Otherwise, the resulting vector would be mostly 0s. For another example, a cosine similarity can be used; for instance, the cosine similarity between the input feature vector 1710 and the composite feature vector can be calculated as the dot product of the two vectors divided by the product of the Euclidean norm of both vectors. Various other techniques for measuring similarity are possible and are within the scope of this disclosure.
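As a non-limiting illustration, the three measures described above can be sketched in Python as follows. The function names are hypothetical, the Euclidean distance is computed in the standard way as the norm of the difference vector, and the element-wise product is summarized by counting its 1 entries, which is an assumption about how the product vector might be reduced to a single score.

```python
import math


def euclidean_distance(u, v):
    """Norm of the difference vector; smaller means more similar."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def shared_features(u, v):
    """For 0/1 vectors: number of 1s in the element-wise product."""
    return sum(a * b for a, b in zip(u, v))


def cosine_similarity(u, v):
    """Dot product divided by the product of the Euclidean norms."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0  # treat a zero vector as dissimilar
```

Note that the first measure is a distance (small when vectors are alike) while the other two are similarities (large when vectors are alike), which determines the direction of any threshold applied to them.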

[0206] At decision block 2120, the classifier model 324 of the master bot 114 determines whether the input feature vector 1710 is sufficiently similar to any of the composite feature vectors. The classifier model 324 may use a predetermined threshold, such that a similarity is deemed sufficient if it meets the threshold. If the similarity metric used provides a small value when two vectors are similar, as in the case of a distance metric, then the threshold may be an upper threshold such that the input feature vector 1710 is sufficiently close to a composite feature vector, for instance, if the similarity is no greater than the threshold. However, if the similarity metric used provides a large value when two vectors are similar, then the threshold may be a lower threshold such that the input feature vector 1710 is sufficiently close to a composite feature vector, for instance, if the similarity is no less than the threshold.
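As a non-limiting illustration of decision block 2120, the following minimal Python sketch applies an upper threshold to distance-like scores and a lower threshold to similarity-like scores. The function name and the `smaller_is_closer` flag are hypothetical.

```python
def is_sufficiently_similar(score, threshold, smaller_is_closer):
    """Return True when the score meets the threshold.

    smaller_is_closer=True  -> distance metric, threshold is an upper bound.
    smaller_is_closer=False -> similarity metric, threshold is a lower bound.
    """
    if smaller_is_closer:
        return score <= threshold
    return score >= threshold
```

For example, a Euclidean distance of 0.4 passes an upper threshold of 0.5, whereas a cosine similarity of 0.4 fails a lower threshold of 0.5.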

[0207] In some embodiments, the determination of sufficient similarity may be made using a hierarchy of composite feature vectors. For instance, the input feature vector 1710 may be compared to bot vectors and then, if needed, to sub-bot vectors and then, if needed, to intent vectors until a sufficient similarity is found or until it is determined that the input feature vector 1710 is not sufficiently similar to any such composite feature vector. Because there are fewer bot vectors than sub-bot vectors, and fewer sub-bot vectors than intent vectors, the computing resources needed for determining similarities to bot vectors may be less than the computing resources needed for determining similarities to sub-bot vectors, which may be less than the computing resources needed for determining similarities to intent vectors. Thus, by comparing the input feature vector 1710 to composite feature vectors at different levels, starting at the highest level available (e.g., the skill bot level), the classifier model 324 can determine which skill bot 116, if any, is best suited to handle the input utterance 303 while using less computationally intensive processing before resorting to more computationally intensive processing.
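As a non-limiting illustration of such a hierarchy, the following minimal Python sketch checks the coarsest level first and descends only when no composite at the current level is similar enough. The choice of cosine similarity, the function name `first_match`, the single shared threshold, and the toy vectors are assumptions made for illustration.

```python
import math


def cosine_similarity(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def first_match(input_vector, levels, threshold):
    """Search composite vectors level by level, coarsest level first.

    levels: list of dicts mapping a label to a composite vector, e.g.
            [bot_vectors, sub_bot_vectors, intent_vectors].
    Returns (level_index, label) for the best composite at the first level
    that meets the threshold, or None if no level does.
    """
    for index, composites in enumerate(levels):
        best_label, best_score = None, float("-inf")
        for label, vector in composites.items():
            score = cosine_similarity(input_vector, vector)
            if score > best_score:
                best_label, best_score = label, score
        if best_label is not None and best_score >= threshold:
            return index, best_label
    return None


# Toy data: one skill bot whose bot vector is the mean of its two intent vectors.
bot_vectors = {"bot1": [0.5, 0.5, 0.0]}
intent_vectors = {"A": [1.0, 0.0, 0.0], "B": [0.0, 1.0, 0.0]}
levels = [bot_vectors, intent_vectors]
```

With a threshold of 0.9, the input `[1, 1, 0]` is resolved at the bot level, the input `[1, 0, 0]` falls through to the intent level and matches Intent A, and the input `[0, 0, 1]` matches nothing at any level.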

[0208] In some embodiments, the input utterance 303 may be deemed unrelated to any of the available skill bots 116 if the input feature vector 1710 is far enough away (e.g., exceeding a predetermined distance deemed to be excessive) from every bot vector. However, in additional or alternative embodiments, an input utterance 303 is not deemed unrelated based on dissimilarity to bot vectors alone. For instance, if the input feature vector 1710 is dissimilar (i.e., not sufficiently similar) to all the bot vectors, this is not necessarily a dispositive indication that the input utterance 303 belongs in the none class 316. There may be situations where the input feature vector 1710 is far from (i.e., dissimilar to) any bot vector but still close to an intent vector, due to the corresponding intent being dissimilar to other intents for which the skill bot 116 is configured. In some embodiments, comparing the input feature vector 1710 to both bot vectors and intent vectors is typically sufficient to identify unrelated input utterances 303. Thus, the input utterance 303 may be deemed unrelated to the available skill bots 116 if the input feature vector 1710 is dissimilar to all bot vectors as well as dissimilar to all intent vectors. In some embodiments, the input feature vector 1710 may additionally or alternatively be compared to sub-bot vectors or other composite feature vectors as part of determining whether the input utterance 303 is a member of the none class. Various implementations are possible and are within the scope of this disclosure.
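As a non-limiting illustration of the none-class determination, the following minimal Python sketch deems an input unrelated only when it is farther than a maximum distance from every bot vector and every intent vector. The function name `is_unrelated`, the use of Euclidean distance, and the toy values are assumptions.

```python
import math


def is_unrelated(input_vector, bot_vectors, intent_vectors, max_distance):
    """True only if the input is far from all bot vectors AND all intent vectors."""
    candidates = list(bot_vectors.values()) + list(intent_vectors.values())
    return all(math.dist(input_vector, v) > max_distance for v in candidates)


# A skill bot with two dissimilar intents: its bot vector lies between them.
intent_vectors = {"A": [4.0, 0.0], "B": [0.0, 4.0]}
bot_vectors = {"bot1": [2.0, 2.0]}

near_intent_a = [4.0, 0.0]    # far from the bot vector, on top of Intent A
far_from_all = [10.0, 10.0]
```

Here `near_intent_a` is about 2.83 away from the bot vector, so a bot-vector-only check with a maximum distance of 1.0 would wrongly place it in the none class, whereas the combined check keeps it because it coincides with the Intent A vector.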

[0209] If the classifier model 324 determines that the input feature vector 1710 is not sufficiently similar to any composite feature vector at decision block 2120, based on whatever comparisons are performed, the method 2100 proceeds to block 2125. At block 2125, the master bot 114 determines that the input utterance is unrelated to any available skill bot 116, and thus, the master bot 114 may indicate that the input utterance 303 cannot be processed. As a result, for example, the digital assistant may ask the user to clarify the user input 110.

[0210] However, if the classifier model 324 determines that the input feature vector 1710 is sufficiently similar to one or more composite feature vectors at decision block 2120, then the method 2100 skips ahead to block 2130. At block 2130, based on the comparisons with the composite feature vectors, the master bot 114 determines (i.e., selects) a skill bot 116 to handle the input utterance 303. For example, the master bot 114 may select the skill bot 116 related to the composite feature vector deemed most similar to the input feature vector 1710. For instance, if the most similar composite feature vector is an intent vector, then the master bot 114 may select the skill bot 116 configured to handle the intent corresponding to the intent vector; and if the most similar composite feature vector is a sub-bot vector or a bot vector, then the master bot 114 may select the skill bot 116 to which the sub-bot vector or bot vector corresponds. For another example, the master bot 114 may use a k-nearest neighbors technique to select the skill bot 116, as will be described further below.

[0211] At block 2135, the master bot 114 may route the input utterance to the skill bot 116 selected at block 2130. In some embodiments, if the master bot 114 identified a particular intent for the input utterance (e.g., if the input feature vector 1710 was deemed sufficiently close to a single intent vector), the master bot 114 may indicate that intent to the skill bot 116, thus enabling the skill bot 116 to skip the process of inferring an intent. The skill bot 116 may then process the input utterance 303 to respond to the user input 110.
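As a non-limiting illustration of blocks 2120 through 2135 taken together, the following minimal Python sketch picks the composite feature vector closest to the input, returns the owning skill bot along with the intent when the closest composite is an intent vector, and returns `None` when nothing is close enough. The record layout, the function name `route`, the bot and intent names, and the Euclidean metric are hypothetical.

```python
import math


def route(input_vector, composites, max_distance):
    """Select a skill bot from the closest composite feature vector.

    composites: list of dicts with keys 'vector' and 'skill_bot', plus an
                optional 'intent' key for intent vectors.
    Returns (skill_bot, intent_or_None), or None for the none class.
    """
    best = min(
        composites,
        key=lambda c: math.dist(input_vector, c["vector"]),
        default=None,
    )
    if best is None or math.dist(input_vector, best["vector"]) > max_distance:
        return None
    return best["skill_bot"], best.get("intent")


composites = [
    {"vector": [1.0, 0.0], "skill_bot": "pizza_bot", "intent": "order_pizza"},
    {"vector": [0.0, 1.0], "skill_bot": "bank_bot"},  # a bot vector, no intent
]
```

When an intent is returned, it could be passed along with the utterance so that the selected skill bot does not have to infer the intent again.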

[0212] FIG. 22 is a diagram of a method 2200 of selecting a skill bot 116 to handle an input utterance, according to some embodiments described herein. This method 2200 or similar may be used at block 2130 of the above method 2100, after determination that the input utterance 303 is related to at least one available skill bot 116. This method 2200 provides a k-nearest neighbors approach to selecting a skill bot 116, but other techniques may be used additionally or alternatively to this approach. Specifically, this method 2200 utilizes the k-nearest training feature vectors 620 to the input feature vector 1710 to determine which skill bot 116 to select.

[0213] The method 2200 depicted in FIG. 22, as well as other methods described herein, may be implemented in software (e.g., as code, instructions, or programs) executed by one or more processing units (e.g., processors or processor cores), in hardware, or in combinations thereof. The software may be stored on a non-transitory storage medium, such as on a memory device. This method 2200 is intended to be illustrative and non-limiting. Although FIG. 22 depicts various activities occurring in a particular sequence or order, this is not intended to be limiting. In certain embodiments, for instance, the activities may be performed in a different order, or one or more activities of the method 2200 may be performed in parallel. In certain embodiments, the method 2200 may be performed by a master bot 114.

[0214] As shown in FIG. 22, at block 2205, the master bot 114 determines a value for k, where k is the number of neighbors that will be considered. In some embodiments, the value of k may be a factor of the total number of training feature vectors 620 (i.e., the total number of training utterances 615).

[0215] At block 2210, using the value of k determined at block 2205, the master bot 114 determines a set of k training feature vectors 620 that are the closest (i.e., the most similar) to the input feature vector 1710. The master bot 114 may use the same, or a different, similarity metric as the one used above when determining whether the input feature vector 1710 was sufficiently similar to any composite feature vectors. For example, the similarity metric used may be Euclidean distance, vector multiplication, or cosine similarity.

[0216] At block 2215, the master bot 114 selects the skill bot 116 having the most training feature vectors 620 in the set determined at block 2210. In some embodiments, it is not necessary that training feature vectors 620 of the skill bot 116 make up the majority of the set but, rather, only that no other skill bot 116 has a greater quantity of training feature vectors 620 in the set. As described above, after selection of this skill bot 116, the master bot 114 may route the input utterance 303 to the selected skill bot 116 for processing.
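As a non-limiting illustration of blocks 2205 through 2215, the following minimal Python sketch finds the k training feature vectors nearest to the input and selects the skill bot with the most vectors among them. The function name `select_skill_bot`, the Euclidean metric, the bot names, and the toy vectors are assumptions; tie-breaking is not specified above, and this sketch resolves ties in favor of the bot whose vector appears earliest among the nearest neighbors.

```python
import math
from collections import Counter


def select_skill_bot(input_vector, training_vectors, k):
    """Plurality vote among the k nearest training feature vectors.

    training_vectors: list of (skill_bot, vector) pairs.
    """
    if k <= 0 or not training_vectors:
        raise ValueError("k must be positive and training_vectors non-empty")
    nearest = sorted(
        training_vectors,
        key=lambda item: math.dist(input_vector, item[1]),
    )[:k]
    votes = Counter(bot for bot, _ in nearest)
    # most_common keeps first-seen order among equal counts.
    return votes.most_common(1)[0][0]


training_vectors = [
    ("pizza_bot", [1.0, 0.0]),
    ("pizza_bot", [0.9, 0.1]),
    ("bank_bot", [0.0, 1.0]),
    ("bank_bot", [0.1, 0.9]),
    ("weather_bot", [0.8, 0.3]),
]
selected = select_skill_bot([1.0, 0.1], training_vectors, k=4)
```

With k = 4, the nearest set holds two `pizza_bot` vectors, one `weather_bot` vector, and one `bank_bot` vector, so `pizza_bot` is selected with a plurality rather than a strict majority.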

[0217] Additionally or alternatively to considering the nearest training feature vectors 620 to the input feature vector 1710, some embodiments of a master bot 114 may consider the nearest composite feature vectors, such as the nearest intent vectors. FIG. 23 is a diagram of another example method 2300 of selecting a skill bot 116 to handle an input utterance, according to some embodiments described herein. This method 2300 or similar may be used at block 2130 of the above method 2100, after determination that the input utterance 303 is related to at least one available skill bot 116. This method 2300 provides a k-nearest neighbors approach to selecting a skill bot 116, but other techniques may be used additionally or alternatively to this approach. Specifically, in contrast to the method 2200 of FIG. 22, this method 2300 utilizes the k-nearest intent vectors to the input feature vector 1710 to determine which skill bot 116 to select.

[0218] The method 2300 depicted in FIG. 23, as well as other methods described herein, may be implemented in software (e.g., as code, instructions, or programs) executed by one or more processing units (e.g., processors or processor cores), in hardware, or in combinations thereof. The software may be stored on a non-transitory storage medium, such as on a memory device. This method 2300 is intended to be illustrative and non-limiting. Although FIG. 23 depicts various activities occurring in a particular sequence or order, this is not intended to be limiting. In certain embodiments, for instance, the activities may be performed in a different order, or one or more activities of the method 2300 may be performed in parallel. In certain embodiments, the method 2300 may be performed by a master bot 114.

[0219] As shown in FIG. 23, at block 2305, the master bot 114 determines a value for k, where k is the number of neighbors that will be considered. In some embodiments, the value of k may be a factor of the total number of intent vectors. The total number of intent vectors may be equal to the total number of intents represented in the training utterances 615 and may be equal to the total number of intents that the available skill bots 116 are configured to handle, and the value of k may be selected as a factor of that quantity.

[0220] At block 2310, using the value of k determined at block 2305, the master bot 114 determines a set of k intent vectors that are the closest (i.e., the most similar) to the input feature vector 1710. The master bot 114 may use the same, or a different, similarity metric as the one used above when determining whether the input feature vector 1710 was sufficiently similar to any composite feature vectors in the above method 2100. For example, the similarity metric used may be Euclidean distance, vector multiplication, or cosine similarity.

[0221] At block 2315, the master bot 114 selects the skill bot 116 having the most intent vectors in the set determined at block 2310. In some embodiments, it is not necessary that intent vectors of the skill bot 116 make up the majority of the set but, rather, only that no other skill bot 116 has a greater quantity of intent vectors in the set. As described above, after selection of this skill bot 116, the master bot 114 may route the input utterance 303 to the selected skill bot 116 for processing.
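As a non-limiting illustration of blocks 2305 through 2315, the following minimal Python sketch votes over the k intent vectors most similar to the input rather than over individual training feature vectors. The cosine metric, the function name `select_by_intent_vectors`, the intent-to-bot mapping, and the toy values are assumptions.

```python
import heapq
import math
from collections import Counter


def cosine_similarity(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def select_by_intent_vectors(input_vector, intent_vectors, intent_to_bot, k):
    """Plurality vote among the skill bots owning the k most similar intent vectors.

    intent_vectors: dict mapping intent name -> composite (intent) vector.
    intent_to_bot:  dict mapping intent name -> owning skill bot.
    """
    if k <= 0 or not intent_vectors:
        raise ValueError("k must be positive and intent_vectors non-empty")
    nearest = heapq.nlargest(
        k,
        intent_vectors,
        key=lambda intent: cosine_similarity(input_vector, intent_vectors[intent]),
    )
    votes = Counter(intent_to_bot[intent] for intent in nearest)
    return votes.most_common(1)[0][0]


intent_vectors = {
    "order_pizza": [1.0, 0.0, 0.0],
    "track_order": [0.8, 0.2, 0.0],
    "check_balance": [0.0, 0.0, 1.0],
}
intent_to_bot = {
    "order_pizza": "pizza_bot",
    "track_order": "pizza_bot",
    "check_balance": "bank_bot",
}
selected = select_by_intent_vectors([0.9, 0.1, 0.0], intent_vectors, intent_to_bot, k=3)
```

Because there are typically far fewer intent vectors than training feature vectors, this variant compares the input against a much smaller set of candidates.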

Example Implementation

[0222] FIG. 24 depicts a simplified diagram of a distributed system 2400 for implementing an embodiment. In the illustrated embodiment, distributed system 2400 includes one or more client computing devices 2402, 2404, 2406, and 2408, coupled to a server 2412 via one or more communication networks 2410. Client computing devices 2402, 2404, 2406, and 2408 may be configured to execute one or more applications.

[0223] In various embodiments, server 2412 may be adapted to run one or more services or software applications that enable the processing described in this disclosure.

[0224] In certain embodiments, server 2412 may also provide other services or software applications that can include non-virtual and virtual environments. In some embodiments, these services may be offered as web-based or cloud services, such as under a Software as a Service (SaaS) model to the users of client computing devices 2402, 2404, 2406, and/or 2408. Users operating client computing devices 2402, 2404, 2406, and/or 2408 may in turn utilize one or more client applications to interact with server 2412 to utilize the services provided by these components.

[0225] In the configuration depicted in FIG. 24, server 2412 may include one or more components 2418, 2420 and 2422 that implement the functions performed by server 2412. These components may include software components that may be executed by one or more processors, hardware components, or combinations thereof. It should be appreciated that various different system configurations are possible, which may be different from distributed system 2400. The embodiment shown in FIG. 24 is thus one example of a distributed system for implementing an embodiment system and is not intended to be limiting.

[0226] Users may use client computing devices 2402, 2404, 2406, and/or 2408 to interact with server 2412 in accordance with the teachings of this disclosure. A client device may provide an interface that enables a user of the client device to interact with the client device. The client device may also output information to the user via this interface. Although FIG. 24 depicts only four client computing devices, any number of client computing devices may be supported.

[0227] The client devices may include various types of computing systems such as portable handheld devices, general purpose computers such as personal computers and laptops, workstation computers, wearable devices, gaming systems, thin clients, various messaging devices, sensors or other sensing devices, and the like. These computing devices may run various types and versions of software applications and operating systems (e.g., Microsoft Windows®, Apple Macintosh®, UNIX® or UNIX-like operating systems, Linux or Linux-like operating systems such as Google Chrome™ OS) including various mobile operating systems (e.g., Microsoft Windows Mobile®, iOS®, Windows Phone®, Android™, BlackBerry®, Palm OS®). Portable handheld devices may include cellular phones, smartphones (e.g., an iPhone®), tablets (e.g., iPad®), personal digital assistants (PDAs), and the like. Wearable devices may include Google Glass® head mounted display, and other devices. Gaming systems may include various handheld gaming devices, Internet-enabled gaming devices (e.g., a Microsoft Xbox® gaming console with or without a Kinect® gesture input device, Sony PlayStation® system, various gaming systems provided by Nintendo®, and others), and the like. The client devices may be capable of executing various different applications such as various Internet-related apps, communication applications (e.g., e-mail applications, short message service (SMS) applications) and may use various communication protocols.

[0228] Network(s) 2410 may be any type of network familiar to those skilled in the art that can support data communications using any of a variety of available protocols, including without limitation TCP/IP (transmission control protocol/Internet protocol), SNA (systems network architecture), IPX (Internet packet exchange), AppleTalk®, and the like. Merely by way of example, network(s) 2410 can be a local area network (LAN), networks based on Ethernet, Token-Ring, a wide-area network (WAN), the Internet, a virtual network, a virtual private network (VPN), an intranet, an extranet, a public switched telephone network (PSTN), an infra-red network, a wireless network (e.g., a network operating under any of the Institute of Electrical and Electronics Engineers (IEEE) 802.11 suite of protocols, Bluetooth®, and/or any other wireless protocol), and/or any combination of these and/or other networks.

[0229] Server 2412 may be composed of one or more general purpose computers, specialized server computers (including, by way of example, PC (personal computer) servers, UNIX® servers, mid-range servers, mainframe computers, rack-mounted servers, etc.), server farms, server clusters, or any other appropriate arrangement and/or combination. Server 2412 can include one or more virtual machines running virtual operating systems, or other computing architectures involving virtualization such as one or more flexible pools of logical storage devices that can be virtualized to maintain virtual storage devices for the server. In various embodiments, server 2412 may be adapted to run one or more services or software applications that provide the functionality described in the foregoing disclosure.

[0230] The computing systems in server 2412 may run one or more operating systems including any of those discussed above, as well as any commercially available server operating system. Server 2412 may also run any of a variety of additional server applications and/or mid-tier applications, including HTTP (hypertext transport protocol) servers, FTP (file transfer protocol) servers, CGI (common gateway interface) servers, JAVA® servers, database servers, and the like. Exemplary database servers include without limitation those commercially available from Oracle®, Microsoft®, Sybase®, IBM® (International Business Machines), and the like.

[0231] In some implementations, server 2412 may include one or more applications to analyze and consolidate data feeds and/or event updates received from users of client computing devices 2402, 2404, 2406, and 2408. As an example, data feeds and/or event updates may include, but are not limited to, Twitter® feeds, Facebook® updates or real-time updates received from one or more third party information sources and continuous data streams, which may include real-time events related to sensor data applications, financial tickers, network performance measuring tools (e.g., network monitoring and traffic management applications), clickstream analysis tools, automobile traffic monitoring and the like. Server 2412 may also include one or more applications to display the data feeds and/or real-time events via one or more display devices of client computing devices 2402, 2404, 2406, and 2408.

[0232] Distributed system 2400 may also include one or more data repositories 2414, 2416. These data repositories may be used to store data and other information in certain embodiments. For example, one or more of the data repositories 2414, 2416 may be used to store data or information generated by the processing described herein and/or data or information used for the processing described herein. Data repositories 2414, 2416 may reside in a variety of locations. For example, a data repository used by server 2412 may be local to server 2412 or may be remote from server 2412 and in communication with server 2412 via a network-based or dedicated connection. Data repositories 2414, 2416 may be of different types. In certain embodiments, a data repository used by server 2412 may be a database, for example, a relational database, such as databases provided by Oracle Corporation® and other vendors. One or more of these databases may be adapted to enable storage, update, and retrieval of data to and from the database in response to SQL-formatted commands.

[0233] In certain embodiments, one or more of data repositories 2414, 2416 may also be used by applications to store application data. The data repositories used by applications may be of different types such as, for example, a key-value store repository, an object store repository, or a general storage repository supported by a file system.

[0234] In certain embodiments, the functionalities described in this disclosure may be offered as services via a cloud environment. FIG. 25 is a simplified block diagram of a cloud-based system environment in which functionalities described herein may be offered as cloud services, in accordance with certain embodiments. In the embodiment depicted in FIG. 25, cloud infrastructure system 2502 may provide one or more cloud services that may be requested by users using one or more client computing devices 2504, 2506, and 2508. Cloud infrastructure system 2502 may comprise one or more computers and/or servers that may include those described above for server 2412. The computers in cloud infrastructure system 2502 may be organized as general purpose computers, specialized server computers, server farms, server clusters, or any other appropriate arrangement and/or combination.

[0235] Network(s) 2510 may facilitate communication and exchange of data between clients 2504, 2506, and 2508 and cloud infrastructure system 2502. Network(s) 2510 may include one or more networks. The networks may be of the same or different types. Network(s) 2510 may support one or more communication protocols, including wired and/or wireless protocols, for facilitating the communications.

[0236] The embodiment depicted in FIG. 25 is only one example of a cloud infrastructure system and is not intended to be limiting. It should be appreciated that, in some other embodiments, cloud infrastructure system 2502 may have more or fewer components than those depicted in FIG. 25, may combine two or more components, or may have a different configuration or arrangement of components. For example, although FIG. 25 depicts three client computing devices, any number of client computing devices may be supported in alternative embodiments.

[0237] The term cloud service is generally used to refer to a service that is made available to users on demand and via a communication network such as the Internet by systems (e.g., cloud infrastructure system 2502) of a service provider. Typically, in a public cloud environment, servers and systems that make up the cloud service provider’s system are different from the customer’s own on-premise servers and systems. The cloud service provider’s systems are managed by the cloud service provider. Customers can thus avail themselves of cloud services provided by a cloud service provider without having to purchase separate licenses, support, or hardware and software resources for the services. For example, a cloud service provider's system may host an application, and a user may, via the Internet, on demand, order and use the application without the user having to buy infrastructure resources for executing the application. Cloud services are designed to provide easy, scalable access to applications, resources and services. Several providers offer cloud services. For example, several cloud services are offered by Oracle Corporation® of Redwood Shores, California, such as middleware services, database services, Java cloud services, and others.

[0238] In certain embodiments, cloud infrastructure system 2502 may provide one or more cloud services using different models such as under a Software as a Service (SaaS) model, a Platform as a Service (PaaS) model, an Infrastructure as a Service (IaaS) model, and others, including hybrid service models. Cloud infrastructure system 2502 may include a suite of applications, middleware, databases, and other resources that enable provision of the various cloud services.

[0239] A SaaS model enables an application or software to be delivered to a customer over a communication network like the Internet, as a service, without the customer having to buy the hardware or software for the underlying application. For example, a SaaS model may be used to provide customers access to on-demand applications that are hosted by cloud infrastructure system 2502. Examples of SaaS services provided by Oracle Corporation® include, without limitation, various services for human resources/capital management, customer relationship management (CRM), enterprise resource planning (ERP), supply chain management (SCM), enterprise performance management (EPM), analytics services, social applications, and others.

[0240] An IaaS model is generally used to provide infrastructure resources (e.g., servers, storage, hardware and networking resources) to a customer as a cloud service to provide elastic compute and storage capabilities. Various IaaS services are provided by Oracle Corporation®.

[0241] A PaaS model is generally used to provide, as a service, platform and environment resources that enable customers to develop, run, and manage applications and services without the customer having to procure, build, or maintain such resources. Examples of PaaS services provided by Oracle Corporation® include, without limitation, Oracle Java Cloud Service (JCS), Oracle Database Cloud Service (DBCS), data management cloud service, various application development solutions services, and others.

[0242] Cloud services are generally provided on an on-demand self-service basis, subscription-based, elastically scalable, reliable, highly available, and secure manner. For example, a customer, via a subscription order, may order one or more services provided by cloud infrastructure system 2502. Cloud infrastructure system 2502 then performs processing to provide the services requested in the customer’s subscription order. For example, in certain embodiments, the chatbot-related functions described herein may be provided as cloud services that are subscribed to by a user/subscriber. Cloud infrastructure system 2502 may be configured to provide one or even multiple cloud services.

[0243] Cloud infrastructure system 2502 may provide the cloud services via different deployment models. In a public cloud model, cloud infrastructure system 2502 may be owned by a third party cloud services provider and the cloud services are offered to any general public customer, where the customer can be an individual or an enterprise. In certain other embodiments, under a private cloud model, cloud infrastructure system 2502 may be operated within an organization (e.g., within an enterprise organization) and services provided to customers that are within the organization. For example, the customers may be various departments of an enterprise such as the Human Resources department, the Payroll department, etc. or even individuals within the enterprise. In certain other embodiments, under a community cloud model, the cloud infrastructure system 2502 and the services provided may be shared by several organizations in a related community. Various other models such as hybrids of the above-mentioned models may also be used.

[0244] Client computing devices 2504, 2506, and 2508 may be of different types (such as devices 2402, 2404, 2406, and 2408 depicted in FIG. 24) and may be capable of operating one or more client applications. A user may use a client device to interact with cloud infrastructure system 2502, such as to request a service provided by cloud infrastructure system 2502. For example, a user may use a client device to request a chatbot-related service described in this disclosure.

[0245] In some embodiments, the processing performed by cloud infrastructure system 2502 may involve big data analysis. This analysis may involve using, analyzing and manipulating large data sets to detect and visualize various trends, behaviors, relationships, etc. within the data. This analysis may be performed by one or more processors, possibly processing the data in parallel, performing simulations using the data, and the like. The data used for this analysis may include structured data (e.g., data stored in a database or structured according to a structured model) and/or unstructured data (e.g., data blobs (binary large objects)).

[0246] As depicted in the embodiment in FIG. 25, cloud infrastructure system 2502 may include infrastructure resources 2530 that are utilized for facilitating the provision of various cloud services offered by cloud infrastructure system 2502. Infrastructure resources 2530 may include, for example, processing resources, storage or memory resources, networking resources, and the like.

[0247] In certain embodiments, to facilitate efficient provisioning of these resources for supporting the various cloud services provided by cloud infrastructure system 2502 for different customers, the resources may be bundled into sets of resources or resource modules (also referred to as “pods”). Each resource module or pod may comprise a pre-integrated and optimized combination of resources of one or more types. In certain embodiments, different pods may be pre-provisioned for different types of cloud services. For example, a first set of pods may be provisioned for a database service, a second set of pods, which may include a different combination of resources than a pod in the first set of pods, may be provisioned for Java service, and the like. For some services, the resources allocated for provisioning the services may be shared between the services.

[0248] Cloud infrastructure system 2502 may itself internally use services 2532 that are shared by different components of cloud infrastructure system 2502 and that facilitate the provisioning of services by cloud infrastructure system 2502. These internal shared services may include, without limitation, a security and identity service, an integration service, an enterprise repository service, an enterprise manager service, a virus scanning and white list service, a high availability, backup and recovery service, service for enabling cloud support, an email service, a notification service, a file transfer service, and the like.

[0249] Cloud infrastructure system 2502 may comprise multiple subsystems. These subsystems may be implemented in software, or hardware, or combinations thereof. As depicted in FIG. 25, the subsystems may include a user interface subsystem 2512 that enables users or customers of cloud infrastructure system 2502 to interact with cloud infrastructure system 2502. User interface subsystem 2512 may include various different interfaces such as a web interface 2514, an online store interface 2516 where cloud services provided by cloud infrastructure system 2502 are advertised and are purchasable by a consumer, and other interfaces 2518. For example, a customer may, using a client device, request (service request 2534) one or more services provided by cloud infrastructure system 2502 using one or more of interfaces 2514, 2516, and 2518. For example, a customer may access the online store, browse cloud services offered by cloud infrastructure system 2502, and place a subscription order for one or more services offered by cloud infrastructure system 2502 that the customer wishes to subscribe to. The service request may include information identifying the customer and one or more services that the customer desires to subscribe to.

[0250] In certain embodiments, such as the embodiment depicted in FIG. 25, cloud infrastructure system 2502 may comprise an order management subsystem (OMS) 2520 that is configured to process the new order. As part of this processing, OMS 2520 may be configured to: create an account for the customer, if not done already; receive billing and/or accounting information from the customer that is to be used for billing the customer for providing the requested service to the customer; verify the customer information; upon verification, book the order for the customer; and orchestrate various workflows to prepare the order for provisioning.

[0251] Once properly validated, OMS 2520 may then invoke the order provisioning subsystem (OPS) 2524 that is configured to provision resources for the order including processing, memory, and networking resources. The provisioning may include allocating resources for the order and configuring the resources to facilitate the service requested by the customer order. The manner in which resources are provisioned for an order and the type of the provisioned resources may depend upon the type of cloud service that has been ordered by the customer. For example, according to one workflow, OPS 2524 may be configured to determine the particular cloud service being requested and identify a number of pods that may have been pre-configured for that particular cloud service. The number of pods that are allocated for an order may depend upon the size/amount/level/scope of the requested service. For example, the number of pods to be allocated may be determined based upon the number of users to be supported by the service, the duration of time for which the service is being requested, and the like. The allocated pods may then be customized for the particular requesting customer for providing the requested service.
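As an illustration of how a pod count might follow from the number of users to be supported, the sketch below rounds the user count up to whole pods. It is only a sketch under stated assumptions: the function name `pods_for_order`, the per-pod capacity `users_per_pod`, and the floor `min_pods` are all hypothetical, and the disclosure does not specify any particular sizing formula.

```python
import math


def pods_for_order(num_users, users_per_pod=100, min_pods=1):
    """Estimate how many pre-configured pods to allocate for an order.

    All parameters are hypothetical: users_per_pod is an assumed capacity
    for one pod, and min_pods is an assumed floor so that every order
    receives at least one pod.
    """
    if num_users < 0:
        raise ValueError("num_users must be non-negative")
    if users_per_pod <= 0:
        raise ValueError("users_per_pod must be positive")
    # Round up so a partially filled pod still counts as a whole pod.
    return max(min_pods, math.ceil(num_users / users_per_pod))


if __name__ == "__main__":
    for users in (0, 100, 250):
        print(users, pods_for_order(users))
```

Other factors named above, such as the duration for which the service is requested, could enter the same calculation as additional parameters.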

[0252] Cloud infrastructure system 2502 may send a response or notification 2544 to the requesting customer to indicate when the requested service is now ready for use. In some instances, information (e.g., a link) may be sent to the customer that enables the customer to start using and availing the benefits of the requested services.

[0253] Cloud infrastructure system 2502 may provide services to multiple customers. For each customer, cloud infrastructure system 2502 is responsible for managing information related to one or more subscription orders received from the customer, maintaining customer data related to the orders, and providing the requested services to the customer. Cloud infrastructure system 2502 may also collect usage statistics regarding a customer's use of subscribed services. For example, statistics may be collected for the amount of storage used, the amount of data transferred, the number of users, and the amount of system up time and system down time, and the like. This usage information may be used to bill the customer. Billing may be done, for example, on a monthly cycle.

[0254] Cloud infrastructure system 2502 may provide services to multiple customers in parallel. Cloud infrastructure system 2502 may store information for these customers, including possibly proprietary information. In certain embodiments, cloud infrastructure system 2502 comprises an identity management subsystem (IMS) 2528 that is configured to manage customer information and provide the separation of the managed information such that information related to one customer is not accessible by another customer. IMS 2528 may be configured to provide various security-related services such as identity services, such as information access management, authentication and authorization services, services for managing customer identities and roles and related capabilities, and the like.
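The separation of managed information described above can be pictured as a store in which every read and write is scoped by a customer identifier. This is a minimal sketch under stated assumptions and not the IMS 2528 design: the class name `TenantStore` and its methods are hypothetical, and authentication of the caller is assumed to happen elsewhere.

```python
class TenantStore:
    """Toy per-customer store: each customer sees only its own entries."""

    def __init__(self):
        # Maps customer_id -> that customer's private key/value entries.
        self._data = {}

    def put(self, customer_id, key, value):
        """Store a value in the namespace of the given customer."""
        self._data.setdefault(customer_id, {})[key] = value

    def get(self, customer_id, key):
        """Return the customer's own value, or None if it has none.

        Lookups never cross into another customer's namespace.
        """
        return self._data.get(customer_id, {}).get(key)


if __name__ == "__main__":
    store = TenantStore()
    store.put("customer-a", "plan", "gold")
    print(store.get("customer-a", "plan"))
    print(store.get("customer-b", "plan"))
```

Because the customer identifier is part of every lookup, a request made on behalf of the hypothetical "customer-b" cannot read the entry written for "customer-a".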

[0255] FIG. 26 illustrates an exemplary computer system 2600 that may be used to implement certain embodiments. For example, in some embodiments, computer system 2600 may be used to implement any of the systems and subsystems of a chatbot system, and various servers and computer systems described above. As shown in FIG. 26, computer system 2600 includes various subsystems including a processing subsystem 2604 that communicates with a number of other subsystems via a bus subsystem 2602. These other subsystems may include a processing acceleration unit 2606, an I/O subsystem 2608, a storage subsystem 2618, and a communications subsystem 2624. Storage subsystem 2618 may include non-transitory computer-readable storage media including storage media 2622 and a system memory 2610.

[0256] Bus subsystem 2602 provides a mechanism for letting the various components and subsystems of computer system 2600 communicate with each other as intended. Although bus subsystem 2602 is shown schematically as a single bus, alternative embodiments of the bus subsystem may utilize multiple buses. Bus subsystem 2602 may be any of several types of bus structures including a memory bus or memory controller, a peripheral bus, a local bus using any of a variety of bus architectures, and the like. For example, such architectures may include an Industry Standard Architecture (ISA) bus, Micro Channel Architecture (MCA) bus, Enhanced ISA (EISA) bus, Video Electronics Standards Association (VESA) local bus, and Peripheral Component Interconnect (PCI) bus, which can be implemented as a Mezzanine bus manufactured to the IEEE P1386.1 standard, and the like.

[0257] Processing subsystem 2604 controls the operation of computer system 2600 and may comprise one or more processors, application specific integrated circuits (ASICs), or field programmable gate arrays (FPGAs). The processors may be single core or multicore processors. The processing resources of computer system 2600 can be organized into one or more processing units 2632, 2634, etc. A processing unit may include one or more processors, one or more cores from the same or different processors, a combination of cores and processors, or other combinations of cores and processors. In some embodiments, processing subsystem 2604 can include one or more special purpose co-processors such as graphics processors, digital signal processors (DSPs), or the like. In some embodiments, some or all of the processing units of processing subsystem 2604 can be implemented using customized circuits such as application specific integrated circuits (ASICs), or field programmable gate arrays (FPGAs).

[0258] In some embodiments, the processing units in processing subsystem 2604 can execute instructions stored in system memory 2610 or on computer readable storage media 2622. In various embodiments, the processing units can execute a variety of programs or code instructions and can maintain multiple concurrently executing programs or processes. At any given time, some or all of the program code to be executed can be resident in system memory 2610 and/or on computer-readable storage media 2622 including potentially on one or more storage devices. Through suitable programming, processing subsystem 2604 can provide various functionalities described above. In instances where computer system 2600 is executing one or more virtual machines, one or more processing units may be allocated to each virtual machine.

[0259] In certain embodiments, a processing acceleration unit 2606 may optionally be provided for performing customized processing or for off-loading some of the processing performed by processing subsystem 2604 so as to accelerate the overall processing performed by computer system 2600.

[0260] I/O subsystem 2608 may include devices and mechanisms for inputting information to computer system 2600 and/or for outputting information from or via computer system 2600. In general, use of the term input device is intended to include all possible types of devices and mechanisms for inputting information to computer system 2600. User interface input devices may include, for example, a keyboard, pointing devices such as a mouse or trackball, a touchpad or touch screen incorporated into a display, a scroll wheel, a click wheel, a dial, a button, a switch, a keypad, audio input devices with voice command recognition systems, microphones, and other types of input devices. User interface input devices may also include motion sensing and/or gesture recognition devices such as the Microsoft Kinect® motion sensor that enables users to control and interact with an input device, the Microsoft Xbox® 360 game controller, devices that provide an interface for receiving input using gestures and spoken commands. User interface input devices may also include eye gesture recognition devices such as the Google Glass® blink detector that detects eye activity (e.g., “blinking” while taking pictures and/or making a menu selection) from users and transforms the eye gestures as inputs to an input device (e.g., Google Glass®). Additionally, user interface input devices may include voice recognition sensing devices that enable users to interact with voice recognition systems (e.g., Siri® navigator) through voice commands.

[0261] Other examples of user interface input devices include, without limitation, three dimensional (3D) mice, joysticks or pointing sticks, gamepads and graphic tablets, and audio/visual devices such as speakers, digital cameras, digital camcorders, portable media players, webcams, image scanners, fingerprint scanners, barcode readers, 3D scanners, 3D printers, laser rangefinders, and eye gaze tracking devices. Additionally, user interface input devices may include, for example, medical imaging input devices such as computed tomography, magnetic resonance imaging, positron emission tomography, and medical ultrasonography devices. User interface input devices may also include, for example, audio input devices such as MIDI keyboards, digital musical instruments and the like.

[0262] In general, use of the term output device is intended to include all possible types of devices and mechanisms for outputting information from computer system 2600 to a user or other computer. User interface output devices may include a display subsystem, indicator lights, or non-visual displays such as audio output devices, etc. The display subsystem may be a cathode ray tube (CRT), a flat-panel device, such as that using a liquid crystal display (LCD) or plasma display, a projection device, a touch screen, and the like. For example, user interface output devices may include, without limitation, a variety of display devices that visually convey text, graphics and audio/video information such as monitors, printers, speakers, headphones, automotive navigation systems, plotters, voice output devices, and modems.

[0263] Storage subsystem 2618 provides a repository or data store for storing information and data that is used by computer system 2600. Storage subsystem 2618 provides a tangible non-transitory computer-readable storage medium for storing the basic programming and data constructs that provide the functionality of some embodiments. Storage subsystem 2618 may store software (e.g., programs, code modules, instructions) that when executed by processing subsystem 2604 provides the functionality described above. The software may be executed by one or more processing units of processing subsystem 2604. Storage subsystem 2618 may also provide a repository for storing data used in accordance with the teachings of this disclosure.

[0264] Storage subsystem 2618 may include one or more non-transitory memory devices, including volatile and non-volatile memory devices. As shown in FIG. 26, storage subsystem 2618 includes a system memory 2610 and a computer-readable storage media 2622. System memory 2610 may include a number of memories including a volatile main random access memory (RAM) for storage of instructions and data during program execution and a non-volatile read only memory (ROM) or flash memory in which fixed instructions are stored. In some implementations, a basic input/output system (BIOS), containing the basic routines that help to transfer information between elements within computer system 2600, such as during start-up, may typically be stored in the ROM. The RAM typically contains data and/or program modules that are presently being operated and executed by processing subsystem 2604. In some implementations, system memory 2610 may include multiple different types of memory, such as static random access memory (SRAM), dynamic random access memory (DRAM), and the like.

[0265] By way of example, and not limitation, as depicted in FIG. 26, system memory 2610 may load application programs 2612 that are being executed, which may include various applications such as Web browsers, mid-tier applications, relational database management systems (RDBMS), etc., program data 2614, and an operating system 2616. By way of example, operating system 2616 may include various versions of Microsoft Windows®, Apple Macintosh®, and/or Linux operating systems, a variety of commercially-available UNIX® or UNIX-like operating systems (including without limitation the variety of GNU/Linux operating systems, the Google Chrome® OS, and the like) and/or mobile operating systems such as iOS, Windows® Phone, Android® OS, BlackBerry® OS, Palm® OS operating systems, and others.

[0266] Computer-readable storage media 2622 may store programming and data constructs that provide the functionality of some embodiments. Computer-readable media 2622 may provide storage of computer-readable instructions, data structures, program modules, and other data for computer system 2600. Software (programs, code modules, instructions) that, when executed by processing subsystem 2604 provides the functionality described above, may be stored in storage subsystem 2618. By way of example, computer-readable storage media 2622 may include non-volatile memory such as a hard disk drive, a magnetic disk drive, an optical disk drive such as a CD ROM, DVD, a Blu-Ray® disk, or other optical media. Computer-readable storage media 2622 may include, but is not limited to, Zip® drives, flash memory cards, universal serial bus (USB) flash drives, secure digital (SD) cards, DVD disks, digital video tape, and the like. Computer-readable storage media 2622 may also include solid-state drives (SSD) based on non-volatile memory such as flash-memory based SSDs, enterprise flash drives, solid state ROM, and the like, SSDs based on volatile memory such as solid state RAM, dynamic RAM, static RAM, DRAM-based SSDs, magnetoresistive RAM (MRAM) SSDs, and hybrid SSDs that use a combination of DRAM and flash memory based SSDs.

[0267] In certain embodiments, storage subsystem 2618 may also include a computer-readable storage media reader 2620 that can further be connected to computer-readable storage media 2622. Reader 2620 may receive and be configured to read data from a memory device such as a disk, a flash drive, etc.

[0268] In certain embodiments, computer system 2600 may support virtualization technologies, including but not limited to virtualization of processing and memory resources. For example, computer system 2600 may provide support for executing one or more virtual machines. In certain embodiments, computer system 2600 may execute a program such as a hypervisor that facilitates the configuring and managing of the virtual machines. Each virtual machine may be allocated memory, compute (e.g., processors, cores), I/O, and networking resources. Each virtual machine generally runs independently of the other virtual machines. A virtual machine typically runs its own operating system, which may be the same as or different from the operating systems executed by other virtual machines executed by computer system 2600. Accordingly, multiple operating systems may potentially be run concurrently by computer system 2600.

[0269] Communications subsystem 2624 provides an interface to other computer systems and networks. Communications subsystem 2624 serves as an interface for receiving data from and transmitting data to other systems from computer system 2600. For example, communications subsystem 2624 may enable computer system 2600 to establish a communication channel to one or more client devices via the Internet for receiving and sending information from and to the client devices.

[0270] Communication subsystem 2624 may support wired and/or wireless communication protocols. For example, in certain embodiments, communications subsystem 2624 may include radio frequency (RF) transceiver components for accessing wireless voice and/or data networks (e.g., using cellular telephone technology, advanced data network technology, such as 3G, 4G or EDGE (enhanced data rates for global evolution), WiFi (IEEE 802.XX family standards), or other mobile communication technologies, or any combination thereof), global positioning system (GPS) receiver components, and/or other components. In some embodiments, communications subsystem 2624 can provide wired network connectivity (e.g., Ethernet) in addition to or instead of a wireless interface.

[0271] Communication subsystem 2624 can receive and transmit data in various forms. For example, in some embodiments, in addition to other forms, communications subsystem 2624 may receive input communications in the form of structured and/or unstructured data feeds 2626, event streams 2628, event updates 2630, and the like. For example, communications subsystem 2624 may be configured to receive (or send) data feeds 2626 in real-time from users of social media networks and/or other communication services such as Twitter® feeds, Facebook® updates, web feeds such as Rich Site Summary (RSS) feeds, and/or real-time updates from one or more third party information sources.
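
By way of illustration only, receipt of a structured web feed such as an RSS feed might be sketched as follows. This is a minimal sketch and not the implementation of any embodiment: the function name `parse_rss_items` and the inline sample document `SAMPLE_RSS` are hypothetical, and a deployed system would obtain the feed over a network interface rather than from a string literal.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample feed; a real feed would arrive over the network.
SAMPLE_RSS = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Example feed</title>
    <item><title>First update</title><link>https://example.com/1</link></item>
    <item><title>Second update</title><link>https://example.com/2</link></item>
  </channel>
</rss>"""


def parse_rss_items(xml_text):
    """Return one dict per <item> element, holding its title and link."""
    root = ET.fromstring(xml_text)
    return [
        {
            "title": item.findtext("title", default=""),
            "link": item.findtext("link", default=""),
        }
        for item in root.iter("item")
    ]


items = parse_rss_items(SAMPLE_RSS)
for entry in items:
    print(entry["title"], entry["link"])
```

Each parsed entry is a structured record that could then be forwarded as part of a data feed; unstructured feeds would require different handling.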

[0272] In certain embodiments, communications subsystem 2624 may be configured to receive data in the form of continuous data streams, which may include event streams 2628 of real-time events and/or event updates 2630, that may be continuous or unbounded in nature with no explicit end. Examples of applications that generate continuous data may include, for example, sensor data applications, financial tickers, network performance measuring tools (e.g., network monitoring and traffic management applications), clickstream analysis tools, automobile traffic monitoring, and the like.
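
By way of illustration only, consumption of an unbounded event stream might be sketched as follows. This is a minimal sketch under stated assumptions: the generator `sensor_events` is a hypothetical stand-in for a sensor data application, and `rolling_mean` is one possible incremental computation. Neither name appears in the embodiments described herein.

```python
from collections import deque
from itertools import count, islice
from typing import Iterable, Iterator


def sensor_events() -> Iterator[float]:
    """Hypothetical unbounded source: yields one synthetic reading per event."""
    for i in count():
        yield 20.0 + (i % 5)  # readings cycle through 20.0 .. 24.0


def rolling_mean(events: Iterable[float], window: int) -> Iterator[float]:
    """Yield the mean of the most recent `window` events as each one arrives."""
    recent = deque(maxlen=window)
    total = 0.0
    for value in events:
        if len(recent) == window:
            total -= recent[0]  # oldest value is evicted by the append below
        recent.append(value)
        total += value
        yield total / len(recent)


# The stream has no explicit end, so the consumer bounds how much it reads.
first_means = list(islice(rolling_mean(sensor_events(), window=3), 5))
print(first_means)
```

Because the source never terminates, results are produced one event at a time and only a fixed-size window is held in memory, instead of collecting the whole stream before processing.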

[0273] Communications subsystem 2624 may also be configured to communicate data from computer system 2600 to other computer systems or networks. The data may be communicated in various different forms such as structured and/or unstructured data feeds 2626, event streams 2628, event updates 2630, and the like to one or more databases that may be in communication with one or more streaming data source computers coupled to computer system 2600.

[0274] Computer system 2600 can be one of various types, including a handheld portable device (e.g., an iPhone® cellular phone, an iPad® computing tablet, a PDA), a wearable device (e.g., a Google Glass® head mounted display), a personal computer, a workstation, a mainframe, a kiosk, a server rack, or any other data processing system. Due to the ever-changing nature of computers and networks, the description of computer system 2600 depicted in FIG. 26 is intended only as a specific example. Many other configurations having more or fewer components than the system depicted in FIG. 26 are possible. Based on the disclosure and teachings provided herein, a person of ordinary skill in the art will appreciate other ways and/or methods to implement the various embodiments.

[0275] Although specific embodiments have been described, various modifications, alterations, alternative constructions, and equivalents are possible. Embodiments are not restricted to operation within certain specific data processing environments, but are free to operate within a plurality of data processing environments. Additionally, although certain embodiments have been described using a particular series of transactions and steps, it should be apparent to those skilled in the art that this is not intended to be limiting. Although some flowcharts describe operations as a sequential process, many of the operations can be performed in parallel or concurrently. In addition, the order of the operations may be rearranged. A process may have additional steps not included in the figure. Various features and aspects of the above-described embodiments may be used individually or jointly.

[0276] Further, while certain embodiments have been described using a particular combination of hardware and software, it should be recognized that other combinations of hardware and software are also possible. Certain embodiments may be implemented only in hardware, or only in software, or using combinations thereof. The various processes described herein can be implemented on the same processor or different processors in any combination.

[0277] Where devices, systems, components or modules are described as being configured to perform certain operations or functions, such configuration can be accomplished, for example, by designing electronic circuits to perform the operation, by programming programmable electronic circuits (such as microprocessors) to perform the operation such as by executing computer instructions or code, or processors or cores programmed to execute code or instructions stored on a non-transitory memory medium, or any combination thereof. Processes can communicate using a variety of techniques including but not limited to conventional techniques for inter-process communications, and different pairs of processes may use different techniques, or the same pair of processes may use different techniques at different times.

[0278] Specific details are given in this disclosure to provide a thorough understanding of the embodiments. However, embodiments may be practiced without these specific details. For example, well-known circuits, processes, algorithms, structures, and techniques have been shown without unnecessary detail in order to avoid obscuring the embodiments. This description provides example embodiments only, and is not intended to limit the scope, applicability, or configuration of other embodiments. Rather, the preceding description of the embodiments will provide those skilled in the art with an enabling description for implementing various embodiments. Various changes may be made in the function and arrangement of elements.

[0279] The specification and drawings are, accordingly, to be regarded in an illustrative rather than restrictive sense. It will, however, be evident that additions, subtractions, deletions, and other modifications and changes may be made thereunto without departing from the broader spirit and scope as set forth in the claims. Thus, although specific embodiments have been described, these are not intended to be limiting. Various modifications and equivalents, and any relevant combination of the disclosed features, are within the scope of the following claims.