


Title:
HIERARCHICAL DECOMPOSITION DEEP REINFORCEMENT LEARNING FOR AN ARTIFICIAL INTELLIGENCE MODEL
Document Type and Number:
WIPO Patent Application WO/2018/236674
Kind Code:
A1
Abstract:
Methods and apparatuses that apply a hierarchical-decomposition reinforcement learning technique to train one or more AI objects as concept nodes composed in a hierarchical graph incorporated into an AI model. Each individual sub-task of a decomposed task may correspond to its own concept node in the hierarchical graph. The sub-tasks are initially trained on how to complete their individual sub-task and then trained on how all of the individual sub-tasks need to interact with each other in the complex task in order to deliver an end solution to the complex task. The training uses reward functions focused on solving each individual sub-task and then a separate one or more reward functions focused on solving the end solution of the complex task. In addition, where reasonably possible, the training of the AI objects corresponding to the individual sub-tasks in the complex task is conducted in parallel.

Inventors:
CAMPOS MARCOS (US)
GUDIMELLA ADITYA (US)
STORY ROSS (US)
SHAKER MATINEH (US)
KONG RUOFAN (US)
SHNAYDER VICTOR (US)
BROWN MATTHEW (US)
Application Number:
PCT/US2018/037650
Publication Date:
December 27, 2018
Filing Date:
June 14, 2018
Assignee:
BONSAI AI INC (US)
International Classes:
G06F3/0482; G06N3/08
Foreign References:
US20120209794A1 2012-08-16
US20120239598A1 2012-09-20
US20150066929A1 2015-03-05
US9460088B1 2016-10-04
US20170213131A1 2017-07-27
Other References:
HENGST B.: "Safe State Abstraction and Reusable Continuing Subtasks in Hierarchical Reinforcement Learning", AI 2007: ADVANCES IN ARTIFICIAL INTELLIGENCE, December 2007 (2007-12-01), pages 58-67, XP055565172, Retrieved from the Internet [retrieved on 20180816]
Attorney, Agent or Firm:
FERRILL, Thomas, S. (US)
Claims:
CLAIMS

What is claimed is:

1. An apparatus, comprising:

an Artificial Intelligence ("AI") engine that has multiple independent modules on one or more computing platforms, where the multiple independent modules are configured to have their instructions executed by one or more processors in the one or more computing platforms, where the AI engine has a user interface presented on a display screen for use by one or more users;

where a first module is configured to apply a hierarchical-decomposition reinforcement learning technique to train one or more AI objects as concept nodes composed in a hierarchical graph incorporated into an AI model, where the first module uses the hierarchical-decomposition reinforcement learning technique to hierarchically decompose a complex task into multiple smaller, individual sub-tasks making up the complex task, where one or more of the individual sub-tasks, correspond to its own concept node in the hierarchical graph, and where the AI engine is configured to initially train the AI objects on the individual sub-tasks and then train on how the individual sub-tasks need to interact with each other in the complex task in order to deliver an end solution to the complex task;

where the user interface is configured to cooperate with the first module to send information for the first module, which the first module uses the information to apply the hierarchical-decomposition reinforcement learning technique to train the one or more AI objects;

where the first module is also configured to cooperate with one or more data sources to obtain data for training and to conduct the training of the one or more AI objects corresponding to concept nodes in parallel at the same time; and

wherein, via decomposing the complex task, the first module is able to use reward functions focused for solving each individual sub-task and then one or more reward functions focused for the end solution of the complex task, as well as the first module is able to conduct the training of the AI objects corresponding to the individual sub-tasks in the complex task, in parallel, which the parallel training and the use of reward functions focused for solving each individual sub-task speed up an overall training duration for the complex task on the one or more computing platforms, and resulting AI model, compared to an end-to-end training with a single algorithm for all of the AI objects incorporated into the AI model.

2. The apparatus of claim 1, further comprising:

an architect module is configured to instantiate the AI objects corresponding to the concepts of the complex task into the graph of i) a first concept node corresponding to an integrator node and ii) one or more levels of concepts corresponding to the individual sub-tasks that hierarchically stem forth from the integrator node in the graph of the AI model, where the integrator node is trained to choose which of its children nodes is most appropriate for solving its sub task to achieve the end solution of the complex task.

3. The apparatus of claim 1, further comprising: where the first module is an instructor module configured to cause the AI engine to i) initially train each integrator and its set of child nodes feeding that integrator node in the graph of nodes, where each child node in the set feeding that integrator node is either 1) an individual AI object or 2) another integrator node that is trained to best satisfy its own reward function for its described individual sub task; and next, ii) where the training of integrator nodes and their child nodes using reward functions to best satisfy its described individual sub task continues up the graph of nodes until that process reaches a root integrator node, where a reward function for the root integrator node is focused for best satisfying the end solution to the complex task.

4. The apparatus of claim 2, further comprising: where the AI engine decomposing the complex task allows each concept making up the complex task in the graph to use a most appropriate training approach for that individual sub-task, whether that be a classical motion controller, a pre-existing learned model, or a neural network that needs to be trained rather than the whole AI model being trained with one of these training approaches.

5. The apparatus of claim 1, further comprising:

an architect module configured to automatically partition the individual sub-tasks into the concept nodes in the AI model to be trained on in a number of ways, where the ways of conveying the partitioning of the individual sub-tasks into the concept nodes are selected from a group consisting of i) how to partition the individual sub-tasks is explicitly defined in scripted code from the user, ii) how to partition the individual sub-tasks is hinted at by giving general guidance in the scripted code from the user, iii) how to partition the individual sub-tasks is interpreted from guidance based on responses from the user to a presented list of questions, iv) how to partition the individual sub-tasks using a clustering technique, and v) any combination of these four, and then the architect module proposes a hierarchical structure for the graph of AI objects making up the AI model.

6. The apparatus of claim 1, further comprising:

an integrator that is configured to select a first set of concept nodes from two or more sets in the graph of the complex task to be trained on and computed.

7. The apparatus of claim 5, further comprising:

i) where the hinted at guidance in the scripted code from the user on how to automatically partition and ii) where the interpreted guidance from answered questions on how to automatically partition is based on state space, where the architect module based on the guidance identifies regions of state space that correspond to separable, individually solvable subtasks and creates distinct policies for each so identified region of state space.

8. The apparatus of claim 6, further comprising: where the integrator is configured to check to see what is a lowest level of dependency in the first set of concept nodes in the graph that needs to be calculated for its computations including its output, and where the first data source is a simulator, and where the first module is configured to supply the data from the first data source to the first set of concept nodes that need to make their individual computations as well as to the integrator, which the integrator checks to see that the computations for all of the nodes merely in the first set of nodes occur, which saves an amount of computing power and cycles compared to computing all of the nodes making up the AI model each training cycle.

9. The apparatus of claim 1, further comprising:

where the AI objects of the AI model include a blend of at least a first set and second set of AI objects being trained by the first module via reinforcement learning and a third set of AI objects that are configured to operate in two ways: 1) as control nodes where one or more actions are produced by the code this node and/or 2) this node just implements a data transformation step, where the first module of the AI engine is configured to manage multiple simulations from the data sources in parallel at the same time to train the first and second sets of AI objects with the reinforcement learning.

10. The apparatus of claim 1, where the first module is a learner module further configured to include a conductor service configured to handle scaling of running multiple training sessions for the two or more AI objects in parallel at the same time by dynamically calling in additional computing devices to load and run additional instances of processes for the first module for each training session occurring in parallel.

11. A method to apply Reinforcement Learning for an Artificial Intelligence model for subsequent deployment of that AI model, comprising:

applying a hierarchical-decomposition reinforcement learning technique to train one or more AI objects as concept nodes composed in a hierarchical graph incorporated into the AI model, using the hierarchical-decomposition reinforcement learning technique to hierarchically decompose a complex task into multiple smaller, individual sub-tasks making up the complex task, where one or more of the individual sub-tasks, correspond to its own concept node in the hierarchical graph, and initially train the AI objects on the individual sub-tasks and then train on how the individual sub-tasks need to interact with each other in the complex task in order to deliver an end solution to the complex task;

using information supplied from a user interface to apply the hierarchical-decomposition reinforcement learning technique to train one or more AI objects;

cooperating with one or more data sources to obtain data for training and to conduct the training of the two or more AI objects corresponding to concept nodes in the hierarchical graph, in parallel at the same time; and

wherein decomposing the complex task, enables use of reward functions focused for solving each individual sub-task and then one or more reward functions focused for the end solution of the complex task, as well as enabling conducting the training of the AI objects corresponding to the individual sub-tasks in the complex task, in parallel, where a combined parallel training and the use of reward functions focused for solving each individual sub-task speed up an overall training duration for the complex task on one or more computing platforms, and subsequent deployment of a resulting AI model that is trained, compared to an end-to-end training with a single algorithm for all of the AI objects incorporated into the AI model.

12. The method of claim 11, further comprising:

instantiating the AI objects corresponding to the concepts of the complex task into the graph of i) a first concept node corresponding to an integrator node and ii) one or more levels of concepts corresponding to the individual sub-tasks that hierarchically stem forth from the integrator node in the graph of the AI model, where the integrator node is trained to choose which of its children nodes is most appropriate for solving its sub task to achieve the end solution of the complex task.

13. The method of claim 11, further comprising: causing the AI engine to i) initially train each integrator and its set of child nodes feeding that integrator node in the graph of nodes, where each child node in the set feeding that integrator node is either 1) an individual AI object or 2) another integrator node that is trained to best satisfy its own reward function for its described individual sub task; and next, ii) where the training of integrator nodes and their child nodes using reward functions to best satisfy its described individual sub task continues up the graph of nodes until that process reaches a root integrator node, where a reward function for the root integrator node is focused for best satisfying the end solution to the complex task.

14. The method of claim 12, further comprising: where the AI engine decomposing the complex task allows replacing one or more concepts making up the complex task without retraining each concept in the graph making up that AI model.

15. The method of claim 11, further comprising:

automatically partitioning the individual sub-tasks into the concept nodes in the AI model to be trained on in a number of ways, where the ways of conveying the partitioning of the individual sub-tasks into the concept nodes are selected from a group consisting of i) how to partition the individual sub-tasks is explicitly defined in scripted code from the user, ii) how to partition the individual sub-tasks is hinted at by giving general guidance in the scripted code from the user, iii) how to partition the individual sub-tasks is interpreted from guidance based on responses from the user to a presented list of questions, and iv) any combination of these three, and then the architect module proposes a hierarchical structure for the graph of AI objects making up the AI model.

16. The method of claim 11, further comprising: where an integrator is configured to select a first set of concept nodes from two or more sets in the graph of the complex task to be trained on and computed.

17. The method of claim 15, further comprising:

i) where the hinted at guidance in the scripted code from the user on how to automatically partition and ii) where the interpreted guidance from answered questions on how to automatically partition is based on state space, where an architect module based on the guidance identifies regions of state space that correspond to separable, individually solvable subtasks and creates distinct policies for each so identified region of state space.

18. The method of claim 16, further comprising:

where the first data source is a simulator, where each integrator node in the graph of nodes is configured to evaluate in turn, based on data from the simulator, which of the nodes in the graph are currently appropriate to solving its sub task, and merely those nodes so chosen will be further provided with data from the simulator and evaluated, which saves an amount of computing power and cycles compared to computing all of the nodes making up the AI model each evaluation cycle.

19. The method of claim 11, further comprising:

where the AI objects of the AI model include a blend of at least a first set and second set of AI objects being trained by the first module via reinforcement learning and a third set of AI objects that are configured to operate in two ways: 1) as control nodes where one or more actions are produced by the code this node and/or 2) this node just implements a data transformation step, where the first module of the AI engine is configured to manage multiple simulations from the data sources in parallel at the same time to train the first and second sets of AI objects with the reinforcement learning.

20. An artificial intelligence engine, comprising:

two or more modules configured to cooperate with each other in order to apply a hierarchical-decomposition reinforcement learning technique to train one or more AI objects as concept nodes composed into a hierarchical graph incorporated into an AI model, where individual sub-tasks of a decomposed task correspond to its own concept node in the hierarchical graph, where one or more of the individual sub-tasks of the decomposed task incorporated into its own hierarchical graph of concept nodes are initially trained on how to complete their individual sub-task and then an integrator node is trained to choose which of its children nodes is most appropriate for solving its individual sub-task at that moment, to deliver an end solution to the complex task, where during training of the individual sub-tasks of the decomposed task the AI engine is configured to use i) reward functions focused for solving each individual sub-task and then ii) a separate one or more reward functions focused for solving the end solution of the complex task with all of the individual sub-tasks, where the AI engine is configured to conduct the training of a first individual sub-task of the decomposed task incorporated into its own hierarchical graph of concept nodes in parallel at the same time with the training of a second individual sub-task of the decomposed task incorporated into its own hierarchical graph of concept nodes.

Description:
CROSS-REFERENCE

[0001] This application is a continuation-in-part of U.S. Patent Application No. 15/417,086 titled "An artificial intelligence engine having multiple independent processes on a cloud-based platform configured to scale," filed January 26, 2017, which claims the benefit of U.S. Provisional Application No. 62/287,861, filed January 27, 2016, titled "Bonsai platform, language, and tooling," each of which is incorporated herein by reference in its entirety. This application also claims the benefit under 35 USC 119 of U.S. Provisional Application No. 62/524,381, titled "Systems and methods for extending functionality of trained machine-learning models," filed June 23, 2017, and U.S. Provisional Application No. 62/547,339, titled "An artificial intelligence engine having multiple improvements," filed August 18, 2017, each of which is also incorporated herein by reference in its entirety.

NOTICE OF COPYRIGHT

[0002] A portion of the disclosure of this patent application contains material that is subject to copyright protection. The copyright owner has no objection to the facsimile reproduction by anyone of the material subject to copyright protection as it appears in the United States Patent & Trademark Office's records for this application, but otherwise reserves all copyright rights whatsoever.

FIELD

[0003] Embodiments of the design provided herein generally relate to an Artificial Intelligence ("AI") engine using a hierarchical-decomposition deep reinforcement technique to train and assemble an AI model.

BACKGROUND

[0004] Deep reinforcement learning yields great results for a large array of problems, but AI models are generally retrained anew for each new problem to be solved. Prior learning and knowledge are difficult to incorporate when training new AI models, requiring increasingly longer training as problems become more complex. This is especially problematic for problems with sparse rewards.

[0005] Learning goal-directed skills is a major challenge in reinforcement learning when the environment's feedback is sparse. The difficulty arises from insufficient exploration of the state space by an agent, and results in the agent not learning a robust policy or value function. The problem is further exacerbated in high-dimensional tasks, such as in robotics. Although the integration of non-linear function approximators, such as deep neural networks, with reinforcement learning has made it possible to learn patterns and abstractions over high dimensional spaces (see Silver, D., Huang, A., Maddison, C.J., Guez, A., Sifre, L., Van Den Driessche, G., Schrittwieser, J., Antonoglou, I., Panneershelvam, V., Lanctot, M., et al., 2016. Mastering the game of Go with deep neural networks and tree search. Nature 529, 484-489; as well as Mnih, V., Kavukcuoglu, K., Silver, D., Rusu, A.A., Veness, J., Bellemare, M.G., Graves, A., Riedmiller, M., Fidjeland, A.K., Ostrovski, G., et al., 2015. Human-level control through deep reinforcement learning. Nature 518, 529-533), the problem of exploration in the sparse reward regime is still a significant challenge. Rarely occurring sparse reward signals are difficult for neural networks to model, since the action sequences leading to high reward must be discovered in a much larger pool of low-reward sequences. In addition to the above difficulties, robotics tasks that involve dexterous manipulation of objects have the additional challenge of a trade-off between robustness and flexibility.

[0006] In such settings, one natural solution is for the agent to learn, plan, and represent knowledge at different levels of temporal abstractions, so that solving intermediate tasks at the right times helps in achieving the final goal. Sutton et al. [1999] provided a mathematical framework for extending the notion of "actions" in reinforcement learning to "options", which are policies taking a certain action over a period of time (Sutton, R.S., Precup, D., Singh, S., 1999. Between MDPs and semi-MDPs: A framework for temporal abstraction in reinforcement learning. Artificial Intelligence 112, 181-211).

[0007] Another problem is the total amount of time it takes to train an AI model on a task just to experiment to see if learning that task up to an acceptable level is even possible.
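As an illustration only, the "options" notion cited in paragraph [0006] can be sketched in Python. This is a simplified sketch under stated assumptions, not the framework as published and not the applicant's implementation: termination is a deterministic predicate here rather than a probability, and the names Option, run_option, corridor_step, and walk_to_door are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Tuple


@dataclass
class Option:
    """A temporally extended action (hypothetical, simplified structure)."""
    can_start: Callable[[int], bool]    # states in which the option may begin
    policy: Callable[[int], int]        # primitive action chosen in each state
    should_stop: Callable[[int], bool]  # deterministic termination test (assumption)


def run_option(option: Option, state: int,
               step: Callable[[int, int], int],
               max_steps: int = 100) -> Tuple[int, int]:
    """Follow the option's policy until it terminates; return (state, steps taken)."""
    if not option.can_start(state):
        raise ValueError("option cannot be initiated in this state")
    steps = 0
    while steps < max_steps:
        state = step(state, option.policy(state))
        steps += 1
        if option.should_stop(state):
            break
    return state, steps


def corridor_step(state: int, action: int) -> int:
    """Toy 1-D corridor: the state is a position, the action is -1 or +1."""
    return state + action


# One "option" bundles many primitive +1 steps into a single higher-level choice.
walk_to_door = Option(
    can_start=lambda s: s < 5,
    policy=lambda s: 1,
    should_stop=lambda s: s >= 5,
)

final_state, n_steps = run_option(walk_to_door, 0, corridor_step)
```

A higher-level agent would choose among such options instead of among primitive actions, which is the sense in which knowledge is represented at a coarser time scale.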

[0008] Some approaches have a person scripting code to train an AI model versus a module. Likewise, some approaches have a person scripting code to instantiate AI objects for an AI model.

SUMMARY

[0009] In general, methods and apparatuses apply a hierarchical-decomposition reinforcement learning technique to train one or more AI objects as concept nodes composed in a hierarchical graph incorporated into an AI model.

[00010] In an embodiment, an AI engine has multiple independent modules to work on one or more computing platforms. The multiple independent modules are configured to have their instructions executed by one or more processors in the one or more computing platforms and any software instructions they may use can be stored in one or more memories of the computing platforms. The AI engine has a user interface presented on a display screen for use by one or more users. The user interface is configured to cooperate with a first module to send information for the first module, which the first module then uses to apply the hierarchical-decomposition reinforcement learning technique to train one or more AI objects.

[00011] The first module applies a hierarchical-decomposition reinforcement learning technique to train one or more AI objects as concept nodes composed in a hierarchical graph incorporated into an AI model. The first module, such as an instructor module, uses the hierarchical-decomposition reinforcement learning technique to hierarchically decompose a complex task into multiple smaller, individual sub-tasks making up the complex task. One or more of the individual sub-tasks corresponds to its own concept node in the hierarchical graph. The AI engine may initially train the AI objects on the individual sub-tasks and then train the AI objects on how the individual sub-tasks need to interact with each other in the complex task in order to deliver an end solution to the complex task.
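The ordering described in paragraph [00011], sub-tasks first and their interaction afterwards, can be pictured as a post-order walk over the concept graph. The Python sketch below is an illustration under stated assumptions rather than the engine's actual data model: ConceptNode and training_order are hypothetical names, "orient" and "lift" echo sub-tasks named in the figure list, and the nested "place" integrator with its "align" and "release" children is invented purely for the example.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ConceptNode:
    """One concept in the hierarchical graph (hypothetical structure)."""
    name: str
    children: List["ConceptNode"] = field(default_factory=list)

    @property
    def is_integrator(self) -> bool:
        # In this sketch, any node with children acts as an integrator.
        return bool(self.children)


def training_order(root: ConceptNode) -> List[str]:
    """Post-order walk: each child is scheduled before the integrator above it."""
    order: List[str] = []
    for child in root.children:
        order.extend(training_order(child))
    order.append(root.name)
    return order


# Hypothetical decomposition of a complex task into sub-task concepts.
task = ConceptNode("grasp_and_stack", [
    ConceptNode("orient"),
    ConceptNode("lift"),
    ConceptNode("place", [ConceptNode("align"), ConceptNode("release")]),
])

order = training_order(task)
```

With this graph, the leaves come first, each integrator follows its own children, and the root integrator for the complex task comes last. Leaves that share a parent have no ordering constraint between them, which is what leaves room for training them in parallel.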

[00012] A first module, such as a learner module, is configured to cooperate with one or more data sources to obtain data for training and to conduct the training of one or more AI objects corresponding to concept nodes in parallel at the same time.

[00013] The AI engine decomposing the complex task allows for both i) the first module to use reward functions focused for solving each individual sub-task, and then one or more reward functions focused for the end solution of the complex task, as well as ii) the first module to conduct the training of the AI objects corresponding to the individual sub-tasks in the complex task, in parallel at the same time. The combined parallel training and the use of reward functions focused for solving each individual sub-task speed up an overall training duration for the complex task on the one or more computing platforms, and subsequent deployment of a resulting AI model that is trained, compared to an end-to-end training with a single algorithm for all of the AI objects incorporated into the AI model.
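A toy Python sketch of the idea in paragraph [00013] follows: each sub-task gets its own narrowly focused reward function and its own training session, and the sessions are submitted concurrently. Everything here is an assumption made for illustration. The random hill-climbing loop merely stands in for a reinforcement learning algorithm, the numeric targets are invented, and a thread pool stands in for the separate processes or machines a real engine would presumably use.

```python
import random
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Tuple


def make_reward(target: float) -> Callable[[float], float]:
    """Reward focused on one sub-task: highest when the parameter hits its target."""
    return lambda x: -abs(x - target)


def train_subtask(name: str, reward: Callable[[float], float],
                  seed: int, iterations: int = 2000) -> Tuple[str, float]:
    """Toy stand-in for one training session: random hill-climbing on one parameter."""
    rng = random.Random(seed)
    best = 0.0
    for _ in range(iterations):
        candidate = best + rng.uniform(-0.1, 0.1)
        if reward(candidate) > reward(best):
            best = candidate
    return name, best


# Hypothetical sub-tasks with invented numeric targets.
subtasks = {"orient": 1.5, "lift": -2.0}

# Threads keep the sketch self-contained; they illustrate the scheduling,
# not a real speed-up for CPU-bound work in CPython.
with ThreadPoolExecutor() as pool:
    futures = [
        pool.submit(train_subtask, name, make_reward(target), seed)
        for seed, (name, target) in enumerate(subtasks.items())
    ]
    learned = dict(future.result() for future in futures)
```

Because each reward only scores one sub-task, neither session has to discover the whole behavior at once. A separate reward for the end solution would then be used when training the integrator on top of the learned sub-tasks, which this sketch does not show.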

[00014] These and other features of the design provided herein can be better understood with reference to the drawings, description, and claims, all of which form the disclosure of this patent application.

DRAWINGS

[00016] The drawings refer to an embodiment of the design provided herein in which:

[00017] Figure 1A provides a block diagram illustrating an AI system and its cloud-based computing platforms infrastructure in accordance with an embodiment.

[00018] Figure 1B provides a block diagram illustrating an AI system and its on-premises based computing platforms infrastructure in accordance with an embodiment.

[00019] Figures 2A and 2B provide block diagrams illustrating an embodiment of an AI system with an AI engine that has multiple independent modules that use a hierarchical-decomposition deep reinforcement technique for training an AI model.

[00020] Figure 3A illustrates a block diagram of an embodiment of a concept in a mental model that receives input data from a data source, computes its function, and generates output data.

[00021] Figure 3B also illustrates a block diagram of an embodiment of a concept in a mental model that receives input data from a data source, computes its function, and generates output data.

[00022] Figure 4A illustrates a block diagram of an embodiment of a complex task with multiple smaller, individual sub-tasks making up the complex task, where each individual sub-task corresponds to its own AI object in the concept network.

[00023] Figure 4B illustrates a block diagram of an embodiment of a complex task with multiple hierarchical levels of concept nodes.

[00024] Figure 4C illustrates a block diagram of an embodiment of a complex main task and its graph of i) a concept node corresponding to an integrator task, and ii) one or more levels of concepts corresponding to the individual sub-tasks that hierarchically stem forth from the integrator task in the graph of the AI model.

[00025] Figure 4D illustrates a block diagram of an embodiment of a graph of the training of two or more different sub-concepts corresponding to the individual sub-tasks in the complex task, in parallel, which the parallel training and simpler reward functions speed up the overall training duration for the complex task on the one or more computing platforms.

[00026] Figure 4E illustrates a diagram of an embodiment of an example AI model being utilized by a robotic arm to carry out individual sub-tasks in the complex task.

[00027] Figure 4F illustrates a block diagram of an embodiment of the AI engine that solves the example "Grasp and Stack" complex task with concept network reinforcement learning.

[00028] Figure 5 illustrates a block diagram of an embodiment of a user interface for a simulator training one or more concept nodes using reinforcement learning to learn to choose a sub-task recommended from the two or more AI objects in the levels stemming from the integrator.

[00029] Figure 6 illustrates a block diagram of an embodiment of the AI engine using simpler reward functions focused for solving each individual sub-task.

[00030] Figure 7 illustrates a graph of an embodiment of the training of the individual sub-task of the Orient from Figure 4F and its reward function focused for that sub-task.

[00031] Figure 8 illustrates a graph of an embodiment of the training of the individual sub-task of Lift from Figure 4F and its reward function focused for that sub-task.

[00032] Figure 9 illustrates a graph of an embodiment of the training of the interactions of the individual sub-tasks to achieve the complex task of Grasp-n-Stack and its reward function focused for that sub-task.

[00033] Figure 9 provides a block diagram illustrating one or more computing systems in accordance with an embodiment.

[00034] Figure 10 illustrates a number of electronic systems and devices communicating with each other in a network environment in accordance with an embodiment.

[00035] Figure 11 illustrates a computing system 900 that can be, wholly or partially, part of one or more of the server or client computing devices in accordance with an embodiment.

[00036] Figures 12A through 12C provide flow diagrams illustrating a method for a hierarchical-decomposition deep reinforcement learning for an AI model in accordance with an embodiment.

[00037] While the design is subject to various modifications, equivalents, and alternative forms, specific embodiments thereof have been shown by way of example in the drawings and will now be described in detail. It should be understood that the design is not limited to the particular embodiments disclosed, but, on the contrary, the intention is to cover all modifications, equivalents, and alternative forms using the specific embodiments.

DESCRIPTION

[00038] In the following description, numerous specific details are set forth, such as examples of specific data signals, named components, memory in a device, etc., in order to provide a thorough understanding of the present design. It will be apparent, however, to one of ordinary skill in the art that the present design can be practiced without these specific details. In other instances, well known components or methods have not been described in detail, but rather in a block diagram in order to avoid unnecessarily obscuring the present design. Further, specific numeric references such as a first database, can be made. However, the specific numeric reference should not be interpreted as a literal sequential order, but rather interpreted that the first database is different than a second database. Thus, the specific details set forth are merely exemplary. Also, the features implemented in one embodiment may be implemented in another embodiment where logically possible. The specific details can be varied from and still be contemplated to be within the spirit and scope of the present design. The term "coupled" is defined as meaning connected either directly to the component or indirectly to the component through another component.

[00039] Training AI models can take days, weeks, or even longer. Provided herein are time-saving training aids and methods thereof for the training of the AI models. In general, an AI engine is configured to apply a hierarchical-decomposition reinforcement learning technique to train one or more AI objects as concept nodes composed in a hierarchical graph incorporated into an AI model. Each individual sub-task of a decomposed task may correspond to its own concept node in the hierarchical graph. The sub-tasks are initially trained on how to complete their individual sub-task, and then trained on how all of the individual sub-tasks need to interact with each other in the complex task in order to deliver an end solution to the complex task. During the training, the AI engine uses reward functions focused on solving each individual sub-task and then a separate one or more reward functions focused on solving the end solution of the complex task. In addition, where reasonably possible, the AI engine conducts the training of the AI objects corresponding to the individual sub-tasks in the complex task in parallel. The combined parallel training and simpler reward functions speed up the overall training duration for the complex task on the one or more computing platforms, and subsequent deployment of a resulting AI model that is trained, compared to an end-to-end training with a single algorithm for all of the AI objects incorporated into the AI model. The AI engine makes more efficient use of the existing computing platforms by scaling the computing platforms to train the different concepts in parallel.

[00040] In addition, an integrator component can be configured to check to see what is a lowest level of dependency in the first set of concept nodes in the graph that needs to be calculated for its computations including its output. A module is configured to supply the data from the data source to merely the first set of concept nodes that need to make their individual computations instead of all the concept nodes in the graph. The integrator checks to see that the computations for all of the nodes merely in the first set of nodes occur, which saves an amount of computing power and cycles compared to computing all of the nodes making up the AI model in each training cycle.

[00041] FIGs. 1A-2B and Figures 10-11 illustrate example computing infrastructure, e.g., AI Engines, that may be implemented with the hierarchical-decomposition deep reinforcement learning for an AI model. Figures 3A through 9 and Figures 12A-12C illustrate example details about the hierarchical-decomposition deep reinforcement learning for an AI model. All of the Figures discuss example details of the design discussed herein.
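By way of illustration only, the dependency check attributed to the integrator component may be sketched in Python as follows. The graph wiring, the function names required_nodes and evaluation_order, and the dictionary representation are assumptions made for this sketch and are not taken from the design itself. The sketch collects only the nodes that a requested output depends on and orders them from the lowest level of dependency upward, so that the remaining nodes in the graph are not computed.

```python
from typing import Dict, Iterable, List, Set

# Hypothetical concept graph: each node lists the nodes whose outputs it consumes.
GRAPH: Dict[str, List[str]] = {
    "ball_location": [],
    "keep_paddle_under_ball": ["ball_location"],
    "get_high_score": ["keep_paddle_under_ball"],
    "avoid_ghosts": [],
}


def required_nodes(graph: Dict[str, List[str]], targets: Iterable[str]) -> Set[str]:
    """Return the targets plus every node they depend on, directly or indirectly."""
    needed: Set[str] = set()
    stack = list(targets)
    while stack:
        node = stack.pop()
        if node in needed:
            continue
        needed.add(node)
        stack.extend(graph[node])
    return needed


def evaluation_order(graph: Dict[str, List[str]], needed: Set[str]) -> List[str]:
    """Order the needed nodes so that each one appears after its dependencies."""
    order: List[str] = []
    visited: Set[str] = set()

    def visit(node: str) -> None:
        if node in visited:
            return
        visited.add(node)
        for dep in graph[node]:
            visit(dep)
        order.append(node)

    for node in sorted(needed):
        visit(node)
    return order


if __name__ == "__main__":
    subset = required_nodes(GRAPH, ["keep_paddle_under_ball"])
    # Only the subset is evaluated; "get_high_score" and "avoid_ghosts" are skipped.
    print(evaluation_order(GRAPH, subset))
```

In this sketch, data from the data source would be supplied only to the nodes returned by required_nodes, which mirrors the stated saving of computing power and cycles.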

Brief discussion of components in the Al engine

[00042] FIGs. 2A and 2B provide block diagrams illustrating an embodiment of an AI system with an AI engine that has multiple independent modules that use a hierarchical-decomposition deep reinforcement technique for training an AI model.

[00043] The Al engine for generating a trained Al model 106 can include one or more Al-generator modules selected from at least an instructor module 324, an architect module 326, and a learner module 328 as shown. The instructor module 324 can optionally include a hyperlearner module 325, and which can be configured to select one or more hyper parameters for any one or more of a neural network configuration, a learning algorithm, and the like. The hyperlearner module 325 can optionally be contained in a different Al-generator module such as the architect module 326 or the learner module 328, or the hyperlearner module 325 can be an Al-generator module itself. The learner module 328 can optionally include a predictor module 329, which can provide one or more predictions for a trained Al model. The predictor module 329 can optionally be contained in a different Al-generator module such as the instructor module 324 or the architect module 326, or the predictor module 329 can be an Al-generator module itself. The Al engine including the foregoing one or more Al-generator modules can be configured to generate the trained Al model, such as trained Al model 106, from compiled scripted software code written in a pedagogical software programming language via one or more training cycles with the Al engine.

[00044] One or more clients 210 can make a submission to create a trained AI model. Once a Mental Model (see Figures 3A and 3B) and curricula have been coded in the pedagogical software programming language, then the code through the user interface 212 can be compiled and sent to the three main modules, the learner module 328, the instructor module 324, and the architect module 326 of the AI engine for training. One or more user interfaces 212, such as a web interface, a graphical user interface, and/or command line interface, will handle assembling the scripted code written in the pedagogical software programming language, as well as other ancillary steps like registering the line segments with the AI engine, together with a single command. However, each module (the AI compiler module 222, the web enabled interface 221 to the AI engine, the learner module 328, etc.) can be used in a standalone manner, so if the author prefers to manually invoke the AI compiler module, manually perform the API call to upload the compiled pedagogical software programming language to the modules of the AI engine, etc., they have the flexibility and freedom to do so.

[00045] Thus, one or more clients 210 can send scripted code from the coder 212 or another user interface to the Al compiler 222. The Al compiler 222 compiles the scripted software code written in a pedagogical software programming language. The Al compiler 222 can send the compiled scripted code, similar to an assembly code, to the instructor module 324, which, in turn, can send the code to the architect module 326. Alternatively, the Al compiler 222 can send the compiled scripted code in parallel to all of the modules needing to perform a sub-task on the compiled scripted code. The architect module 326 can propose a vast array of machine learning algorithms, such as various neural network layouts, as well as optimize the topology of a network of intelligent processing nodes making up an Al object. The architect module 326 can map between concepts and layers of the network of nodes and send one or more instantiated Al objects to the learner module 328. Once the architect module 326 creates the topological graph of concept nodes, hierarchy of sub-concepts feeding parameters into that complex task (if a hierarchy exists in this layout), and learning algorithm for each of the complex task and sub-concepts, then training by the learner module 328 and instructor module 324 may begin.

[00046] The instructor module 324 can request training data from the training data source 219. Training can be initiated with an explicit start command in the pedagogical software programming language from the user to begin training. In order for training to proceed, the user needs to have already submitted compiled pedagogical software programming language code and registered all of their external data sources such as simulators (if any are to be used) via the user interfaces with the learner and instructor modules 328, 324 of the AI engine.

[00047] The training data source 219 can send the training data to the instructor module 324 upon the request. The instructor module 324 can subsequently instruct the learner module 328 on training the Al object with pedagogical software programming language based curricula for training the concepts into the Al objects. Training an Al model can take place in one or more training cycles to yield a trained state of the Al model 106. The instructor module 324 can decide what pedagogical software programming language based concepts and streams should be actively trained in a mental model. The instructor module 324 can know what are the terminating conditions for training the concepts based on user criteria and/or known best practices. The learner module 328 or the predictor 329 can elicit a prediction from the trained Al model 106 and send the prediction to the instructor module 324. The instructor module 324, in turn, can send the prediction to the training data source 219 for updated training data based upon the prediction and, optionally, instruct the learner module 328 in additional training cycles. When one or more training cycles are complete, the learner module 328 can save the trained state of the network of processing nodes in the trained Al model 106. (Note a more detailed discussion of different embodiments of the components making up the Al engine occurs later.)

[00048] The AI engine has multiple independent modules 222, 324, 325, 326, 328, and 329 on one or more computing platforms. The multiple independent modules 222, 324, 325, 326, 328, and 329 have their instructions executed by one or more processors in the one or more computing platforms. The multiple independent modules 222, 324, 325, 326, 328, and 329 may be loaded into one or more memories of the one or more computing platforms.

[00049] The instructor module 324 may apply a hierarchical-decomposition deep reinforcement technique to train one or more AI objects corresponding to concept nodes in an AI model 106. The instructor module 324 may use the hierarchical-decomposition deep reinforcement technique to solve a wide variety of complex tasks in a modular way, through hierarchically decomposing a complex task into multiple smaller, individual sub-tasks making up the complex task. Each of one or more of the individual sub-tasks corresponds to its own concept node in the graph. The AI engine 200 may initially train the AI objects on the individual sub-tasks in parallel at the same time and then train on how the individual sub-tasks need to interact with each other in the complex task in order to deliver an end solution to the complex task.

[00050] The instructor module 324 decomposing the complex task allows the AI engine 200 to use simpler reward functions focused on solving each individual sub-task and then much simpler reward functions focused on the end solution of the complex task. The AI engine 200 decomposing the complex task also allows conducting the training of two or more different concepts corresponding to the individual sub-tasks in the complex task in parallel. The parallel training and simpler reward functions speed up an overall training duration for the complex task and resulting AI model on the one or more computing platforms, compared to an end-to-end training with a single algorithm for all of the AI objects incorporated into the AI model.
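As a non-limiting illustration, the two-phase schedule (sub-task concepts trained in parallel, followed by training of the integrating concept) may be sketched in Python as follows. The functions train_concept and train_hierarchy are hypothetical stand-ins introduced for this sketch. A thread pool is used only to show the scheduling. An actual engine would dispatch each concept to its own process or container, and train_concept would run a reinforcement learning algorithm rather than return a placeholder record.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List


def train_concept(name: str) -> Dict[str, object]:
    """Stand-in for training one sub-task concept against its own reward function."""
    return {"concept": name, "status": "trained"}


def train_hierarchy(sub_tasks: List[str], integrator: str) -> Dict[str, Dict[str, object]]:
    """Train the sub-task concepts in parallel, then the integrator concept."""
    # Phase 1: each sub-task concept is trained independently and concurrently.
    with ThreadPoolExecutor(max_workers=max(1, len(sub_tasks))) as pool:
        results = list(pool.map(train_concept, sub_tasks))
    trained = {str(r["concept"]): r for r in results}

    # Phase 2: the integrator is trained only after its sub-task concepts exist.
    trained[integrator] = {
        "concept": integrator,
        "status": "trained",
        "uses": sorted(sub_tasks),
    }
    return trained


if __name__ == "__main__":
    model = train_hierarchy(["reach", "grasp", "move", "stack"], "grasp_n_stack")
    for name, record in model.items():
        print(name, record["status"])
```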

[00051] Reward functions can be more easily defined by decomposing the complex task. Instead of specifying a complex reward function for solving the whole task, the system designer can define rewards that are specific to each sub-task. These are usually simpler to define. Once the sub-tasks are ready, the designer can specify a simpler and potentially sparse reward function for selector nodes. This greatly simplifies solving complex problems with reinforcement learning.
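For illustration only, the contrast between rewards specific to each sub-task and a sparse reward for a selector node may be sketched in Python as follows. The State fields, the units, and the three reward functions are assumptions made for this sketch. The design does not specify the form of any reward function.

```python
from dataclasses import dataclass


@dataclass
class State:
    """Hypothetical observation for a block-stacking task."""
    gripper_to_block: float  # distance from gripper to block (assumed metres)
    block_grasped: bool
    block_on_stack: bool


def reach_reward(state: State) -> float:
    """Sub-task reward for reaching: higher (closer to zero) as the gripper approaches."""
    return -state.gripper_to_block


def grasp_reward(state: State) -> float:
    """Sub-task reward for grasping: paid only once the block is held."""
    return 1.0 if state.block_grasped else 0.0


def selector_reward(state: State) -> float:
    """Sparse reward for the selector node: paid only when the whole task succeeds."""
    return 1.0 if state.block_on_stack else 0.0


if __name__ == "__main__":
    midway = State(gripper_to_block=0.5, block_grasped=False, block_on_stack=False)
    print(reach_reward(midway), grasp_reward(midway), selector_reward(midway))
```

Each function inspects only the part of the state that matters to its own sub-task, which is the sense in which such rewards are simpler to define than a single reward covering the whole task.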

[00052] Also, decomposing the complex task allows reusing all or just portions of one or more pre-trained models for solving a new larger complex task.

[00053] Also, decomposing the complex task allows each concept making up the complex task in the graph to use the most appropriate training approach for that individual sub-task, whether that be a classical motion controller, a pre-existing learned model, or a neural network that needs to be trained rather than the whole Al model being trained with one of these training approaches.

[00054] Also, decomposing the complex task allows replacing one or more concepts making up the complex task without retraining each concept making up that system. For example, in Figure 4B, the Al model may switch between using different versions of the concepts of a Grasp and/or Stack. The different versions of the concepts of a Grasp and/or Stack can be adapted without having to change or retrain the Reach, Move, or overall integrator concepts.

[00055] The Al system may implement a form of deep reinforcement learning with hierarchical decomposition of the complex task into concepts of individual sub-tasks in contrast to a standard notion of running a single end-to-end algorithm training.

[00056] The Al engine 200 has a user interface 212 presented on a display screen for use by one or more users in a user's organization. The user interface 212 is configured to set the modules in the Al engine to train two or more concept nodes in parallel at the same time. The learner module 324, including a conductor service, may cooperate with one or more data sources to obtain data for training and conduct the training of one or more Al objects corresponding to concept nodes in parallel at the same time.

[00057] An "AI model" as used herein includes, but is not limited to, neural networks such as recurrent neural networks, recursive neural networks, feed-forward neural networks, convolutional neural networks, deep belief networks, and convolutional deep belief networks; multi-layer perceptrons; decision trees; self-organizing maps; deep Boltzmann machines; and stacked de-noising auto-encoders. The modules of the AI engine are configured to utilize many different machine learning algorithms to generate and/or train a resulting AI model.

[00058] An "artificial neural network" or simply a "neural network" as used herein can include a highly interconnected network of processing elements, each optionally associated with a local memory. The processing elements can be referred to herein as "artificial neural units," "artificial neurons," "neural units," "neurons," "nodes," and the like, while connections between the processing elements can be referred to herein as "synapses," "weights," and the like. A neuron can receive data from an input or one or more other neurons respectively through one or more weighted synapses, process the data, and send processed data to an output or yet one or more other neurons respectively through one or more other weighted synapses. The neural network or one or more neurons thereof can be generated in either hardware, software, or a combination of hardware and software, and the neural network can be subsequently trained.

[00059] A module may consist of one or more processes including one or more services, one or more electronic circuits, or a combination of one or more software processes cooperating with the electronic circuits.

[00060] Note, each trained AI model itself can be a collection of trained AI objects corresponding to a complex task that the AI model attempts to analyze and solve, where a set of concepts in a hierarchical structure feed parameters into the complex task. The AI database 341 can index AI objects corresponding to the complex task and the set of concepts making up a given trained AI model so that reuse, recomposition, and reconfiguration of all or part of a trained AI model is possible.

[00061] A software process may be an instance of an executable file configured to perform a task in a finite amount of time (i.e., a job). Thus, each process is configured to operate for a finite amount of time to achieve its configured goal and then shut down until invoked again when needed in the future. Several instances of a same process, each wrapped in its own container, may run simultaneously on one or more computing devices. A service may be a process, which runs in the background. Each independent process is configured to be aware of the existence of the other processes and knows whom to call and what data and types of inputs the other processes are looking for. Also, functionality performed by one software process may be combined into another software process or migrated in part to another software process. For example, the 'instructor' and 'learner' may be implemented as independent processes, each running in its own container. However, for performance reasons, in an embodiment these 'instructor' and 'learner' processes are merged into a single, combined process running within a single container named the 'scholar.' The functionality in the 'instructor' and 'learner' is still present as before, just not in independent processes.

[00062] Each of the independent processes can be running on its own computing device (e.g., see Figures 1A & 1B, 709A-711A), and then use a subnet to communicate with the other independent processes. As capacity exists, some independent processes may share a computing device. Also, using the subnets is much faster than, for example, trying to conduct communications through the Internet via the Gateway, which would have a longer round-trip delay time or lag time.

[00063] Individual processes programmed to achieve and perform different functions within the AI engine are each broken out into an individual process, each in its own software container. For example, 1) the architect process can be configured to create, instantiate, and figure out the topology of an AI model corresponding to a concept being trained for AI, 2) an instructor process can be configured to guide the training and how to do the training, 3) a learner process can be configured to carry out an actual execution of the training, and 4) a predictor process can be configured, during an AI model's deployment, to make use of a trained AI model. Breaking these up into individual processes/modules that are aware of each other and know which process and/or service to call and how to call that process and also know which inputs and outputs to send to each other, allows the training to be broken up into these multiple discrete individual services.

[00064] Each process is configured as an independent process wrapped in its own container so that multiple instances of the same processes (e.g., learner and instructor) may be running simultaneously to scale to handle multiple users running training sessions, deploying AI modules, and creating AI models, all at the same time. Thus, the cloud or on-premises platform for the AI engine exists with servers, processes, and databases that allow many users to connect over a wide area network, such as the Internet, from multiple computing devices, and then the backend of the cloud platform is configured to handle the scaling, efficiency, etc., by dynamically calling in additional computing hardware machines to load on and run the independent processes of, for example, an instance of the learner and/or instance of the instructor, as needed.

[00065] The multiple independent processes carry out four or more separate tasks by interaction with and cooperation between the multiple independent processes. A first task can be creating a shell of an AI model. A second task can be loading in a file of scripted code in a programming language to help define 1) a topology of processing nodes in the AI model, 2) a layout of the concepts making up the AI model, and 3) a selection of an appropriate learning algorithm for the AI model. The file created in pedagogical software programming language, such as Inkling™, helps the architect module to create the topology of processing nodes in the AI model, the layout of the concepts making up the AI model, etc., derived from the programming code. The third task is starting to train the AI model with a data source, such as a simulator. The fourth task is then deploying and using a trained AI model to do, for example, predictions on data from the data source.

[00066] Each independent process, such as 1) the instructor module, 2) the learner module, and 3) the architect module as part of an AI-model service, can be configured to be able to operate on either of a CPU computing device or a GPU computing device or both.

[00067] In an embodiment, other independent processes cooperate together and contain functionality from the instructor module, the learner module, etc. For example, a scholar process is coded to handle both the training for a given concept (lesson management) and training a lesson. The scholar will also select parameters for the concept. The scholar will also select the algorithms and the topology of the graphs for the concept (e.g. does some of the job of the architect module). The scholar process trains a given concept (e.g. does the job of instructor and learner in an alternative architecture). When the Al engine trains the same concept or multiple different concepts in parallel then the Al engine will have multiple scholars running in parallel. A director module manages the training of a concept graph by calling for the instantiation of one scholar process for each concept being trained. A conductor process merely manages resource allocation required for training an Al model. The director module determines how the resources are used to train the graph of nodes in parallel. The director may also instantiate the graph of nodes itself. Each concept is trained by a scholar process and in the case of multiple concepts being trained in parallel multiple scholar processes are run simultaneously. This is all managed by the director module.

Concepts and mental models

[00068] Figures 3A and 3B provide block diagrams of an embodiment of a concept in a mental model 300A, 300B that receives input data from a data source, computes its function, and generates output data.

[00069] Pedagogical programming focuses on codifying two main pillars: 1) What are the concepts associated with the problem domain (and mentally how do they relate to each other)? and 2) How would one go about teaching those concepts?

[00070] A concept is something that can be learned. Once learned, its corresponding AI object can provide an intelligent output. An AI object may learn and be trained on a particular concept. An AI object corresponding to a particular concept can receive input data from other AI objects/concepts and simulators, and send output data to other AI objects/concepts or as an AI object corresponding to a complex task produce a final result/output. A concept can be used in isolation, but it is typically more useful to construct some structured relationship of connectivity, such as a hierarchy, between the related concepts, beginning with the relatively simple concepts and then building into more complex concepts. For example, "ball location" is a relatively simple concept; whereas, "get high score" with the ball is a more complex task. In another example, a mental model of flying a plane may have a complex task of "flying a plane" and numerous concepts such as "how to navigate and move a plane from point A to point B," "how to avoid crashing into objects," "how to take off into flight," "how to land from flight," etc. Each of the concepts feeds one or more outputs either directly or indirectly into the complex task of "flying a plane" when undergoing training on the complex task. The architect module 326 creates the structured relationship of connectivity between these concepts based on user supplied guidance in the pedagogical programming language code.

[00071] Thus, concepts are distinct aspects of a complex task that can be trained separately, and then combined using an integrator concept. This approach drastically reduces the overall complexity, since the simpler problems can be trained with focused and easier-to-specify reward functions. In addition, the selected concept can be quickly learned using a simple reward function. Each discrete AI object making up an AI model may be encoded or containerized into its own new concept node and that set of concept nodes is put into a graph of concept nodes. The graph of nodes may be intermixed with concept nodes that are new and extend the functionality of the initial machine-learning model. (See example Figures 4A-4F.)

[00072] A concept in a pedagogical programming language may be something that an Al object can be trained on and learn. In an embodiment, a concept can describe things such as an object, a ball, a character, an enemy, a light, a person, or the like. The state data can be whether the one or more things are on or off, hot or cold, a number or a letter, or the like. Other example concepts can reflect a method or a behavior such as "avoid ghosts," "keep the paddle under the ball," "don't run into walls," "turn lights off," "get high score," or the like. Both Figures 3A and 3B show mental models including the strategy-type concept "get high score."

[00073] A mental model in a pedagogical programming language is also something that an AI model can be trained on and learn. A mental model can include one or more concepts structured in terms of the one or more concepts, and the mental model can further include one or more data transformation streams. As shown in Figure 3A, a single-concept mental model can include, for example, a strategy-type concept such as "get high score." As shown in Figure 3B, a multi-concept mental model can include a hierarchical structure including, for example, strategy-type concepts such as "keep paddle under ball" and "get high score" and fact-type concepts such as "ball location." The concepts of "keep paddle under ball" and "ball location" feed parameters directly or indirectly into the complex task of "get high score" with the ball. Each AI object in a multi-concept mental model can receive input from other AI objects corresponding to other concepts in the mental model, send output to other concepts in the mental model, provide a final output or result output, or a combination thereof. Addition of more concepts to a mental model can decrease training time for an AI object, as well as enable a trained AI object to give smarter, more accurate predictions. Each trained concept may be an AI object. Given this choice of mental model frames, the system would then codify the underlying concepts and their relationships in a corresponding network of AI objects.

[00074] Figure 4A illustrates a block diagram of an embodiment of a complex task with multiple smaller, individual sub-tasks making up the complex task, where each individual sub-task corresponds to its own AI object in the concept network.

[00075] The modules of the AI engine decompose complex tasks into smaller, individual sub-tasks 410. The complex task, and thus the complex learning problem, can be broken down into concepts, each concept learned independently, then reassembled into a complete solution to the complex task. The modules of the AI engine can initially break an example overall task of Grasp-n-Stack down into four concepts: 1) Reach the object, 2) Grasp the object, 3) Move, and 4) Stack the object in a Stack. In this example robotic control demonstration, the complex task was decomposed by the modules into a concept network of five concepts: Reach for the object, Grasp the object, Move the object, Stack the object, and the integrated Grasp-n-Stack task. (See Figure 4E for an example illustration.) Each concept has its own corresponding AI object being trained.

[00076] The concept network reinforcement learning approach has many benefits, as well as some limitations. Perhaps the greatest benefit is the ability to truly decompose reinforcement learning problems into independent parts. Developers can hierarchically decompose complex tasks into smaller increments. This is crucial for applying reinforcement learning to real industrial problems, allowing teams to divide and conquer: i) different groups can independently work on different aspects of a learning problem, ii) quickly assemble them into a full solution for the complex task, and iii) later upgrade individual components without needing to retrain the entire set of concepts making up the complex task. The concept network reinforcement learning framework enables true problem decomposition for reinforcement learning problems. A complex learning problem can be broken down into concepts, each concept learned independently, then reassembled into a complete solution. Decomposing problems in this way can greatly reduce the amount of training needed to achieve a useful result.

[00077] In general, Reinforcement Learning (RL) can be about an Al concept interacting with the environment over time, learning an optimal policy, by trial and error with evaluated feedback, for sequential decision making problems. A deep neural network can be combined with reinforcement learning for the deep reinforcement learning. The Al model learns by way of, for example, a dataset, a cost/loss function, and an optimization procedure. A machine learning algorithm can be designed to make gaps between training errors and testing error small. An Al model, such as a neural network, can include input and output layers. At each layer except the initial input layer, the system can compute the input to each unit, as a weighted sum of units from the previous layer. A map of a set of input values to output values can be generated. The system may implement a form of deep reinforcement learning in contrast to a standard notion of running a single end-to-end algorithm training, which saves computing duration to train the Al model compared to the single end-to-end algorithm training. In the deep reinforcement learning, the Al concept interacts with an environment over time. In an embodiment, at each time step (e.g., iteration of learning), the Al concept receives a state in a state space, selects a sub-task from an action space, follows a policy, which controls the Al concept's behavior, i.e., a mapping from a state to sub-tasks, then receives a scalar reward, and then transitions to the next state, according to the environment dynamics, or model, for the reward function. (See Figure 6 for example.) The Al concept also receives feedback from its selected sub-tasks and performance and then evaluates the feedback to alter its training. Each concept can have different state + action spaces.
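By way of example only, the interaction loop described above (receive a state, select an action under a policy, receive a scalar reward, transition to the next state) may be sketched in Python as follows. The toy five-state environment, the function names, and all numeric settings are assumptions made for this sketch. Tabular Q-learning is used here as a simple stand-in for the deep neural network and training algorithms referred to in this description, and it is not presented as the training method of the design.

```python
import random
from typing import List, Tuple

N_STATES = 5      # hypothetical toy task: states 0..4, with state 4 as the goal
ACTIONS = (0, 1)  # 0 = step left, 1 = step right


def step(state: int, action: int) -> Tuple[int, float, bool]:
    """Environment dynamics: return (next_state, scalar_reward, done)."""
    if action == 0:
        next_state = max(0, state - 1)
    else:
        next_state = min(N_STATES - 1, state + 1)
    done = next_state == N_STATES - 1
    return next_state, (1.0 if done else 0.0), done


def select_action(q: List[List[float]], state: int, epsilon: float,
                  rng: random.Random) -> int:
    """Epsilon-greedy policy (a mapping from state to action) with random tie-breaking."""
    if rng.random() < epsilon:
        return rng.choice(ACTIONS)
    best = max(q[state])
    return rng.choice([a for a in ACTIONS if q[state][a] == best])


def train(episodes: int = 200, alpha: float = 0.5, gamma: float = 0.9,
          epsilon: float = 0.2, seed: int = 0) -> List[List[float]]:
    """Run the interaction loop and update a table of action values."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state = 0
        for _ in range(50):  # time steps within one episode
            action = select_action(q, state, epsilon, rng)
            next_state, reward, done = step(state, action)
            # Evaluated feedback: move the estimate toward reward plus discounted value.
            target = reward if done else reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
            if done:
                break
    return q


if __name__ == "__main__":
    for state, values in enumerate(train()):
        print(state, values)
```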

[00078] In an embodiment, automatic partitioning operates on state space, not action space. The module identifies regions of state space that correspond to separable, individually solvable subtasks and creates distinct policies for each so identified region of state space.

[00079] Referring to Figure 2A, the architect module 326 may instantiate the Al objects corresponding to the concepts of the complex/main task into the graph of i) a concept node corresponding to an integrator and ii) one or more levels of concepts corresponding to the individual sub-tasks that hierarchically stem forth from the integrator in the graph of the Al model. (See Figure 4A, for example.) The integrator concept node integrates an interaction between the individual sub-tasks to achieve the end solution of the complex task. (See Figure 4C, for example.)

[00080] The architect module 326 may be configured to automatically partition the individual sub-tasks into the concept nodes in the AI model to be trained on in a number of ways. The ways of conveying the partitioning of the individual sub-tasks into the concept nodes include but are not limited to: i) how to partition the individual sub-tasks is explicitly defined in scripted code by the user, ii) how to partition the individual sub-tasks is hinted at by giving general guidance in the scripted code by the user, iii) how to partition the individual sub-tasks is interpreted from guidance based on responses from the user to a presented list of questions, and iv) any combination of these three. The user may also explicitly define or just give hints on how many levels of nodes the graph should have. The architect module 326 then proposes a hierarchical structure for the graph of AI objects making up the AI model. The architect module 326 partitions the individual sub-tasks to separately train within that AI model where it makes sense to efficiently train in parallel with each other.

[00081] In one case, the AI engine figures out where to partition by looking at the state inputs and separating by distinct, discrete sets of state inputs. The architect module 326 analyzes an anticipated output for each sub-task, and when the state inputs are generating roughly the same reward function, the system stops partitioning the individual sub-tasks into their own concepts. The architect module 326 can use artificial intelligence to script i) how to construct concepts and ii) know when concepts need to be divided out and treated as separate concepts. For example, when each sub-task uses a similar reward and is getting the same set of data from the data source, such as a simulator, data generator, database, etc., then those sub-tasks can actually be combined into a single concept node.

[00082] The user can supply the reward function for each concept, or the system can use auto scripting to recognize problems and supply what the reward should be for each concept, which is distinct from the overall reward for the entire complex task.
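For illustration only, the rule that sub-tasks with a similar reward and the same set of input data can be combined into a single concept node may be sketched in Python as follows. The SubTask description, the function partition_into_concepts, and the example sub-task names are hypothetical. The sketch simplifies "similar reward" to an exact match on a reward identifier, which is an assumption and not a statement of how the architect module compares rewards.

```python
from collections import defaultdict
from typing import Dict, FrozenSet, List, Tuple

# Hypothetical description of a candidate sub-task:
# (names of the state inputs it reads, identifier of the reward it is scored with)
SubTask = Tuple[FrozenSet[str], str]


def partition_into_concepts(sub_tasks: Dict[str, SubTask]) -> List[List[str]]:
    """Group sub-tasks that share both state inputs and reward into one concept node."""
    groups: Dict[SubTask, List[str]] = defaultdict(list)
    for name, (inputs, reward_id) in sub_tasks.items():
        groups[(inputs, reward_id)].append(name)
    # Each group becomes one concept node; distinct inputs or rewards stay separate.
    return [sorted(members) for members in groups.values()]


if __name__ == "__main__":
    candidates = {
        "pinch": (frozenset({"finger_position"}), "grip_quality"),
        "squeeze": (frozenset({"finger_position"}), "grip_quality"),
        "move": (frozenset({"arm_position"}), "distance_to_target"),
    }
    print(partition_into_concepts(candidates))
```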

[00083] The instructor module 324 may cause the AI engine to initially train each individual AI object to solve its individual sub-task using its corresponding one or more reward functions focused on solving that sub-task. The instructor module 324 may then next use an integrator node to train the set of individual sub-tasks in the complex task to combine the concepts of the individual sub-tasks to cooperate and work with each other to achieve the complex task. The concept node of the integrator may then use a reward function focused on the end solution to the complex task.

[00084] Thus, the architect module 326 lays out a deep learning neural network for the AI model. The instructor module 324 will then proceed to execute the best available lesson in the curriculum and will stream data to the set of AI objects being trained. Upon deployment, a data source 219 would be able to stream, for example, an image into the AI model 106 and get predictions out. In an embodiment, the low level AI or machine learning algorithmic details need not be codified by a user; rather, these low level details can be generated by the architect module 326 by pulling the topology of a given network of processing nodes and a best machine learning algorithm from reference databases.

[00085] During training, the instructor module 324 cooperating with the learner module 328 might find that an example training algorithm, such as the TRPO algorithm, has difficulty training one or more of the concepts. The modules may choose other algorithms for training the AI objects corresponding to these actor concepts. For example, three of these concepts - Orienting, Moving, and Stacking - may use TRPO and deep reinforcement learning to train, while the Pinching and Reaching concepts can be handled with inverse kinematics. Each concept can use the most appropriate approach for that task, whether a classical motion controller, a pre-existing learned model, or a neural network that needs to be trained.

[00086] Figure 4E illustrates a diagram of an embodiment of an example AI model being utilized by a robotic arm 400E to carry out individual sub-tasks in the complex task. Stages of the complex task may include (a) Moving to the object, (b) Reaching for the object, (c) Grasping the object, and (d) Stacking the object on a stack of objects.

[00087] Referring back to Figures 4A and 4E, the example challenge is Grasp a block and Stack the block on top of another. (See Figure 4E.) The Al controlled robot must coordinate finger movement as well as differ its block positions and orientations. Dexterity is hard and flexibility to Grasp and Stack in different positions and orientations is a must. The solution is decomposition of the overall task into individual sub-tasks. (See two different example decompositions in Figures 4A and 4B.) The Al engine first trained the Al model to learn the concepts of Grasp and Stack using reinforcement learning. These trainings of the different Al objects corresponding to their concepts can be done independently of each other. In addition, multiple simulations may be occurring at the same time for each concept in order to speed up the training on that specific concept. Once the Grasp and Stack concepts are trained, then all four Al concepts are then trained to learn to work with each other. A meta-controller - (e.g., integrator/selector concept) - then learns to combine the newly trained concepts with an existing Move classical controller and a Reach function into a complete Grasp-n-Stack complex task. The integrator quickly learns to select the correct Al concept for that task. The integrator can also very quickly learn, if need be, to slightly adjust the training of each Al concept to have them work together to achieve the main task. The Al engine's method of assembling the concepts successfully solves the entire complex task, and is, for example, multiple times faster than prior techniques in a similar setting.

[00088] In parallel to the training of the Stack concept, the instructor module and learner module may cooperate to train the AI object corresponding to the Grasp concept. The instructor module and learner module may cooperate to put in the algorithms and curriculum for the Grasp training. Initially, the AI controlled robot is expected to flail and fail. However, over time, the AI controlled robot learns what to do based on the reward the AI engine gives the AI controlled robot (for success).

[00089] In parallel to the training of the Grasp concept, the instructor module and learner module may cooperate to train the AI object corresponding to the Stack concept. The instructor module and learner module may cooperate to put in the algorithms and curriculum for the Stack AI concept to train on.

[00090] Note, for designing reward functions within the Grasp and Stack concepts, a concept of orienting the hand for Grasping and/or Stacking the object can be learned. (See difference between Figures 4B and 4A.) Thus, referring to Figure 4B, to further simplify the learning problem, the modules further break the top level concept of Grasp into a lower level of two concepts of: Orienting the hand around object in preparation for Grasping, and Pinching the object. Likewise, the modules further break the top level concept of Stacking into a lower level of two concepts: Orienting the hand around object in preparation for stacking, and Orienting the stack, for a total of eight actor concepts in the concept network.

[00091] Moreover, the AI system may explore creating and training with two or more concept node hierarchies in order to improve the speed and/or accuracy of the resulting AI model. For example, the AI engine may explore by creating and training a single integrator node with four children sub nodes (see Figure 4A), and a multi-level tree with four children sub nodes and two of these with their own integrator and children concept nodes (see Figure 4B), both in parallel at the same time. In this small set of concepts, there may be little benefit to nesting integrators in this way. However, as the size of the tree of nodes scales, the ability to encapsulate sections of the problem and separately learn the correct circumstances in which to invoke concepts to solve that sub problem will improve training parallelization, help keep large concept trees organized, and bound the complexity of the task any single integrator must learn. Thus, the process may vary where the decomposition and combination occurs within a single AI object corresponding to that concept. See, for example, the different level of decomposition between the Grasp concept in Figures 4A and 4B. In Figure 4A, the modules combine individual sub-tasks making up the concept of Grasp. In Figure 4B, the modules separate out the individual sub-tasks making up the concept of Grasp.

[00092] Using hierarchical decomposition with deep reinforcement learning, the AI engine platform achieves, for example, a robotics control benchmark with an order of magnitude fewer training cycles. Thus, an enterprise could use the hierarchical decomposition process of breaking down the overall task into multiple smaller tasks being trained in parallel rather than using a single end-to-end algorithm. Once each AI object corresponding to a given concept is trained on its corresponding individual task, then all of the trained AI objects can be trained to work with each other to achieve the overall task. This process trains multiple AI concepts in parallel and then combines the trained versions of the concepts to achieve a result similar to that of the one end-to-end algorithm but in less time and possibly with better accuracy. For example, a simulated robot or CNC machine may successfully train upon the individual sub-tasks of i) Grasping a block and ii) Stacking the block on top of another, in parallel to each other, and apply deep reinforcement learning algorithms to learn these concepts. Multiple concepts, for example, the concepts of Reach, Grasp, and Stack, can be trained individually and in parallel to each other, which requires far less training time and fewer computing cycles. Next, those trained concepts can be trained to work with each other to accomplish the end result of what the single end-to-end algorithm would typically accomplish.

[00093] In addition, differently trained AI objects can be assembled into the AI model in order to decrease an overall training time. Thus, AI objects of the AI model may include a blend of at least a first set and second set of AI objects being trained by the instructor module via reinforcement learning, such as the Grasp, the Stack, and the Orient concepts, and a third set of AI objects that are configured to operate in two ways: 1) as control nodes where one or more actions are produced by the code of this node and/or 2) as nodes that just implement a data transformation step. This all may occur while a conductor service manages multiple simulations from the data sources in parallel at the same time to train the first and second sets of AI objects with the deep reinforcement learning.

[00094] Individual concepts in a network can be easily replaced with alternate implementations, allowing easy experimentation and incremental improvement - a hard-coded controller can be replaced with a learned one, or an intractable concept can be further subdivided, all without requiring any change in the rest of the concept network. Additionally, the independence among all children of an integrator typically allows them to be trained in parallel.
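As a non-limiting illustration of training independent child concepts in parallel, the following Python sketch submits two placeholder training jobs to a thread pool from the standard library. The function train_concept and its return value are hypothetical stand-ins for a real training run against a simulator; no actual AI engine interface is implied.

```python
from concurrent.futures import ThreadPoolExecutor


def train_concept(name):
    """Placeholder for one concept's training run (hypothetical)."""
    # A real run would execute a curriculum against its own simulator instance.
    return name, f"trained-{name}"


independent_children = ["Grasp", "Stack"]

# Each child has its own reward function and no dependency on its siblings,
# so the jobs can be submitted together and collected when both finish.
with ThreadPoolExecutor(max_workers=len(independent_children)) as pool:
    trained = dict(pool.map(train_concept, independent_children))
```

Because the children share no state in this sketch, swapping one placeholder for a different implementation leaves the other job untouched.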

[00095] In addition, the trained Al objects corresponding to different concepts making up the Al model can be replaced without retraining the overall system. For example, the system could adapt to a Grasp concept with different training and parameters without having to change the Al objects corresponding to the Reach, Move, or overall integrator concepts.

[00096] In an embodiment, because an integrator treats its child concepts as black boxes, each can be implemented using the training technique (i.e., algorithm), most appropriate to the problem. In robotics, for example, the complex task requiring dexterous manipulation like Grasping may be implemented with deep reinforcement learning, while well-characterized tasks like Moving between work spaces can be handled by inverse kinematics. For the same reason, entire concept networks are re-usable and composable: a solution to one problem can be used as a component in a larger problem.

[00097] In addition, hierarchical decomposition makes the AI model more explainable than monolithic training methods, because seeing the concepts activated by each integrator gives a higher level of insight into the behavior than simply seeing the low-level sub-tasks at each time step.

Hierarchical Deep Reinforcement Learning for Flexible Dexterous Manipulation

[00098] In an example use case, the system may use hierarchical decomposition, deep reinforcement learning for flexible dexterous manipulation by a robot (see Figure 4E). The Al engine machine teaching platform enables subject matter experts to teach an Al concept how to solve complex problems. A key feature of the platform is the ability to decompose complex tasks using concept networks— distinct aspects of the main task that can be trained separately and then combined using an integrator component. This approach may be used to solve a complex robotics task requiring dexterous manipulation, for example, using a simulated robot arm to pick up an object and Stack it on another one. The Al engine applied this decompositional approach, improving training efficiency and flexibility.

[00099] Learning is also greatly enhanced by combining a simulation with feedback from a real world environment. As a result, it is important to understand whether or not a system's operations and interactions with its environment can be simulated, or modeled. A deep reinforcement learning iterative learning process with the AI software can be very effective. Overall, the AI objects corresponding to concepts can learn individual tasks in a simulation/modeling world. Next, the trained AI objects apply the trained concept in a real world situation. Next, the trained AI objects incorporate learned feedback about working in the real world back into a simulation environment to refine/tune the training of the concept(s). Lastly, the trained AI objects apply the refined trained concepts in the real world again.

[000100] For example, in each iteration, the machine learning software makes a decision about the next set of parameters for friction compensation and the next set of parameters for motion. These decisions are made by the modules of the Al engine. It is anticipated that the many iterations involved will require that the optimization process be capable of running autonomously. To achieve this, a software layer is utilized to enable the Al engine software to configure the control with the next iteration's parameterization for friction compensation and its parameterization of the axis motion. The goal for deep reinforcement learning in this example user's case is to explore the potential of the Al engine to improve upon manual or current automatic calibration. Specifically, to eliminate the human expert and make the Al the expert in selecting parameter values, equal or improve upon the degree of precision, reduce the number of iterations of tests needed, and hence the overall time needed to complete the circularity test. The Al engine is coded to understand machine dynamics and develop initial model of machine's dynamics. Development of a simulation model is included based on initial measurements. The Al engine's ability to set friction and backlash compensation parameters occurs within the simulation model. After the initial model training occurs, then the training of the simulation model of friction and backlash compensation is extended with the advice from any experts in that field. The training of the simulation model moves from the simulation model world, after the deep reinforcement learning is complete, to a real world environment. The training of the concept takes the learning from the real machine and uses it to improve and tune the simulation model.

[000101] Figure 4B illustrates a block diagram of an embodiment of an AI model 400B learning a complex task with multiple hierarchical levels of concept nodes.

[000102] As previously discussed, the complex task is composed of several concepts horizontally across its graph, such as Reaching, Moving, Grasping, and Stacking, that are independent of one another. The top level Grasp-n-Stack concept incorporates an integrator concept. The next level down concepts of the Grasping concept and the Stacking concept each incorporate an integrator concept. The graph vertically has levels. For example, the Grasp concept is made up of the concepts of Orient and Pinch. Likewise, the Stack concept is made up of the concepts of Orient the block and the Stack of blocks orientation.

[000103] For Figure 4B, each learned actor concept has its own reward function, independent of the overall problem. Each AI object corresponding to its concept is trained in order once its precursors in the concept graph have been trained. First, the AI engine trains the concepts of Orient and Pinch independently and then the combined Grasp concept is trained. Likewise, Stack and Reach concepts are trained. Any of these concepts may have been pre-trained previously and simply be reused in this task. The trained concepts are combined with the Move classical controller. Once these concepts are trained, the AI engine trains the top level Grasp-n-Stack concept. Also, the AI engine using the hierarchical decomposition technique allows existing solutions to sub-problems to be composed into an overall solution without requiring significant re-training, regardless of the algorithms and state space definitions used to solve each sub-problem.

[000104] As discussed earlier, this concept network using reinforcement learning facilitates an industrially applicable approach to solving complex tasks through problem decomposition, simplified reward function design, quick and robust parallel training, and production of a policy that can be executed safely and reliably when deployed.

[000105] Again, Figure 4B shows three integrator nodes, three control concepts, and three classical controllers. The Grasp-n-Stack AI object, the Grasp AI object, and the Stack AI object each incorporate an integrator node. Both Orient the hand concepts are learned concepts, as is the Orient the stack of blocks concept. The Reach, Move, and Pinch concepts may be implemented as classical controllers. Each node also implicitly takes the state as input, and can be paired with input and output transformations.

[000106] Figure 4C illustrates a block diagram of an embodiment of a complex main task and its graph 400C of i) a concept node corresponding to an integrator task, and ii) one or more levels of concepts corresponding to the individual sub-tasks that hierarchically stem forth from the integrator task in the graph of the Al model.

[000107] The integrator/selector/meta controller is configured to select from different sets of concept nodes, such as concept 1, concept 2, and concept 3, to compute one or more sets of concept nodes that need to be computed based on the data and state from the data source. The AI engine solves each concept, as an individual sub-task, separately; and the meta-controller/selector/integrator component/concept may then also be used to combine them, concept 1, concept 2, and concept 3, into a complete solution for the complex task.

[000108] There can be three or more types of concepts including: 1) actor concepts, which define sub-tasks to take in certain situations, 2) integrator concepts, which choose which one of their child concepts will act next, and 3) perceptor concepts, which transform low-level state input into higher level perceptual features that are more useful for subsequent concepts. The overall concept network is a directed acyclic graph, with the overall system state coming in, perceptor, actor, and integrator concepts in the middle, and the final action to execute in the environment coming out.

Integrator Concepts

[000109] The figure shows an example structure of an integrator concept. The integrator accepts the state from the environment and chooses one of a set of child concepts, which can be either other integrators or concepts. The chosen child concept's policy then interacts with the environment, receives data and states, and then generates sub-tasks to transition the environment until the child node reaches its terminal condition. Thus, once chosen, a child concept executes until it reaches a terminal condition. At the terminal condition, the integrator again receives the new state, data, and the new value of its own reward function, and makes a new choice. Execution can recursively descend through several integrators in turn before an actor concept is reached. Note, treating skills implemented by child nodes as discrete units for the integrator speeds exploration and avoids unnecessary backtracking (if a new child concept could be selected for each time step, one concept's policy could undo the progress made by another).
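As a non-limiting illustration of the execution scheme in paragraph [000109], the following Python sketch shows an integrator that chooses a child, lets it run to its terminal condition, and only then chooses again, with one integrator nested inside another. The classes, the integer state, and the selection and termination rules are hypothetical simplifications; in the described system the selection rule would be learned rather than hand-written.

```python
class CountdownConcept:
    """Toy actor concept: acts for a fixed number of steps, then terminates."""

    def __init__(self, name, steps):
        self.name = name
        self.steps = steps

    def run(self, state, trace):
        for _ in range(self.steps):
            state += 1               # stand-in for one environment transition
            trace.append(self.name)  # record which concept acted at this step
        return state


class Integrator:
    """Chooses a child, runs it to termination, then chooses again."""

    def __init__(self, children, choose, done):
        self.children = children
        self.choose = choose  # hypothetical rule: state -> index of child
        self.done = done      # hypothetical terminal test for this integrator

    def run(self, state, trace):
        while not self.done(state):
            child = self.children[self.choose(state)]
            state = child.run(state, trace)  # may descend into a nested integrator
        return state


grasp = Integrator(
    [CountdownConcept("Orient", 2), CountdownConcept("Pinch", 1)],
    choose=lambda s: 0 if s < 4 else 1,
    done=lambda s: s >= 5,
)
top = Integrator(
    [CountdownConcept("Reach", 2), grasp, CountdownConcept("Stack", 2)],
    choose=lambda s: 0 if s < 2 else (1 if s < 5 else 2),
    done=lambda s: s >= 7,
)

trace = []
final_state = top.run(0, trace)
```

The recorded trace shows each child acting in an unbroken run, which is the property the paragraph credits with avoiding one policy undoing another's progress.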

[000110] The integrator's dependent nodes in the graph can be treated as a policy-implementing black box. This allows the integrator to be nested: simple skills may be grouped together under an integrator to form a more complex skill, which may in turn be selected by its parent.

[000111] All non-leaf nodes may be integrator nodes. Training proceeds in the graph of nodes from the bottom up. First, the leaf nodes are trained. Next, inner nodes going up the graph to the root node are trained. Each integrator node learns how to select which already trained child node to execute as well as an alternative (hidden) delta concept it has inside. The exception to the above is the case where a non-leaf concept node takes its input from a particular Gear node which is only transforming the data. In this case, this inner node is not an integrator. In a more general formalism, all leaf nodes generally are not integrators, inner nodes may or may not be integrators, and the top root node will be an integrator. These integrator nodes behave as a selector, selecting amongst children sub concepts and, if one exists, an internally synthesized delta network (hidden concept).
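As a non-limiting illustration of the bottom-up ordering in paragraph [000111], the following Python sketch performs a post-order traversal so that every child appears before its parent. The node labels loosely follow the Figure 4B description and the dictionary layout is an assumption made for this example; nodes implemented as classical controllers would simply be skipped at training time.

```python
def training_order(children, root):
    """Return nodes in post-order, so each child precedes its parent."""
    order, seen = [], set()

    def visit(node):
        if node in seen:  # a shared node in a DAG is visited only once
            return
        seen.add(node)
        for child in children.get(node, []):
            visit(child)
        order.append(node)

    visit(root)
    return order


# Hypothetical encoding of a two-level concept graph (parent -> children).
graph = {
    "Grasp-n-Stack": ["Reach", "Grasp", "Move", "Stack"],
    "Grasp": ["Orient hand (grasp)", "Pinch"],
    "Stack": ["Orient hand (stack)", "Orient stack of blocks"],
}
order = training_order(graph, "Grasp-n-Stack")
```

The root integrator comes last in the resulting order, after both nested integrators and all of their leaves.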

[000112] Integrators are trained using a discrete action algorithm. Because the chosen policies are trained separately and treated as a black box that executes a policy to termination, reward functions for integrators can be simple, typically rewarding progress toward an overall goal. If the integrator's task can be solved with a small number of child policy executions, or the right child node to pick is easy to deduce from the state, integrators are also very quick to train.

[000113] The concept node of the integrator task can be trained via reinforcement learning to learn to choose a sub-task recommended from the two or more AI objects in the levels stemming from the integrator node in the graph by choosing a particular sub-task that is considered most applicable based on current state data. This is a discrete reinforcement learning problem that the AI engine solves with an example learning algorithm, such as the DQN algorithm, using overall task success as the reward. (Note, any discrete reinforcement learning algorithm could be used.) To make this effective, the AI engine may not choose a new concept at each time step but rather train a specific concept until it reaches a termination condition. The integrator may use concepts with a long-running termination condition: each concept can have pre-conditions for when it can be selected, and a run-until condition to meet before switching to another individual sub-task. This gives the designer an easy way to specify constraints like "don't try to Grasp until you're close to the object", and "once you start to move, continue that for at least 100 time steps".

[000114] In an example, here is how to specify the above using a coding language such as the Inkling™ language: <code snippet, with syntax highlighting >
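The Inkling snippet referred to above is not reproduced in this text. Separately, and only as a language-neutral illustration of the pre-condition and run-until constraints in paragraph [000113], the following Python sketch encodes the two quoted constraints. It is not Inkling syntax; the GatedConcept type, the field names, and the 0.05 distance threshold are hypothetical, while the 100-step minimum comes from the quoted example.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class GatedConcept:
    name: str
    precondition: Callable[[dict], bool]  # may the integrator select it now?
    min_steps: int = 0                    # run-until: steps required before switching


def selectable(concepts, state):
    """Names of the concepts whose pre-condition holds in this state."""
    return [c.name for c in concepts if c.precondition(state)]


def may_switch(active, steps_run):
    """True once the active concept has met its run-until condition."""
    return steps_run >= active.min_steps


# "don't try to Grasp until you're close to the object" (threshold is assumed)
grasp = GatedConcept("Grasp", lambda s: s["distance_to_object"] < 0.05)
# "once you start to move, continue that for at least 100 time steps"
move = GatedConcept("Move", lambda s: True, min_steps=100)

far = selectable([grasp, move], {"distance_to_object": 0.40})
near = selectable([grasp, move], {"distance_to_object": 0.01})
```

An integrator using such gates would restrict its discrete choice to the selectable names and would defer any new choice until may_switch returns true.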

[000115] Here's what an example scripted code could look like in an example language, Inkling language, for machine teaching:

# ... schema and gear declarations ...

# reach isn't learned - it uses a provided controller
external reach is reach_gear
  follows input(State)
end

# Grasp will be learned by our system
concept Grasp is estimator
  predicts (ManipulatorCommand)  # the sub-task
  follows input(State)           # the input
end

# move isn't learned - it uses a provided controller
external move is move_gear
  follows input(Command)
end

concept Stack is estimator
  predicts (ManipulatorCommand)
  follows input(State)
end

# This is the integrator, which picks one of the sub-tasks
# from the preceding concepts.
concept pick_and_Stack
  predicts (Command)
  follows Reach, Grasp, Move, Stack
  feeds output
end

# ... curriculum declarations: how to train each learned concept ...

[000116] The graph of concept nodes making up the Al Mental Model may be at least two or more levels deep as well as two or more branches horizontally across. (See Figures 4A, 4B and 4F for example.) Each integrator node can select a first set of concept nodes from two or more sets in the graph of the complex task to be trained on and computed. The integrator can observe a state of a data source generator, such as a simulator, to supply data to train the Al model and then based on i) the state of the data source, ii) all of the concept nodes in the first set using similar supplied data to train on, or iii) both, and then select the first set of concept nodes from the two or more sets of nodes in the graph to receive its input data from the data source generator. For each, based on the state or supplied data, the integrator node can select merely, for example, the Grasp concept this training iteration to be trained.

[000117] A node in the graph takes as inputs the state as well as potentially modified states that come from other concepts producing a transformation of the data. For example, Gears node could provide a transformation. Based on the state and/or modified states being provided, a concept then produces an action. If it is an integrator it then selects between sub-tasks, which one should be executed. The sub-task selected then recursively selects the next from its children sub-tasks.

[000118] Each integrator node in the graph of nodes is configured to evaluate in turn, based on data from the simulator, which of the nodes in the graph are currently appropriate to solving its sub task, and merely those nodes so chosen will be further provided with data from the simulator and evaluated, which saves an amount of computing power and cycles compared to computing all of the nodes making up the Al model each evaluation cycle.

[000119] Thus, in this example in Figure 4B, merely the set of the Orient concept, the Pinch concept, the Grasp concept, and the Grasp-n-Stack concept are being trained and computed. The data source will supply the data to the first set of concept nodes that need to make their individual computations as well as to the integrator, and the integrator checks that the computations occur merely for the needed nodes in the first set of nodes, which saves an amount of computing power and cycles compared to computing all of the nodes making up the AI model each training cycle. The data source will supply the data to the concept nodes that need to make their individual computations as well as to the integrator, which can make the computations for that combined set of nodes. Computations for that set of nodes in the routing path will be calculated, as opposed to calculating the computations for all of the nodes in the entire hierarchy of the graph of nodes.
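As a non-limiting illustration of the routing-path saving in paragraph [000119], the following Python sketch collects only the nodes reached through the integrators' current selections and compares that count with the total. The selection dictionary is a hypothetical encoding of the example set named above, and the total of nine nodes assumes the three integrators, three learned concepts, and three classical controllers described for Figure 4B.

```python
def routed_nodes(selection, root):
    """Collect the root plus everything reachable through current selections."""
    active = [root]
    for child in selection.get(root, []):
        active.extend(routed_nodes(selection, child))
    return active


# Hypothetical snapshot: the top integrator has selected Grasp, and the Grasp
# integrator routes to its Orient and Pinch children.
selection = {
    "Grasp-n-Stack": ["Grasp"],
    "Grasp": ["Orient", "Pinch"],
}
total_nodes = 9  # assumed node count for the Figure 4B graph

active = routed_nodes(selection, "Grasp-n-Stack")
skipped = total_nodes - len(active)
```

Under these assumptions four nodes receive data and are computed in the cycle, and the remaining five are skipped.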

Actor Concepts

[000120] An actor concept, such as the Grasp concept, takes state as input and produces an action, which can be single-step or a multi-step policy. Actor concepts i) can be learned using reinforcement learning, ii) can use a manually coded controller, e.g., using inverse kinematics, or iii) can be implemented using a pre-trained neural network based controller, perhaps re-used from another concept network. A policy can even combine learned behavior on part of the action space and hard-coded behavior on the rest: for example, when learning to orient the gripper in an example robotics task, the AI engine may hard code that the gripper fingers should be open, and let the network learn to control the other arm joints.

[000121] The black-box nature of concepts allows each one to transform the state space and action spaces as appropriate for the individual sub-task. On input, the state space of the individual sub-task is converted to an appropriate form, e.g., by omitting irrelevant elements or augmenting with derived properties. The learning problem is solved with the transformed input and an appropriate action space, and that action space is then transformed back to the individual sub-task's action space.
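As a non-limiting illustration of the input and output transformations in paragraph [000121], the following Python sketch wraps a policy so that it sees a reduced, augmented state and its action is mapped back to the full action space, with the gripper fingers hard-coded open as in the orienting example of paragraph [000120]. All function names, state keys, and the toy policy are hypothetical.

```python
def make_wrapped_policy(select_state, policy, expand_action):
    """Compose input transform, policy, and output transform into one callable."""
    def wrapped(full_state):
        local_state = select_state(full_state)
        local_action = policy(local_state)
        return expand_action(local_action)
    return wrapped


def select_state(full_state):
    # Omit irrelevant elements and add a derived property (gripper-to-block offset).
    return {
        "gripper_x": full_state["gripper_x"],
        "offset": full_state["block_x"] - full_state["gripper_x"],
    }


def toy_policy(local_state):
    # Stand-in for a learned network: move halfway toward the block.
    return {"arm_dx": 0.5 * local_state["offset"]}


def expand_action(local_action):
    # Map back to the full action space; the fingers are hard-coded open.
    return {"arm_dx": local_action["arm_dx"], "fingers_open": True}


orient = make_wrapped_policy(select_state, toy_policy, expand_action)
action = orient({"gripper_x": 1.0, "block_x": 3.0, "stack_height": 4})
```

The policy in this sketch never sees stack_height, and the caller never needs to know that the finger command was not learned.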

[000122] Each learned actor concept has its own reward function, independent of the overall problem. Thus, reward shaping considerations are encapsulated within concepts, and must only be defined on the relevant portions of the concept's state and action space. Each learned actor concept can also be trained with the most appropriate learning algorithm for that individual sub-task. This ability to customize the training approach for each sub-problem speeds up an individual sub-task's design and iteration, and can significantly speed up training.

[000123] Once an actor concept is run, it continues to execute its policy until it hits one of its execution terminal conditions. Recall that terminal conditions may be based on state, defining a subset of the state space where the policy is defined. If an integrator chooses an actor concept when the state is in one of the terminal regions, the actor returns a no-op action. During training, the integrator learns that concepts chosen outside their working areas do nothing. Thus, the terminal conditions for each concept also serve as constraints on the activation of its policy, ensuring it can only execute its policy in regions of state space it has explored during training and in which its policy has been well characterized and deemed safe. Without such constraints, the undefined behavior of reinforcement learning-based control policies outside the state space they have explored during training can pose a significant safety hazard when deployed into production. The terminal conditions can be configured differently during training and deployment. This can be used to provide an additional margin for error, or further restrict the work space where execution of a given skill is permitted.
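As a non-limiting illustration of paragraph [000123], the following Python sketch guards a policy with a working region on a one-dimensional state and returns a no-op outside it, with a wider safety margin at deployment than during training. The GuardedActor class, the interval bounds, and the margin value are hypothetical.

```python
NO_OP = "no-op"


class GuardedActor:
    """Runs its policy only inside [low + margin, high - margin]."""

    def __init__(self, policy, low, high, margin=0.0):
        self.policy = policy
        self.low = low
        self.high = high
        self.margin = margin

    def in_working_region(self, x):
        return self.low + self.margin <= x <= self.high - self.margin

    def act(self, x):
        if not self.in_working_region(x):
            return NO_OP  # chosen in a terminal region: do nothing
        return self.policy(x)


def toy_policy(x):
    return "step"  # stand-in for a learned control action


# Same explored region, but deployment shrinks it by an extra margin.
training_actor = GuardedActor(toy_policy, low=0.0, high=10.0)
deployed_actor = GuardedActor(toy_policy, low=0.0, high=10.0, margin=1.0)
```

A state of 9.5 lies inside the training region but inside the deployment margin, so in this sketch the deployed actor declines to act there.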

Perceptor Concepts

[000124] Perceptor concepts transform information about the state into a perception that is used as input for concepts further along in the graph. This is typically used to convert environment state into a simpler or higher level form. Typical perceptor concepts are taking visual input and converting it into an object identification, or converting textual data into a topic vector and a sentiment estimate.

[000125] Just like actor concepts, perceptor concepts can be hard-coded, pre-trained, or learned. Typically, a learned perceptor concept would be trained using supervised learning.

[000126] The system can treat perceptor concepts as part of the state pre-processing of individual policies. The reason the system distinguishes them as a separate concept type is to allow a single state transformation to be included as input to many other concepts.

[000127] Figure 4D illustrates a block diagram of an embodiment of a graph 400D of the training of two or more different sub concepts corresponding to the individual sub-tasks in the complex task, in parallel, in which the parallel training and simpler reward functions speed up an overall training duration for the complex task on the one or more computing platforms.

[000128] The Al engine using machine teaching provides the abstraction and tooling for developers, data scientists, and subject matter experts to teach domain specific intelligence to a system. Developers codify the specific concepts they want a system to learn, how to teach them, and the training sources required (e.g., simulations, data), using a pedagogical software programming language, such as Inkling™. The system then teaches each individual Al object on learning its particular skill, on its own, which will go faster than trying to train on that skill while additional variables are being thrown into that training curriculum from other concepts.

[000129] Concepts can be broken down into smaller individual sub-tasks and then training occurs specifically for a concept starting at its lowest level of decomposition (i.e., the leaf in a tree structure). For example, looking at the graph 4D and Figure 4A, the "Grasp the object" concept and the "Stack the object" concept sub-tasks are simple tasks for which the Al system uses deep reinforcement learning. The Al engine trains the Grasp concept and Stack concept with reinforcement learning, using, for example, a TRPO algorithm.

[000130] Training the Stack concept, for example, took < 14 million simulator training cycles>, which is equivalent to <139 hours> of simulated robot time. The Grasp concept was, for example, slightly <faster>, taking <125 hours> of simulated robot time.

[000131] Note, looking at the graph 4D and Figure 4A, the "Reach toward the object to pick up" concept and "Move toward target block" concept sub-tasks are simple motions for which the Al system uses a classical motion controller. The classical motion controllers are already fully "trained" and need no additional training to perform that function. The Al engine allows users to integrate such classical controllers mixed with other concepts trained with reinforcement learning through, for example "Gears™" functionality. In the graph, then an integrator concept took, for example, 6000 training cycles to integrate the four concepts of i) Grasp, ii) Stack, iii) Reach, and iv) Move to achieve the proper success and reward function of the Grasp-n-Stack concept of the overall complex task.

[000132] Looking at the graph, learning the Stack concept dominated training time for the AI system at about 14 million samples, corresponding to 139 hours of training time. However, the Grasp concept may be trained in parallel by another simulator or data source. The Move and Reach concepts did not require any training, and the integrator trains very quickly, orders of magnitude faster than the guided-retraining approach used in an example prior technique. Thus, the AI engine platform achieves a control benchmark with an order of magnitude fewer training cycles. By scaling and training on multiple computing platforms, the duration of the training is decreased to substantially the time needed for learning the Stack concept and then integrating the four concepts.
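By way of a non-limiting illustration, the effect of parallel training on overall duration can be sketched with the example figures above. The conversion of the integrator's 6,000 cycles into hours assumes the same simulated time per cycle as the Stack concept; that rate, and all names in the sketch, are assumptions made for illustration only.

```python
# Simulated-robot hours reported in the example for each concept.
# Reach and Move are classical controllers and need no training.
TRAIN_HOURS = {"Stack": 139.0, "Grasp": 125.0, "Reach": 0.0, "Move": 0.0}

# Assumption: the integrator's 6,000 cycles cost the same simulated time per
# cycle as the Stack concept's roughly 14 million cycles (139 hours).
HOURS_PER_CYCLE = 139.0 / 14_000_000
INTEGRATOR_HOURS = 6_000 * HOURS_PER_CYCLE


def serial_hours(hours):
    """Train every concept one after another, then train the integrator."""
    return sum(hours.values()) + INTEGRATOR_HOURS


def parallel_hours(hours):
    """Train all concepts at once; the slowest concept sets the duration."""
    return max(hours.values()) + INTEGRATOR_HOURS


print(f"serial:   {serial_hours(TRAIN_HOURS):.2f} h")
print(f"parallel: {parallel_hours(TRAIN_HOURS):.2f} h")
```

Under these assumptions, the parallel estimate is substantially the Stack training time plus a small integration cost, consistent with the description above.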

[000133] The AI engine splits the problem into simpler sub-problems, which allows decomposing a reward function into multiple simpler reward functions; ensures consistency between the Grasp orientation and the orientation needed for Stacking; and assembles the AI concepts into the AI model.

[000134] In this example, a dexterous robot is trained using hierarchical-decomposition deep reinforcement learning that blends reinforcement learning and classical control while running and managing multiple simulations in parallel for the reinforcement learning.

[000135] Each concept can have different state and action spaces. Typically, these state and action spaces can be smaller than a globally applicable state/action space, which makes the problem easier and learning faster. Since the sub-concepts are much simpler, their goals can be defined on subsets of the state space, significantly constraining the necessary exploration and leading to data-efficient learning even in complex environments. The AI engine can mix neural and classical controllers in the same task. The AI engine can enable hierarchical decomposition: a single concept can itself be an integrator choosing among subcomponents. The AI engine can use this to split the Grasp-n-Stack concept into four sub-concepts of i) Grasp, ii) Stack, iii) Reach, and iv) Move. Each of these concepts, such as the Grasp and Stack concepts, can be trained in parallel.
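As a minimal sketch of the hierarchical decomposition just described, a concept can be modeled as a tree node whose children are sub-concepts. The class name, the `kind` labels, and the helper function are hypothetical and are not part of the AI engine's actual interface.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Concept:
    """One node of a concept hierarchy (illustrative structure only)."""
    name: str
    kind: str  # "learned", "classical", or "integrator"
    children: List["Concept"] = field(default_factory=list)

    def leaves(self):
        """Return the lowest-level concepts beneath (or equal to) this node."""
        if not self.children:
            return [self]
        found = []
        for child in self.children:
            found.extend(child.leaves())
        return found


def trainable_in_parallel(root):
    """Leaf concepts that need reinforcement learning and can train at once."""
    return [c.name for c in root.leaves() if c.kind == "learned"]


grasp_n_stack = Concept("Grasp-n-Stack", "integrator", [
    Concept("Reach", "classical"),   # already "trained" motion controller
    Concept("Grasp", "learned"),
    Concept("Move", "classical"),    # already "trained" motion controller
    Concept("Stack", "learned"),
])

print(trainable_in_parallel(grasp_n_stack))
```

In this sketch, only the Grasp and Stack leaves are reported as needing training, while the classical Reach and Move controllers are simply composed into the tree.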

Multiple managed simulations occurring at the same time to decrease an overall training time.

[000136] Multiple managed simulations occurring at the same time to train multiple AI concepts improve the system's capability to extract and optimize knowledge faster from large and complex simulations and data, make users of the system more productive, and decrease the duration of training to accomplish a complex task. Each concept, such as Grasp, may be trained in parallel with another concept. In addition, already trained concepts, such as Reach, may be incorporated into the AI model. In addition, multiple versions of a particular concept may be trained in parallel with each other.

[000137] The goals of multiple managed simulations with the same AI engine are to:

Enable multiple managed simulations running in one instance in the cloud (public cloud, virtual private cloud, private cloud (including an on-premises installation of the AI engine)) to train that concept.

Enable multiple simulations running on one computer (offline) to train that concept.

Scale the training performance linearly (or nearly linearly) with the number of simulators.

Alternatively, enable multiple managed simulations running in multiple instances in the cloud (public cloud, virtual private cloud, private cloud) or on premises to train that concept.

Enable multiple simulations running on multiple instances in the cloud (public, VPC, private cloud) to train multiple concepts.

$$\text{Performance} = \frac{\text{Time to train concept with 1 simulation}}{\text{Time to train concept with } N \text{ simulations in parallel}}$$

[000138] For the AI engine's purposes, the engine will consider performance to be efficient when it is close to or equal to the number of simulations being used to train the concept.

[000139] With that being said, it is important to note that increasing the number of simulations running at the same time does not always increase the performance. Depending on the problem size, there are bound to be cases where the performance remains constant or decreases as the number of simulations increases. In these instances, it is best to stay at the point before the system starts getting diminishing returns.
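The performance measure and the diminishing-returns guidance above can be sketched as follows. The timing figures, the 0.9 efficiency threshold, and the function names are hypothetical values chosen only to illustrate the calculation.

```python
def performance(time_one_sim, time_n_sims):
    """Speedup: time with 1 simulation divided by time with N simulations."""
    return time_one_sim / time_n_sims


def largest_efficient_count(timings, min_efficiency=0.9):
    """Pick the largest simulator count whose speedup stays close to N.

    timings maps a simulator count N to the training time with N simulators
    and must contain an entry for N = 1. min_efficiency is a hypothetical
    cutoff on performance / N.
    """
    best = 1
    for n in sorted(timings):
        efficiency = performance(timings[1], timings[n]) / n
        if efficiency >= min_efficiency:
            best = n
    return best


# Hypothetical measurements (hours) showing returns that flatten, then reverse.
example_timings = {1: 100.0, 2: 52.0, 4: 27.0, 8: 26.0, 16: 30.0}

print(largest_efficient_count(example_timings))
```

With these hypothetical timings, four simulators still yield a speedup close to four, whereas eight or sixteen do not, so the sketch stops at four.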

[000140] Problems the Al engine overcomes include but are not limited to:

1) An AI team spends too much time (~XXX hours) waiting on the training of a complex task to complete, especially when they run experiments in the 'exploration branch'.

2) A user can run into problems that make optimizing their simulation at scale a difficult and slow process when running a single simulation to train on all of the concepts in the complex task at the same time.

3) A user may not be able to use an optimized, relatively simple simulation and small datasets focused on that 'singular' task.

[000141] The Al system allows: i) optimizing multiple control systems at once; and ii) supporting users looking to utilize big data sets for simulating multiple tasks and training multiple concepts at once with the big data sets.

[000142] The Al engine using hierarchical decomposition, deep Reinforcement Learning may process high volume, high velocity, and high variety information (from data sources such as datasets and simulations) faster to enable enhanced decision making, surface insights, and optimize processes faster by utilizing multiple simulations to train concepts on the following environments:

Public cloud (e.g., AWS, Azure);

Virtual private cloud (VPC) in a public cloud (e.g., Private clouds in AWS and Azure); and

Private cloud (including an On-premises installation).

[000143] Figure 5 illustrates a block diagram of an embodiment of a user interface 520 for a simulator training one or more concept nodes using reinforcement learning to learn to choose a sub-task recommended from the two or more AI objects in the levels stemming from the integrator. The system may implement a form of deep reinforcement learning in contrast to a standard notion of running a single end-to-end algorithm training. In deep reinforcement learning, the concept nodes of the AI model being trained interact with an environment over time. In an embodiment, at each time step, the concept node receives a state in a state space and selects an action from an action space by following a policy, which controls the concept node's behavior, i.e., a mapping from states to actions. The concept node then receives a scalar reward given by the reward function and transitions to the next state according to the environment dynamics, or model. The concept node also receives feedback from its selected actions and performance and then evaluates the feedback to alter its training.

[000144] Figure 6 illustrates a block diagram of an embodiment of the Al engine 600 using simpler reward functions focused for solving each individual sub-task.

[000145] A concept interacts through reinforcement learning with an environment 'E' in discrete time steps. At each time step in the training, the concept observes a state, performs an action, transitions to a new state, and receives a feedback reward from the environment 'E', such as for a robotic arm successfully stacking a prism on a stack.

[000146] An example reinforcement learning problem is where a concept interacts with the environment 'E' in discrete time steps. At each time step 't', the agent observes a state $s_t \in \mathbb{R}^n$, performs an action $a_t \in \mathbb{R}^n$, transitions to a new state $s_{t+1} \in \mathbb{R}^n$, and receives a feedback reward $r_t \in \mathbb{R}$ from the environment 'E'. The goal of reinforcement learning is to optimize the agent's action-selecting policy such that it achieves the maximum expected return of the feedback reward $r_t$, potentially averaged over a moving window of 'X' time steps or training cycles.
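The observe, act, transition, and reward cycle described above can be sketched with a toy environment. The one-dimensional environment, the fixed (not learned) policy, and every name below are assumptions made for illustration; they are not the robotic simulator or the trained concept of the example.

```python
class LineEnv:
    """Toy environment 'E': a point on a line should reach position 0."""

    def __init__(self, start=5.0):
        self.start = start
        self.state = start

    def reset(self):
        self.state = self.start
        return self.state

    def step(self, action):
        """Apply the action, then return (next_state, reward, done)."""
        self.state += action
        reward = -abs(self.state)        # closer to 0 gives a higher reward
        done = abs(self.state) < 0.5     # terminal condition: goal reached
        return self.state, reward, done


def policy(state):
    """Fixed stand-in policy: always step one unit toward 0."""
    return -1.0 if state > 0 else 1.0


def run_episode(env, policy, max_steps=50):
    """Run one episode and return (sum of rewards, number of steps taken)."""
    state = env.reset()
    total_reward = 0.0
    for t in range(max_steps):
        action = policy(state)                   # select an action from the state
        state, reward, done = env.step(action)   # transition and receive reward
        total_reward += reward
        if done:
            break
    return total_reward, t + 1


print(run_episode(LineEnv(), policy))
```

A learning algorithm such as TRPO would adjust the policy between episodes to increase the returned sum of rewards; that update step is intentionally omitted here.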

[000147] The Al engine solves complex tasks using reinforcement learning to facilitate problem decomposition, simplify reward function design, train quickly and robustly, and produce a policy that can be executed safely and reliably when the resulting trained Al concept is deployed. The state vector provided to the Al concept can vary from Al concept to Al concept, as may the action space.

[000148] In an embodiment, a learned actor concept's reward function could be defined in terms of the concept's transformed state, and may not be visible to the rest of the concept network. An AI concept can include both state and action transformations. The reward function and terminal conditions for a state can be written in terms of the concept's transformed state, and are independent of the rest of the concept network.

[000149] Figure 4F illustrates a block diagram of an embodiment of the AI engine that solves the example "Grasp and Stack" complex task 400F with concept network reinforcement learning. In this example, the AI engine solves the example complex task of Grasping a rectangular prism and precisely Stacking it on top of a cube. The AI engine initially broke the overall task down into four concepts: 1) Reaching the working area (staging 1), 2) Grasping the prism, 3) Moving to the second working area (staging 2), and 4) Stacking the prism on top of the cube. The Grasp concept can further be decomposed into an Orient the hand concept and a Lift concept. Thus, to simplify the learning problem by using a single policy for each individual sub-task within the concept of Grasp, the AI engine broke the Grasping concept into two more concepts: Orienting the hand around the prism in preparation for grasping, and clasping the prism to Lift it, for a total of five actor concepts in the concept network. Three of these concepts (Orienting, Lifting, and Stacking) used the TRPO algorithm to train, while the Reach concept (Staging-1) and the Moving concept to the working area (Staging-2) were handled with inverse kinematics.

[000150] Again, the state vector provided to the Al concept can vary from Al concept to Al concept, as may the action space. In this example complex task, all sub-tasks correspond to target velocities for one of nine associated joints. The example action and state vectors are described in Table 1 and 2, while the rewards and terminals are described in the appendices A and B, respectively.

State Spaces, Action Spaces, and Rewards

Table 1 : State vectors for each of the three concepts plus selectors.

[000151 ] Orient Concept - State:

For the six joints excluding the fingers:

1 . The sine and cosine of the angles of the joints.

2. The angular velocities of the joints.

3. The position and quaternion orientation of the prism.

4. The orientation of the line between two of the opposed fingers in degrees normalized by 90°.

5. The Euclidean distance between the pinch point of the opposed fingers and a point 1.5 cm above the prism.

6. The vector between the same two points above.

[000152] Lift Concept - State:

For all nine joints:

1 . The sine and cosine of the angles of the joints.

2. The angular velocities of the joints.

3. The Euclidean distance between the pinch point of the opposed fingers and the center of the prism.

4. The vector between the same two points above.

5. The XY vector between the center of the prism and the starting position of the prism.
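For illustration only, the Lift concept's state vector listed above could be assembled as in the following sketch. The element ordering, the use of radians, the direction of the two difference vectors, and the function name are assumptions; the source lists only which quantities are included.

```python
import math


def lift_state(joint_angles, joint_velocities, pinch_point, prism_center, prism_start):
    """Assemble a Lift-concept state vector from raw simulator quantities.

    joint_angles and joint_velocities hold one value per joint (nine joints);
    the three positions are (x, y, z) tuples.
    """
    if len(joint_angles) != 9 or len(joint_velocities) != 9:
        raise ValueError("the Lift concept uses all nine joints")

    state = []
    # 1. Sine and cosine of each joint angle (angles assumed to be in radians).
    for angle in joint_angles:
        state.extend([math.sin(angle), math.cos(angle)])
    # 2. Angular velocities of the joints.
    state.extend(joint_velocities)
    # 3. Euclidean distance between the pinch point and the prism center.
    delta = [c - p for p, c in zip(pinch_point, prism_center)]
    state.append(math.sqrt(sum(d * d for d in delta)))
    # 4. Vector between the same two points (pinch point to prism center).
    state.extend(delta)
    # 5. XY vector between the prism center and its starting position.
    state.extend([prism_center[0] - prism_start[0],
                  prism_center[1] - prism_start[1]])
    return state


example = lift_state([0.0] * 9, [0.0] * 9, (0.0, 0.0, 0.0), (3.0, 4.0, 0.0), (0.0, 0.0, 0.0))
print(len(example))
```

Under these assumptions the vector has 18 + 9 + 1 + 3 + 2 = 33 elements, which shows how a per-concept state can remain much smaller than a global state for the full task.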

[000153] Stack Concept - State:

For the six joints excluding the fingers:

1 . The sine and cosine of the angles of the joints.

2. The angular velocities of the joints.

3. The position and quaternion orientation of the prism.

4. The position and quaternion orientation of the cube.

5. The Euclidean distance between the bottom of the prism and the top of the cube.

6. The vector between the same two points above.

[000154] Selectors Concept - State

1 . The position of the pinch point between the opposed fingers.

2. The Euclidean distance between the pinch point and the prism.

3. The position of the center of mass of the prism.

4. The position of the center of mass of the cube.

5. The Euclidean distance between the bottom of the prism and the top of the cube.

Table 2: Action vectors for each of the three concepts.

[000155] Concept - Orient: sub-task - Target angular velocities for the 6 joints not including the fingers. Fingers extend maximally.

[000156] Concept - Lift: sub-task - Target angular velocities for the upper arm (1 st, 2nd, and 3rd joints), and opposed fingers (7th, and 9th joints). Remaining finger receives no command.

[000157] Concept - Stack: sub-task - Target angular velocities for the 6 joints not including the fingers. The fingers close with moderate force.

Reward function

[000158] As discussed previously, the full control concept is decomposed into separate "Grasp" and "Stack" concepts (skills), while Grasp itself is decomposed into "Orient" and "Lift". The example shaping reward functions used for the training of each concept are listed below.

[000159] Orient - Reward function:

where $r_\theta$ is the angular component of the shaping reward for Stack and Orient; $\theta_x$, $\theta_y$, and $\theta_z$ are the angles between the line passing through the two opposed fingers and the x, y, or z axes in the reference frame of the target object, respectively; and $\alpha$ controls the sharpness of the shaping. Since the objects are symmetrical in x and y, we allow any of the four orientations of the fingers that line up with the x or y axes by only looking at the smallest angular distance to either the x or y axis, yielding a distance that ranges from 0 to 45 degrees. The z axis should uniquely line up with the object, and its angle ranges from 0 to 90 degrees. Here, we used an $\alpha$ value of 0.4.

where $r_d$ is the shaping reward for reaching toward the goal location, $d$ is the distance between the pinch point of the opposed fingers and the goal location, $d_{max}$ is the terminal distance for this task, and $\alpha$ controls the sharpness of the shaping.

where $\gamma_t$ is a time decay factor applied to the reward to encourage fast completion, $t$ is the current time step within the episode, and $t_{max}$ is the time step limit for an episode,

where $R_{Orient}$ is the final reward for the "Orient" concept, $b_{Orient}$ is the bonus awarded on successful completion of the Orient task, $d_{Orient}$ is the distance between the pinch point and the prism, $\theta_{Orient}$ is the angle between the line connecting the opposed fingers and one of the axes of the prism, and the $\epsilon$ values for each are their tolerances.
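The x/y symmetry handling described for the Orient reward, in which only the smallest angular distance to either the x or y axis is used, can be sketched as below. The sketch assumes the finger line's heading is available as a single in-plane angle in degrees measured from the x axis; that representation and the function name are assumptions, not the source's formulation.

```python
def fold_to_nearest_axis(angle_deg):
    """Smallest angular distance, in degrees, to the nearest x or y axis.

    Because the x and y axes repeat every 90 degrees, the heading is reduced
    modulo 90 and then measured to the closer of the two neighboring axes,
    which yields a value between 0 and 45.
    """
    remainder = angle_deg % 90.0
    return min(remainder, 90.0 - remainder)


for heading in (0.0, 30.0, 45.0, 80.0, 100.0, -10.0):
    print(heading, fold_to_nearest_axis(heading))
```

A heading of 80 degrees and a heading of 100 degrees both fold to 10 degrees, reflecting that either finger orientation lines up equally well with the symmetrical prism.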

[000160] Lift - Reward function:

$$r_p = 1 - \left(\frac{p}{p_{max}}\right)^{\alpha}$$

where $r_p$ is the pinch shaping reward component for closing the fingers, $p$ is the distance between the two opposed fingers, and $p_{max}$ is the maximum possible distance between the fingers.

where $r_h$ is the height shaping component for lifting the prism, $h$ is the height of the prism off the ground, and $h_{max}$ is the height at which we declare the prism lifted and terminate the episode. Here we used an $\alpha$ of 4.

where $R_{Lift}$ is the final reward for the "Lift" concept, $b_{Lift}$ is the bonus reward assigned for successfully lifting the prism above the threshold height, $h$ is the height of the prism, $\epsilon_h$ is the threshold height, and $\epsilon_p$ is the threshold distance between the fingers below which they are considered pinched. The bonus rewards for success are greater than the total reward that could have been accumulated had the agent remained in this highest reward state for the remaining time in the episode, to encourage fast completion of the task.
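As a hedged illustration of the pinch shaping term given for the Lift concept, the following sketch evaluates one minus the ratio of finger distance to maximum finger distance raised to a sharpness exponent. Reading the trailing symbol in the source expression as the exponent α, clamping the ratio to the range 0 to 1, and the sample values are all assumptions.

```python
def pinch_reward(p, p_max, alpha):
    """Pinch shaping reward: 1 - (p / p_max) ** alpha.

    p is the distance between the two opposed fingers and p_max is the
    maximum possible distance. The ratio is clamped to [0, 1] (an assumption)
    so that the reward also stays within [0, 1].
    """
    if p_max <= 0:
        raise ValueError("p_max must be positive")
    ratio = min(max(p / p_max, 0.0), 1.0)
    return 1.0 - ratio ** alpha


# Fully open, half closed, and fully closed fingers with a sample alpha of 4.
for distance in (2.0, 1.0, 0.0):
    print(distance, pinch_reward(distance, p_max=2.0, alpha=4))
```

With an exponent of 4, the reward rises quickly once the fingers start to close, reaching 0.9375 at half the maximum opening in this sketch.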

[000161] Grasp - Reward function: Here, RGrasp is the final shaping reward for the "Grasp" concept.

[000162] Stack - Reward function

where $d_{Stack}$ is the distance between the pinch point and the goal, the $\epsilon$ values are the thresholds for success in angular and Euclidean distance, the $w$ values (such as $w_d$) are the weights assigned to those reward components, and $b_{Stack}$ is the bonus reward assigned for successful completion of the Stack task.

[000163] Full complex Grasp-n-Stack Task - Reward function:

[000164] Independent training of concepts allows each to use a focused reward function, simplifying training. For individual sub-tasks where solutions already exist, such as moving a robotic arm from place to place, they can be seamlessly plugged in among other learned concepts. Similarly, individual trained Al concepts can be reused as components in other tasks, or replaced with improved versions.

[000165] Concept network reinforcement learning is suitable for industrial applications, allowing for flexible goal specifications and rapid development of new variants of a problem. Compared with training monolithic networks to solve complete tasks, concept network reinforcement learning greatly accelerates the speed with which new combinations of functionality can be trained and built upon. Concept network reinforcement learning also has deployment-time benefits: the training process for concepts naturally produces policies with well-defined validity regions, so they can be executed safely and reliably. It also provides improved explainability: by tracking which concepts are activated when generating behavior, the system can provide context for why decisions were made.

[000166] In the example, the AI engine uses concept network reinforcement learning on a complex robotics task requiring dexterous manipulation: Grasping a prism and precisely Stacking it on a cube. The AI engine successfully solves the task, incorporating several inverse-kinematics-based classical controllers as well as a hierarchically decomposed set of learned concepts. The AI engine assembles sub-concepts into the overall solution extremely fast, taking, for example, 45x fewer samples than a state-of-the-art approach on the same task from Popov, I., Heess, N., Lillicrap, T., Hafner, R., Barth-Maron, G., Vecerik, M., Lampe, T., Tassa, Y., Erez, T., Riedmiller, M., "Data-efficient deep reinforcement learning for dexterous manipulation." arXiv preprint arXiv:1704.03073, [2017].

Terminal Conditions:

[000167] A goal of a termination condition for a policy can be defined based on the state space. Orient: For the Orient concept - an episode would end early if the hand moved too far from the prism, if the prism tipped more than 15 degrees, or if the goal was achieved by aligning the opposed fingers with the prism while the pinch point was 1.5 cm above the prism.

[000168] Lift: For the lift concept - an episode would end early if the prism was moved outside a virtual cylinder centered on the starting position of the prism, if the hand moved more than a certain distance from the prism, or if the goal was achieved by lifting the prism above a target height.

[000169] Stack: For the Stack concept - an episode would end early if the prism moved too far from the cube, if the prism touched the ground, or if the goal was achieved by lining the prism up with the cube and bringing them into contact.
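A terminal-condition check of the kind described for the Orient concept might be sketched as follows. The 15-degree tilt limit and the 1.5 cm pinch height come from the text; the maximum hand distance, the height tolerance, the order in which conditions are tested, and all names are hypothetical.

```python
MAX_TILT_DEG = 15.0          # from the text: prism tipped more than 15 degrees
TARGET_HEIGHT_CM = 1.5       # from the text: pinch point 1.5 cm above the prism
MAX_HAND_DISTANCE_M = 0.30   # hypothetical value for "too far from the prism"
HEIGHT_TOLERANCE_CM = 0.2    # hypothetical tolerance on the pinch height


def orient_terminal(hand_distance_m, prism_tilt_deg, fingers_aligned, pinch_height_cm):
    """Return (done, reason) for one time step of an Orient episode."""
    # Failure conditions are checked first so a tipped prism never counts
    # as a success (an assumption about precedence).
    if hand_distance_m > MAX_HAND_DISTANCE_M:
        return True, "hand_too_far"
    if prism_tilt_deg > MAX_TILT_DEG:
        return True, "prism_tipped"
    height_ok = abs(pinch_height_cm - TARGET_HEIGHT_CM) <= HEIGHT_TOLERANCE_CM
    if fingers_aligned and height_ok:
        return True, "success"
    return False, "running"


print(orient_terminal(0.10, 2.0, True, 1.5))
```

Because every quantity in the check is part of the concept's own state, the terminal logic can be written and tested independently of the other concepts.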

[000170] Figure 7 illustrates a graph 1000 of an embodiment of the training of the individual sub-task of Orient from Figure 4F and its reward function focused for that sub-task. Figure 8 illustrates a graph 1100 of an embodiment of the training of the individual sub-task of Lift from Figure 4F and its reward function focused for that sub-task.

[000171] The graphs show the concept's training convergence, either the Orient concept or the Lift concept, with a mean episode reward plotted against training samples in the millions. The shaded area represents the min to max. The shaded area is a 95% confidence interval for the mean. For the Lift concept, tight terminal conditions are set to encourage precise vertical lift, which makes finding an effective policy more challenging. The Orient and Stack concepts trained in approximately 2-3 million samples using shaping rewards and guiding terminals, without the need for hyperparameter tuning. The training graphs using reinforcement learning with the TRPO concepts are presented in Figures 7-9. Note, a very tight terminal constraint on the distance the prism can move from its starting XY coordinates is designed to encourage a straight vertical lift, and also increased the number of samples required to find a good policy through exploration. Better designed terminal conditions and rewards might speed up training on the concepts.

[000172] Figure 9 illustrates a graph 1200 of an embodiment of the training of the interactions of the individual sub-tasks to achieve the complex task of Grasp-n-Stack and its reward function focused for that sub-task.

[000173] In an example, the full concept integrator trained in 22,000 samples (Figure 9), though the integrator itself only saw 6,000 samples as it does not receive state transitions during long-running execution of children. When concepts are compatible (i.e., a concept ends within the operating constraints of another) and there exists some chain of compatible concepts that will achieve a goal, the integrator can learn to order these concepts very quickly, without the need to train a monolithic network to subsume the components. Models converged on good solutions between 16,000 and 25,000 samples. The task of ordering the concepts can be learned nearly two orders of magnitude faster than the individual concepts, or 45x faster than the single policy trained by Popov et al. [2017] using one million samples and previously trained sub-concepts.
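The ratios quoted above follow directly from the sample counts in the example, as the short calculation below illustrates. The lower figure of 2 million samples for an individual concept is taken from the approximate 2-3 million range reported for the Orient and Stack concepts; the variable names are illustrative.

```python
integrator_samples = 22_000          # full concept integrator (Figure 9)
popov_samples = 1_000_000            # single policy of Popov et al. [2017]
concept_samples_low = 2_000_000      # lower end of the 2-3 million range

# Ratio against the prior single-policy approach.
speedup_vs_popov = popov_samples / integrator_samples

# Ratio against training one individual concept.
speedup_vs_concept = concept_samples_low / integrator_samples

print(f"vs. Popov et al.:       {speedup_vs_popov:.1f}x")
print(f"vs. individual concept: {speedup_vs_concept:.1f}x")
```

The first ratio is about 45, and the second is about 91, which is consistent with "nearly two orders of magnitude."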

[000174] Note, training performance for DQN was evaluated with ten testing episodes for every 50 training episodes, with mean performance in each testing pass plotted in the integrator performance graphs shown in Figures 7-9. Training performance for TRPO uses the raw training episode returns, which are less representative of true policy performance but served well enough to show when the policy had converged. In plots showing the performance of DQN, the X axis represents transitions sampled so far, and the Y axis represents mean episode reward. Final evaluation of robustness for both DQN and TRPO was done without exploration.

[000175] In an embodiment, in 500 episodes we observed no task failures during execution, both with the concepts executed individually in their own environments and the tree with integrators solving the full task. The concept network is able to very reliably Grasp an object and precisely Stack it on another, both with varying position and orientation.

More Architecture details

[000176] Referring back to Figures 2A and 2B, the system may further include as follows.

Architect module

[000177] The architect module 326 is the component of the system responsible for proposing and optimizing learning topologies (e.g., neural networks), based on mental models.

[000178] The architect module 326 can take the codified mental model and pedagogy and then propose a set of candidate low-level learning algorithms, topologies of a complex tasks and sub-concepts, and configurations thereof the architect module 326 believes will best be able to learn the concepts in the model. This is akin to the work that a data scientist does in the toolkit approach, or that the search system automates in the approach with statistical data analysis tools. Here, it is guided by the pedagogical program instead of being a broad search. The architect module 326 can employ a variety of techniques to identify such models. The architect module 326 can generate a directed graph of nodes. The architect module 326 can break down the problem to be solved into smaller tasks/concepts all factoring into the more complex main problem trying to be solved based on the software code and/or data in the defined fields of the user interface supplied from the user/client device. The architect module 326 can instantiate a complex task and layers of sub-concepts feeding into the complex task. The architect module 326 can generate each concept including the sub-concepts with a tap that stores the output action/decision and the reason why that node reached that resultant output (e.g., what parameters dominated the decision and/or other factors that caused the node to reach that resultant output). This stored output of resultant output and the reasons why the node reached that resultant output can be stored in the trained intelligence model. The tap created in each instantiated node provides explainability on how a trained intelligence model produces its resultant output for a set of data input. The architect module 326 can reference a database of algorithms to use as well as a database of network topologies to utilize. 
The architect module 326 can reference a table or database of best suggested topology arrangements including how many layers of levels in a topology graph for a given problem, if available. The architect module 326 also has logic to reference similar problems solved by comparing signatures. If the signatures are close enough, the architect module 326 can try the topology used to optimally solve a problem stored in an archive database with a similar signature. The architect module 326 can also instantiate multiple topology arrangements all to be tested and simulated in parallel to see which topology comes away with optimal results. The optimal results can be based on factors such as performance time, accuracy, computing resources needed to complete the training simulations, etc.

[000179] In an embodiment, for example, the architect module 326 can be configured to propose a number of neural networks and heuristically pick an appropriate learning algorithm from a number of machine learning algorithms in one or more databases for each of the number of neural networks. Instances of the learner module 328 and the instructor module 324 can be configured to train the number of neural networks in parallel. The number of neural networks can be trained in one or more training cycles with the training data from one or more training data sources. The Al engine can subsequently instantiate a number of trained Al models based on the concepts learned by the number of neural networks in the one or more training cycles, and then identify a best trained Al model (e.g., by means of optimal results based on factors such as performance time, accuracy, etc.), among the number of trained Al models.
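The final selection step, identifying a best trained model among several candidates, could be sketched as below. The ranking rule (highest accuracy first, then shortest training time), the record fields, and the sample figures are assumptions; the source names accuracy and performance time only as example factors.

```python
from dataclasses import dataclass


@dataclass
class TrainedModel:
    """Summary of one candidate model after training (illustrative fields)."""
    name: str
    accuracy: float        # fraction of successful evaluation episodes
    training_hours: float  # time spent training this candidate


def pick_best(models):
    """Prefer higher accuracy; break ties with the shorter training time."""
    return max(models, key=lambda m: (m.accuracy, -m.training_hours))


candidates = [
    TrainedModel("topology-A", accuracy=0.92, training_hours=40.0),
    TrainedModel("topology-B", accuracy=0.97, training_hours=55.0),
    TrainedModel("topology-C", accuracy=0.97, training_hours=48.0),
]

print(pick_best(candidates).name)
```

Other weightings, for example penalizing computing resources, would change only the key function in this sketch.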

[000180] The user can assist in building the topology of the nodes by setting dependencies for particular nodes. The architect module 326 can generate and instantiate neural network topologies for all of the concepts needed to solve the problem in a distinct two-step process. The architect module 326 can generate a description of the network concepts. The architect module 326 can also take the description and instantiate one or more topological shapes, layers, or other graphical arrangements to solve the problem description. The architect module 326 can select topology algorithms to use based on factors such as whether the current problem has 1) an estimation output or 2) a discrete output, and then factors in other parameters such as performance time to complete the algorithm, accuracy, computing resources needed to complete the training simulations, originality, amount of attributes, etc.

Instructor module

[000181] The instructor module 324 is a component of the system responsible for carrying out a training plan codified in the pedagogical programming language. Training can include teaching a network of intelligent processing nodes to get one or more outcomes, for example, on a simulator. To do so, the instructor module 324 can form internal representations about the system's mastery level of each concept, and adapt the execution plan based on actual performance during training. The directed graph of lessons can be utilized by the instructor module 324 to determine an execution plan for training (e.g., which lessons should be taught in which order). The training can involve using a specific set of concepts, a curriculum, and lessons, which can be described in the pedagogical programming language file.

[000182] The instructor module 324 can train easier-to-understand tasks earlier than tasks that are more complex. Thus, the instructor module 324 can train sub-concept AI objects and then higher-level AI objects. The instructor module 324 can train sub-concept AI objects that are dependent on other nodes after those other AI objects are trained. However, multiple nodes in a graph may be trained in parallel. The instructor module 324 can run simulations on the AI objects with input data including statistics and feedback on results from the AI object being trained from the learner module 328. The learner module 328 and instructor module 324 can work with a simulator or other data source to iteratively train an AI object with different data inputs. The instructor module 324 can reference a knowledge base of how to train an AI object efficiently by different ways of flowing data to one or more AI objects in the topology graph in parallel, or, if dependencies exist, the instructor module 324 can train serially with some portions of lessons taking place only after earlier dependencies have been satisfied. The instructor module 324 can reference the dependencies in the topology graph, where the dependencies can come from a user specifying the dependencies and/or from how the arrangement of AI objects in the topology was instantiated. The instructor module 324 can supply data flows from the data source, such as a simulator, in parallel to multiple AI objects at the same time where computing resources and a dependency check allow the parallel training.

[000183] The instructor module 324 may flow data to train AI objects from many data sources including, but not limited to, a simulator, a batch data source, a random-data generator, and historical/guided performance from past performance. A simulator can give data and get feedback from the instructor module 324 during the simulation that can create an iterative reactive loop from data inputs and data outputs from the AI objects. A batch data source can supply batched data from a database in at least one example. A random-data generator can generate random data based on user-input parameters.
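One way to sketch the dependency-aware ordering described above is to group concepts into batches, where every concept in a batch has all of its dependencies satisfied and can therefore be trained in parallel. The sketch uses Python's standard `graphlib` module (Python 3.9 or later); the dependency map and the function name are illustrative assumptions, not the instructor module's actual logic.

```python
from graphlib import TopologicalSorter


def training_batches(dependencies):
    """Group concepts into batches that can each be trained in parallel.

    dependencies maps a concept name to the set of concepts that must be
    trained (or already available) before it.
    """
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()
    batches = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # everything with no unmet dependency
        batches.append(ready)
        sorter.done(*ready)                 # mark the whole batch as finished
    return batches


# Illustrative dependency map for the Grasp-n-Stack example. Reach and Move
# are classical controllers, so their "training" step would be a no-op.
concept_dependencies = {
    "Grasp": {"Orient", "Lift"},
    "Grasp-n-Stack": {"Reach", "Grasp", "Move", "Stack"},
}

for batch in training_batches(concept_dependencies):
    print(batch)
```

In this sketch the leaf concepts form the first batch, the Grasp integrator follows once Orient and Lift are done, and the top-level integrator is trained last.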

[000184] When starting a training operation, the instructor module 324 first generates an execution plan. This is the ordering it intends to use when teaching the concepts, and for each concept which lessons it intends to teach in what order. While the execution plan is executing, the instructor module 324 may jump back and forth between concepts and lessons to optimize the learning rate. By not training each concept fully before starting to train dependent concepts, the system naturally avoids certain systemic machine learning problems such as overfitting. The major techniques used to determine when to switch between lessons and concepts for training are reinforcement learning and adaptive learning. For example, for a first main problem of determining a number of bankruptcies in the United States, a first AI object corresponding to a sub-concept may be trained in a first lesson on how to determine bankruptcy filings in California. A second lesson may train the first AI object next on how to determine bankruptcy filings in California and New York. Successive lessons on an AI object can build upon and augment earlier lessons that the AI object was trained on.

[000185] The instructor module 324 looks to reuse similar training flows that have solved similar problems with similar signatures in the past.

Learner module

[000186] The learner module 328 is a component of the system configured to carry out the actual execution of the low-level, underlying Al algorithms. In training mode, the learner module 328 can instantiate a system conforming to what was proposed by the architect module 326, interface with the instructor module 324 to carry out the computation and assess performance, and then execute the learning algorithm itself. The learner module 328 can instantiate and execute an instance of the already trained system. Eventually, the learner module 328 writes out network states for each trained sub-AI object and then a combination of the topological graph of the main node with all of the sub-nodes into a trained Al model. The learner module 328 can also write the stored output of each node and why that node arrived at that output into the trained Al model, which gives explainability as to how and why the Al proposes a solution or arrives at an outcome.

Hyperlearner module

[000187] The hyperlearner module 325 can perform a comparison of a current problem to a previous problem in one or more databases. The hyperlearner module 325 can reference archived, previously built and trained intelligence models to help guide the instructor module 324 to train the current model of nodes. The hyperlearner module 325 can parse an archive database of trained intelligence models, known past similar problems and proposed solutions, and other sources. The hyperlearner module 325 can compare previous solutions similar to the solutions needed in a current problem as well as compare previous problems similar to the current problem to suggest potential optimal neural network topologies and training lessons and training methodologies.

Simulator

[000188] When the curriculum trains using a simulation or procedural generation, the data for a lesson is not passed to the learning system; instead, it is passed to the simulator. The simulator can use this data to configure itself, and the simulator can subsequently produce a piece of data for the learning system to use for training. This separation permits a proper separation of concerns. The simulator is the method of instruction, and the lesson provides a way to tune that method of instruction, which makes it more or less difficult depending on the current level of mastery exhibited by the learning system. A simulation can run on a client machine and stream data to the AI engine for training. In such an embodiment, the client machine needs to remain connected to the AI engine while the AI model is training. However, if the client machine is disconnected from the server of the AI engine, it can automatically pick up where it left off when it is reconnected. Note, if the system trains using data, then the data is optionally filtered/augmented in the lessons before being passed to the learning system.

[000189] Note: 1) simulations and procedural generation are a good choice versus data in a variety of circumstances; and 2) concepts are a good choice versus streams when you can more easily teach than calculate.

Training mode

[000190] A machine learning algorithm may have a target/outcome variable (or dependent variable) which is to be predicted from a given set of predictors (independent variables). Using this set of variables, the AI engine generates a function that maps inputs to desired outputs. The coefficients and weights plugged into the equations in the various learning algorithms are then updated after each epoch/pass of a training session until a best set of coefficients and weights is determined for this particular concept. The training process continues until the model achieves a desired level of accuracy on the training data.
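The update-until-accurate loop described above can be sketched with the simplest possible learner. The following is a minimal illustration, assuming a one-variable linear model trained by gradient descent on mean squared error; the function name `train_linear`, the learning rate, and the stopping threshold are assumptions made for the example and are not taken from the AI engine.

```python
def train_linear(xs, ys, lr=0.05, target_mse=1e-4, max_epochs=10_000):
    """Fit y ~ w * x + b, updating w and b after every epoch until the error target is met."""
    w, b = 0.0, 0.0
    n = len(xs)
    for epoch in range(1, max_epochs + 1):
        # Prediction error of the current coefficients on the training data.
        errors = [(w * x + b) - y for x, y in zip(xs, ys)]
        mse = sum(e * e for e in errors) / n
        if mse <= target_mse:
            return w, b, epoch  # desired level of accuracy reached
        # Gradient of the mean squared error with respect to w and b.
        grad_w = 2 * sum(e * x for e, x in zip(errors, xs)) / n
        grad_b = 2 * sum(errors) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b, max_epochs


xs = [0.0, 1.0, 2.0, 3.0]   # predictor (independent variable)
ys = [1.0, 3.0, 5.0, 7.0]   # target (dependent variable), generated from y = 2x + 1
w, b, epochs = train_linear(xs, ys)
```

The learned `w` and `b` approach 2 and 1, and training stops as soon as the error on the training data falls below the chosen threshold.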

[000191] When in training mode the architect module 326 of the AI engine is configured to i) instantiate the network of processing nodes in any layers of hierarchy conforming to concepts of the problem being solved proposed by the user and ii) then the learner module 328 and instructor module 324 train the network of processing nodes in that AI model. To effect the foregoing, the AI engine can take compiled pedagogical programming language code and generate an AI-model learning topology, and proceed to follow the curricula to teach the concepts as specified. Depending on the model, training can potentially take substantial amounts of time. Consequently, the AI engine can provide interactive context on the status of training including, for example, showing which nodes are actively being trained, the current belief about each node's mastery of its associated concept, overall and fine-grained accuracy and performance, the current training execution plan, and/or an estimate of completion time. As such, in an embodiment, the AI engine can be configured to provide one or more training status updates on training a neural network selected from i) an estimation of a proportion of a training plan completed for the neural network, ii) an estimation of a completion time for completing the training plan, iii) the one or more concepts upon which the neural network is actively training, iv) mastery of the neural network on learning the one or more concepts, v) fine-grained accuracy and performance of the neural network on learning the one or more concepts, and vi) overall accuracy and performance of the neural network on learning one or more mental models.

[000192] Because the process of building pedagogical programs is iterative, the AI engine in training mode can also provide incremental training. That is to say, if the pedagogical programming language code is altered with respect to a concept that comes after other concepts that have already been trained, those antecedent concepts do not need to be retrained.
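One way to picture incremental training is as a graph walk that marks only the altered concept and the concepts built on top of it. The sketch below is an assumption-laden illustration: the `depends_on` format, the function `concepts_to_retrain`, and the example concept names are hypothetical, and the choice to retrain downstream concepts is an assumption of the sketch; the text above only states that antecedent concepts are left alone.

```python
def concepts_to_retrain(depends_on, changed):
    """Return the changed concept plus every concept that depends on it, directly or indirectly.

    `depends_on` maps each concept to the list of concepts it builds upon.
    Antecedent concepts of `changed` are never included.
    """
    # Invert the dependency map: parent concept -> concepts that build on it.
    dependents = {}
    for concept, parents in depends_on.items():
        for parent in parents:
            dependents.setdefault(parent, set()).add(concept)

    stale, stack = set(), [changed]
    while stack:
        current = stack.pop()
        if current not in stale:
            stale.add(current)
            stack.extend(dependents.get(current, ()))
    return stale


depends_on = {
    "reach": [],
    "grasp": ["reach"],
    "stack": ["grasp"],
}
stale = concepts_to_retrain(depends_on, "grasp")
```

In this example, altering `grasp` leaves the already trained antecedent `reach` untouched.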

[000193] Additionally, in training mode, the user is able to specify what constitutes satisfactory training should the program itself permit indefinite training.

Algorithm selection

[000194] A first step an AI engine can take is to pick an appropriate learning algorithm to train a mental model. This is a notable step in training AI, and it is a step those without AI expertise cannot perform without expert guidance. The AI engine can have knowledge of many of the available learning algorithms, as well as a set of heuristics for picking an appropriate algorithm including an initial configuration to train from.

[000195] The process of picking an appropriate algorithm, etc., can be performed by an AI model that has been trained (and will continue to be trained) by the AI engine, meaning the AI model will get better at building AI models each time a new one is built. A trained AI model, thereby, provides enabling AI for proposing neural networks from assembly code and picking appropriate learning algorithms from a number of machine learning algorithms in one or more databases for training the neural networks. The AI engine can be configured to continuously train the trained AI-engine neural network in providing the enabling AI for proposing the neural networks and picking the appropriate learning algorithms thereby getting better at building AI models.

[000196] The architect module 326 can also use heuristics, mental model signatures, statistical distribution inference, and Meta-learning in topology and algorithm selection.

[000197] First, the AI engine and the architect module 326 thereof can be configured to heuristically pick an appropriate learning algorithm from a number of machine learning algorithms in one or more databases for training the neural network proposed by the architect module 326. Many heuristics regarding the mental model can be used to inform what types of AI and machine learning algorithms can be used. For example, the data types used have a large influence. For this reason, the pedagogical programming language contains rich native data types in addition to the basic data types. If the architect module 326 sees, for example, that an image is being used, a convolutional deep learning neural network architecture might be appropriate. If the architect module 326 sees data that is temporal in nature (e.g., audio data, sequence data, etc.), then a recurrent deep-learning neural network architecture like a long short-term memory ("LSTM") network might be more appropriate. The collection of heuristics can be generated by data science and machine learning/AI experts who work on the architect module 326 codebase, and who attempt to capture the heuristics that they themselves use in practice.
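A data-type heuristic of this kind can be sketched as a simple lookup. The function `propose_architecture`, the type labels, and the fully connected fallback below are assumptions made for illustration; the actual heuristics in the architect module 326 are not disclosed at this level of detail.

```python
def propose_architecture(input_type):
    """Map a declared input data type to a candidate network family (illustrative heuristic only)."""
    heuristics = {
        "image": "convolutional neural network",
        "audio": "long short-term memory (LSTM) network",
        "sequence": "long short-term memory (LSTM) network",
    }
    # Fallback for types with no specific rule; this default is an assumption of the sketch.
    return heuristics.get(input_type, "fully connected network")


choice_for_image = propose_architecture("image")
choice_for_audio = propose_architecture("audio")
```

A table like this is only a starting point; in the text, distribution statistics and meta-learning further refine the choice.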

[000198] In addition to looking at the mental model, the architect module 326 can also consider the pedagogy provided in the pedagogical programming language code. It can, for example, look at the statistical distribution of any data sets being used; and, in the case of simulators, it can ask the simulator to generate substantial amounts of data so as to determine the statistics of data that will be used during training. These distribution properties can further inform the heuristics used.

Meta-learning

[000199] Meta-learning is an advanced technique used by the architect module 326. It is, as the name implies, learning about learning. What this means is that as the architect module 326 can generate candidate algorithm choices and topologies for training, it can record this data along with the signature for the model and the resultant system performance. This data set can then be used in its own learning system. Thus, the architect module 326, by virtue of proposing, exploring, and optimizing learning models, can observe what works and what does not, and use that to learn what models it should try in the future when it sees similar signatures.

[000200] To effect meta-learning, the AI engine can include a meta-learning module configured to keep a record such as a meta-learning record in one or more databases. The record can include i) the source code processed by the AI engine, ii) mental models of the source code and/or signatures thereof, iii) the training data used for training the neural networks, iv) the trained AI models, v) how quickly the trained AI models were trained to a sufficient level of accuracy, and vi) how accurate the trained AI models became in making predictions on the training data.

[000201] For advanced users, low-level details of a learning topology can be explicitly specified completely or in part. The architect module 326 can treat any such pinning of parameters as an override on its default behavior. In this way, specific algorithms can be provided, or a generated model can be pinned for manual refinement.
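The six items of the meta-learning record map naturally onto a small data structure. The sketch below is hypothetical: the class `MetaLearningRecord`, its field names, the tuple-valued signature, and the lookup `best_prior` are illustrative assumptions, and exact signature matching stands in for whatever similarity measure the engine actually uses.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MetaLearningRecord:
    source_code: str          # i) source code processed by the engine
    signature: tuple          # ii) signature of the mental model
    training_data_ref: str    # iii) reference to the training data used
    model_ref: str            # iv) reference to the trained model
    training_seconds: float   # v) time to reach sufficient accuracy
    accuracy: float           # vi) accuracy reached on the training data


def best_prior(records, signature):
    """Return the most accurate (then fastest) prior record with a matching signature, or None."""
    matches = [r for r in records if r.signature == signature]
    return max(matches, key=lambda r: (r.accuracy, -r.training_seconds), default=None)


records = [
    MetaLearningRecord("prog_a", ("image", 10), "data_a", "model_a", 120.0, 0.95),
    MetaLearningRecord("prog_b", ("image", 10), "data_b", "model_b", 90.0, 0.97),
]
best = best_prior(records, ("image", 10))
```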

Guiding training

[000202] The first step the AI engine will take is to pick an appropriate learning algorithm to train the Mental Model. This is a critical step in training AI. The AI engine has knowledge of many of the available learning algorithms and has a set of heuristics for picking an appropriate algorithm as well as an initial configuration to train from.

[000203] Once an algorithm is chosen, the AI engine will proceed with training the AI model's Mental Model via the Curricula. The AI engine manages all of the data streaming, data storage, efficient allocation of hardware resources, choosing when to train each concept, how much (or little) to train a concept given its relevance within the Mental Model (i.e., dealing with the common problems of overfitting and underfitting), and generally is responsible for producing a trained AI model based on the given Mental Model and Curricula. As is the case with picking an appropriate learning algorithm, guiding training, notably avoiding overfitting and underfitting, to produce an accurate AI solution is a task that requires knowledge and experience in training AIs. The AI engine has an encoded set of heuristics to manage this without user involvement. Similarly, the process of guiding training is also a trained AI model that will only get smarter with each trained AI model it trains. The AI engine is thus configured to make determinations regarding i) when to train the AI model on each of the one or more concepts and ii) how extensively to train the AI model on each of the one or more concepts. Such determinations can be based on the relevance of each of one or more concepts in one or more predictions of a trained AI model based upon training data.

[000204] The AI engine can also determine when to train each concept, how much (or little) to train each concept based on its relevance, and, ultimately, produce a trained AI model. Furthermore, the AI engine can utilize meta-learning. In meta-learning, the AI engine keeps a record of each program it has seen, the data it used for training, and the generated AIs that it made. It also records how fast those AIs trained and how accurate they became. The AI engine learns over that dataset.

[000205] Note, when training of an AI object occurs, the hyper learner module 328 can be configured to save into the AI database 341 two versions of an AI object. A first version of an AI object is a collapsed TensorFlow representation of the AI object. A second version of an AI object is the representation left in its nominal non-collapsed state. When the search engine retrieves the AI object in its nominal non-collapsed state, then another programmer desiring to reuse the AI object will be able to obtain outputs from the non-collapsed graph of nodes with all of its rich meta-data rather than a collapsed concept with a single discrete output. The state of the AI data objects can be in a non-collapsed state so the trained AI object has its full rich data set, which then may be reused, reconfigured, or recomposed by the user into a subsequent trained AI model.

[000206] The database management system also indexes and tracks different AI objects with an indication of which version each AI object is. Later versions of an AI object may be better trained for a particular task, but earlier versions of the AI object may be more generally trained and thus reusable for a wider range of related tasks, to then be further trained for that specific task.

[000207] The AI database 341 and other components in the AI engine cooperate to allow migrations of learned state to reconfigure a trained AI object. When a system has undergone substantial training achieving a learned state, and a subsequent change to the underlying mental models might necessitate retraining, it could be desirable to migrate the learned state rather than starting training from scratch. The AI engine can be configured to afford transitioning capabilities such that previously learned high dimensional representations can be migrated to appropriate, new, high dimensional representations. This can be achieved in a neural network by, for example, expanding the width of an input layer to account for alterations with zero-weight connections to downstream layers. The system can then artificially diminish the weights on connections from the input that are to be pruned until they hit zero and can then be fully pruned.
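The two mechanics named above, widening an input layer with zero-weight connections and decaying connections to zero before pruning them, can be sketched on a single dense layer. This is a minimal pure-Python illustration; the helper names, the decay factor, and the snap-to-zero floor are assumptions, and a real migration would interleave the decay with continued training, which is omitted here.

```python
def forward(weights, inputs):
    """Dense layer without bias: weights[out][in] applied to an input vector."""
    return [sum(w * x for w, x in zip(row, inputs)) for row in weights]


def widen_inputs(weights, extra):
    """Add `extra` new inputs wired with zero weights, so existing outputs are unchanged."""
    return [row + [0.0] * extra for row in weights]


def decay_input(weights, index, factor=0.5, floor=1e-3):
    """One decay step for input `index`; weights smaller than `floor` are snapped to zero."""
    decayed = []
    for row in weights:
        row = list(row)
        shrunk = row[index] * factor
        row[index] = 0.0 if abs(shrunk) < floor else shrunk
        decayed.append(row)
    return decayed


def prune_input(weights, index):
    """Remove input `index` once all of its outgoing weights have reached zero."""
    if any(row[index] != 0.0 for row in weights):
        raise ValueError("input still carries non-zero weight")
    return [row[:index] + row[index + 1:] for row in weights]


weights = [[0.5, -1.0], [2.0, 0.25]]
before = forward(weights, [1.0, 2.0])

# Migrate to a wider input: the new third input has no effect yet.
wide = widen_inputs(weights, 1)
after = forward(wide, [1.0, 2.0, 9.0])

# Retire the first input: diminish its weights until they hit zero, then prune it.
while any(row[0] != 0.0 for row in wide):
    wide = decay_input(wide, 0)
pruned = prune_input(wide, 0)
```

Because the new connections start at zero, the widened layer reproduces the learned outputs exactly, which is what makes the migration safe before further training.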

Deploy and use

[000208] Once a trained AI model has been sufficiently trained, it can be deployed such that it can be used in a production application. The interface for using a deployed trained AI model is simple: the user submits data (of the same type as the trained AI model was trained with) to a trained AI model-server API and receives the trained AI model's evaluation of that data.

[000209] As a practical example of how to use a deployed trained AI model, a trained AI model can first be trained to recognize hand-written digits from the Modified National Institute of Standards and Technology ("MNIST") dataset. An image can be created containing a handwritten digit, perhaps directly through a touch-based interface or indirectly by scanning a piece of paper with the handwritten digit written on it. The image can then be downsampled to a resolution of 28x28 and converted to grayscale, as this is the input schema used to train the example trained AI model. When submitted to the trained AI model-server through the trained AI model server API, the trained AI model can take the image as input and output a one-dimensional array of length 10 (whereby each array item represents the probability, as judged by the trained AI model, that the image is a digit corresponding to the index). The array could be the value returned to the user from the API, which the user could use as needed.

[000210] Though a linear approach to building a trained AI model is presented in an embodiment, an author-train-deploy workflow does not have to be treated as a waterfall process. If the user decides further refinement of a trained AI model is needed, be it through additional training with existing data, additional training with new, supplemental data, or additional training with a modified version of the mental model or curricula used for training, the AI engine is configured to support versioning of AI models so that the user can preserve (and possibly revert to) the current state of an AI model while refining the trained state of the AI model until a new, more satisfactory state is reached.
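The client-side steps in this example (grayscale conversion, downsampling to 28x28, and reading the length-10 output array) can be sketched as follows. The helper names are hypothetical, block averaging is one assumed downsampling method among several, and the grayscale weights are the common ITU-R BT.601 luma coefficients. The call to the model-server API itself is not shown, since its signature is not given in the text; the probability array below is made up for illustration.

```python
def to_grayscale(rgb_rows):
    """Convert rows of (r, g, b) pixels to luminance using ITU-R BT.601 weights."""
    return [[0.299 * r + 0.587 * g + 0.114 * b for (r, g, b) in row] for row in rgb_rows]


def downsample(gray, size=28):
    """Average equal blocks to get a size x size image; dimensions must be multiples of `size`."""
    height, width = len(gray), len(gray[0])
    if height % size or width % size:
        raise ValueError("image dimensions must be multiples of the target size")
    bh, bw = height // size, width // size
    out = []
    for i in range(size):
        out_row = []
        for j in range(size):
            block = [gray[i * bh + di][j * bw + dj] for di in range(bh) for dj in range(bw)]
            out_row.append(sum(block) / len(block))
        out.append(out_row)
    return out


def predicted_digit(probabilities):
    """Index of the largest entry in the length-10 array returned for an image."""
    if len(probabilities) != 10:
        raise ValueError("expected one probability per digit 0-9")
    return max(range(10), key=lambda digit: probabilities[digit])


# A blank 56x56 RGB image stands in for a scanned digit.
image = [[(255, 255, 255)] * 56 for _ in range(56)]
model_input = downsample(to_grayscale(image))

# Hypothetical response array: the model judges the image most likely to be a 7.
response = [0.01] * 7 + [0.9, 0.02, 0.01]
digit = predicted_digit(response)
```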

Command line interface ("CLI")

[000211] The CLI is a tool configured to enable users to configure the AI engine. The CLI is especially useful for automation and connection to other tools. Some sub-tasks can only be performed using the CLI. Some sub-tasks that can be performed using the CLI include loading a pedagogical programming language file and connecting a simulator.

Web site

[000212] The web site is configured as a browser-based tool for configuring and analyzing AI models stored in the AI engine. The web site can be used for sharing, collaborating, and learning. One example of information that can be accessed from the web site is a visualization of an AI model's training progress.

Computing infrastructure

[000213] Figure 1A provides a block diagram illustrating an AI system 700A and its cloud-based computing platforms infrastructure in accordance with an embodiment. A backend cloud platform can consist of various servers, processes, databases, and other components that connect over a network, such as the Internet, to a plurality of computing devices. The backend cloud platform is configured to handle the scaling, efficiency, etc. Such a cloud platform can be a public cloud, Virtual Public Cloud, or a private cloud. Note, a similar computing platform may also be implemented on an on-premises computing platform such as that of Figure 1B.

[000214] In an embodiment, a user, such as a software developer, can interface with the AI system 700A through an online interface 701. However, the user is not limited to the online interface, and the online interface is not limited to that shown in Figure 1A. An input may be supplied from an online API, such as www.bons.ai, a command line interface, and a graphical user interface such as an Integrated Development Environment ("IDE") such as Mastermind™. With this in mind, the AI system 700A of Figure 1A can enable a user to make API and web requests through a domain name system ("DNS") 701, which requests can be optionally filtered through a proxy to route the API requests to an API load balancer 705 and the web requests to a web load balancer 707. Alternatively, the proxy service may be part of a service running on a CPU computing device. The API load balancer 705 can be configured to distribute the API requests among multiple processes wrapped in their own containers running in a containerization platform, such as a Docker-type network. The web load balancer 707 can be configured to distribute the web requests among the multiple processes wrapped in their own containers running in this containerization platform. The network can include a cluster of one or more central processing unit ("CPU") computing devices 709 and a cluster of one or more graphics processing unit ("GPU") computing devices 711. One or more services running in the network will scale to more or fewer CPU computing devices 709 and GPU computing devices 711 as needed. The CPU computing devices 709 can be utilized for most independent processes running on the swarm network. The GPU computing devices 711 can be utilized for the more computationally intensive independent processes such as TensorFlow and the learner process. Various services may run on either the CPU computing device 709 or the GPU computing device 711, as capacity in that machine is available at the time.

[000215] As further shown in Figure 1A, a logging stack 713 can be shared among all production clusters for dedicated monitoring and indexing/logging.

[000216] The cloud-based platform with multiple independent processes is configured for the user to define the AI problem to be solved. In an embodiment, all of the individual processes are wrapped into a container program such as Docker. The software container allows each instance of that independent process to run independently on whatever computing device that instance is running on.

[000217] The individual processes in the AI engine utilize a scaling hardware platform, such as Amazon Web Services ("AWS"), so that the number of individual processes of the AI engine and the amount of Central Processing Units ("CPUs"), Graphics Processing Units ("GPUs"), and RAM may dynamically change over time and rapidly scale to handle multiple users sending multiple AI models to be trained or multiple simulations from a single user to train one or more AI models.

[000218] For example, an engineer service can be configured to dynamically change the number of computing devices 709, 711 running independent processes over time and to rapidly change that number to scale to handle multiple users sending multiple AI models to be trained. A conductor service or an engineer service can cause a first instance of an instructor process to be instantiated, loaded onto a CPU computing device, and then run on a first CPU computing device 709.

[000219] The AI engine may have multiple independent processes on the cloud-based platform. The multiple independent processes may each be configured as an independent process wrapped in its own container so that multiple instances of the same processes, e.g. learner process and instructor process, can run simultaneously to scale to handle one or more users to perform sub-tasks. The sub-tasks can include 1) running multiple training sessions on two or more AI models at the same time, in parallel, 2) creating two or more AI models at the same time, 3) running a training session on one or more AI models while creating one or more AI models at the same time, 4) deploying and using two or more trained AI models to do predictions on data from one or more data sources, and 5) any combination of these four, on the same AI engine. CPU bound services can include, for example, a document database for storing AI objects such as an AI database; a Relational Database Server such as PostgreSQL; a time-series database 217 such as an InfluxDB database optimized to capture training data going into and out of a metagraph (e.g., metagraph 400A of Figure 4A) for at least a 100-episode set of training episodes for training an AI model; an AI-model service including an architect module and AI compiler; an AI-model web service; a conductor service; a watchman service; a CPU Engineer service; an instructor process; a predictor service; and other similar processes. GPU bound services can include, for example, a GPU Engineer service, a learner process, and other computationally heavy services. For example, a first CPU computing device may load and run an architect module. A second CPU computing device may load and run, for example, an instructor process. A first GPU computing device may load and run, for example, a learner process. A first service, such as an engineer service, may then change the number of computing devices running independent processes by dynamically calling in a third CPU computing device to load and run, for example, a second instance of the instructor process, and calling in a second GPU computing device to load and run, for example, a second instance of the learner process.

[000220] Scaling in this system may dynamically change both 1) the number of independent processes running and 2) the number of computing devices configured to run those independent processes, where the independent processes are configured to cooperate with each other. Dynamically changing the number of computing devices, for example, adding more GPUs or CPUs in order to run additional instances of the independent processes, allows multiple users to utilize the cloud-based system at the same time and to, for example, 1) conduct multiple training sessions for AI models in parallel, 2) deploy AI models for use, and 3) create new AI models, all at the same time. Clusters of hardware of CPU devices and GPU devices can be dynamically scaled in and out on, for example, an hourly basis based on percent load capacity used and an amount of RAM memory left compared to a current or expected need.
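A scaling decision based on percent load and remaining RAM could look like the sketch below. The function `desired_device_count`, the 80% and 30% thresholds, and the one-device-at-a-time step are assumptions chosen for illustration; the text does not specify the actual policy used by the engineer service.

```python
def desired_device_count(current, load_pct, free_ram_gb, needed_ram_gb,
                         high=80.0, low=30.0, minimum=1):
    """Decide the cluster size for the next period from load and memory headroom.

    Scale out when load is high or free RAM is below the current or expected need;
    scale in when load is low and at least `minimum` devices would remain.
    """
    if load_pct >= high or free_ram_gb < needed_ram_gb:
        return current + 1
    if load_pct <= low and current > minimum:
        return current - 1
    return current


scale_out = desired_device_count(current=2, load_pct=92.0, free_ram_gb=16.0, needed_ram_gb=8.0)
scale_in = desired_device_count(current=3, load_pct=12.0, free_ram_gb=64.0, needed_ram_gb=8.0)
steady = desired_device_count(current=2, load_pct=55.0, free_ram_gb=16.0, needed_ram_gb=8.0)
```

Such a check could be evaluated separately for the CPU cluster and the GPU cluster at each scaling interval, for example hourly.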

[000221] Figure 1B provides a block diagram illustrating an AI system 700B and its on-premises based computing platforms infrastructure in accordance with an embodiment. Following on the AI system 700A, a bastion host server and one or more CPU computing devices, such as a first CPU computing device 709A and a second computing device 709B, can be on a public subnet for bidirectional communication through an Internet gateway. One or more GPU computing devices, such as a first GPU computing device 711A, can be on a private subnet communicatively coupled with the public subnet by means of a subnet therebetween. The one or more CPU computing devices on the public subnet can be utilized on a first CPU computing device 709A by the compiler and the architect module/process that are part of an AI-model service. One or more other CPU computing devices on a second CPU computing device 709B on the private subnet can be utilized by the instructor module. The GPU computing devices can be utilized by the learner module/process and the predictor module/process. As further shown in Figure 1B, the private subnet can be configured to send outgoing communications to the Internet through a network address translation ("NAT") gateway.

[000222] The modules of the AI engine installed on on-premises servers can generate AI models. The modules of the AI engine installed on on-premises servers can deploy AI models for prediction. The modules of the AI engine installed on on-premises servers can train AI models.

Network

[000223] Figure 10 illustrates a number of electronic systems and devices communicating with each other in a network environment in accordance with an embodiment. The network environment 800 has a communications network 820. The network 820 can include one or more networks selected from an optical network, a cellular network, the Internet, a Local Area Network ("LAN"), a Wide Area Network ("WAN"), a satellite network, a fiber network, a cable network, and combinations thereof. In an embodiment, the communications network 820 is the Internet. As shown, there may be many server computing systems and many client computing systems connected to each other via the communications network 820. However, it should be appreciated that, for example, a single client computing system can also be connected to a single server computing system. As such, Figure 10 illustrates any combination of server computing systems and client computing systems connected to each other via the communications network 820.

[000224] The communications network 820 can connect one or more server computing systems selected from at least a first server computing system 804A and a second server computing system 804B to each other and to at least one or more client computing systems as well. The server computing system 804A can be, for example, the one or more server systems of, for example, Figures 1A and 1B. The server computing systems 804A and 804B can each optionally include organized data structures such as databases 806A and 806B. Each of the one or more server computing systems can have one or more virtual server computing systems, and multiple virtual server computing systems can be implemented by design. Each of the one or more server computing systems can have one or more firewalls to protect data integrity.

[000225] The at least one or more client computing systems can be selected from a first mobile computing device 802A (e.g., smartphone with an Android-based operating system), a second mobile computing device 802E (e.g., smartphone with an iOS-based operating system), a first wearable electronic device 802C (e.g., a smartwatch), a first portable computer 802B (e.g., laptop computer), a third mobile computing device or second portable computer 802F (e.g., tablet with an Android- or iOS-based operating system), a smart device or system incorporated into a first smart automobile 802D, a smart device or system incorporated into a first smart bicycle 802G, a first smart television 802H, a first virtual reality or augmented reality headset 804C, and the like.

[000226] The client computing systems (e.g., 802A - 802H, and/or 804C) can include, for example, the software application or the hardware-based system in which the trained AI model can be deployed. Additionally, the server 804B may have a simulator configured to train an AI model with the AI engine of cloud 804A. Each of the one or more client computing systems and/or cloud platforms can have one or more firewalls to protect data integrity.

[000227] It should be appreciated that the use of the terms "client computing system" and "server computing system" is intended to indicate the system that generally initiates a communication and the system that generally responds to the communication. For example, a client computing system can generally initiate a communication and a server computing system generally responds to the communication. No hierarchy is implied unless explicitly stated. Both functions can be in a single communicating system or device, in which case, the client-server and server-client relationship can be viewed as peer-to-peer.

[000228] Any one or more of the server computing systems can be a cloud provider. A cloud provider can install and operate application software in a cloud (e.g., the network 820 such as the Internet), and cloud users can access the application software from one or more of the client computing systems. Generally, cloud users that have a cloud-based site in the cloud cannot solely manage a cloud infrastructure or platform where the application software runs. Thus, the server computing systems and organized data structures thereof can be shared resources, where each cloud user is given a certain amount of dedicated use of the shared resources. Each cloud user's cloud-based site can be given a virtual amount of dedicated space and bandwidth in the cloud. Cloud applications can be different from other applications in their scalability, which can be achieved by cloning tasks onto multiple virtual machines at run-time to meet changing work demand. Load balancers distribute the work over the set of virtual machines. This process is transparent to the cloud user, who sees only a single access point.

[000229] Cloud-based remote access can be coded to utilize a protocol, such as Hypertext Transfer Protocol ("HTTP"), to engage in a request and response cycle with an application on a client computing system such as a web-browser application resident on the client computing system. The cloud-based remote access can be accessed by a smartphone, a desktop computer, a tablet, or any other client computing systems, anytime and/or anywhere. The cloud-based remote access is coded to engage in 1) the request and response cycle from all web browser based applications, 2) the request and response cycle from a dedicated on-line server, 3) the request and response cycle directly between a native application resident on a client device and the cloud-based remote access to another client computing system, and 4) combinations of these.

[000230] In an embodiment, the server computing system 804A can include a server engine, a web page management component, a content management component, and a database management component. The server engine can perform basic processing and operating-system level tasks. The web page management component can handle creation and display or routing of web pages or screens associated with receiving and providing digital content and digital advertisements. Users (e.g., cloud users), can access one or more of the server computing systems by means of a Uniform Resource Locator ("URL") associated therewith. The content management component can handle most of the functions in the embodiments described herein. The database management component can include storage and retrieval tasks with respect to the database, queries to the database, and storage of data.

[000231] In an embodiment, a server computing system can be configured to display information in a window, a web page, or the like. An application including any program modules, applications, services, processes, and other similar software executable when executed on, for example, the server computing system 804A, can cause the server computing system 804A to display windows and user interface screens in a portion of a display screen space. With respect to a web page, for example, a user via a browser on the client computing system 802B can interact with the web page, and then supply input to the query/fields and/or service presented by the user interface screens. The web page can be served by a web server, for example, the server computing system 804A, on any Hypertext Markup Language ("HTML") or Wireless Application Protocol ("WAP") enabled client computing system (e.g., the client computing system 802B), or any equivalent thereof. The client computing system 802B can host a browser and/or a specific application to interact with the server computing system 804A. Each application has code scripted to perform the functions that the software component is coded to carry out, such as presenting fields to take details of desired information. Algorithms, routines, and engines within, for example, the server computing system 804A can take the information from the presenting fields and put that information into an appropriate storage medium such as a database (e.g., database 806A). A comparison wizard can be scripted to refer to a database and make use of such data. The applications may be hosted on, for example, the server computing system 804A and served to the specific application or browser of, for example, the client computing system 802B. The applications then serve windows or pages that allow entry of details.

Computing systems

[000232] Figure 11 illustrates a computing system 900 that can be, wholly or partially, part of one or more of the server or client computing devices in accordance with an embodiment. With reference to Figure 11, components of the computing system 900 can include, but are not limited to, a processing unit 920 having one or more processing cores, a system memory 930, and a system bus 921 that couples various system components including the system memory 930 to the processing unit 920. The system bus 921 may be any of several types of bus structures selected from a memory bus or memory controller, a peripheral bus, and a local bus using any of a variety of bus architectures.

[000233] Computing system 900 typically includes a variety of computing machine-readable media. Computing machine-readable media can be any available media that can be accessed by computing system 900 and includes both volatile and nonvolatile media, and removable and non-removable media. The system memory 930 includes computer storage media in the form of volatile and/or nonvolatile memory such as read only memory (ROM) 931 and random access memory (RAM) 932. By way of example, and not limitation, computing machine-readable media use includes storage of information, such as computer-readable instructions, data structures, other executable software or other data. Computer-storage media includes, but is not limited to, RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, digital versatile disks (DVD) or other optical disk storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other tangible medium which can be used to store the desired information and which can be accessed by the computing system 900. Transitory media such as wireless channels are not included in the machine-readable media. Communication media typically embody computer readable instructions, data structures, other executable software, or other transport mechanism and includes any information delivery media. As an example, some client computing systems on the network 820 of Figure 10 might not have optical or magnetic storage.

[000234] A basic input/output system 933 (BIOS) containing the basic routines that help to transfer information between elements within the computing system 900, such as during start-up, is typically stored in ROM 931. RAM 932 typically contains data and/or software that are immediately accessible to and/or presently being operated on by the processing unit 920. The RAM 932 can include a portion of the operating system 934, application programs 935, other executable software 936, and program data 937.

[000235] The drives and their associated computer storage media, discussed above and illustrated in Figure 11, provide storage of computer readable instructions, data structures, other executable software and other data for the computing system 900. In Figure 11, for example, the solid state memory 941 is illustrated for storing operating system 944, application programs 945, other executable software 946, and program data 947. Note that these components can either be the same as or different from operating system 934, application programs 935, other executable software 936, and program data 937. Operating system 944, application programs 945, other executable software 946, and program data 947 are given different numbers here to illustrate that, at a minimum, they are different copies.

[000236] A user may enter commands and information into the computing system 900 through input devices such as a keyboard, touchscreen, or software or hardware input buttons 962, a microphone 963, a pointing device and/or scrolling input component, such as a mouse, trackball or touch pad. The microphone 963 can cooperate with speech recognition software. These and other input devices are often connected to the processing unit 920 through a user input interface 960 that is coupled to the system bus 921, but can be connected by other interface and bus structures, such as a parallel port, game port, or a universal serial bus (USB). A display monitor 991 or other type of display screen device is also connected to the system bus 921 via an interface, such as a display interface 990. In addition to the monitor 991, computing devices may also include other peripheral output devices such as speakers 997, a vibrator 999, and other output devices, which may be connected through an output peripheral interface 995.

[000237] The computing system 900 can operate in a networked environment using logical connections to one or more remote computers/client devices, such as a remote computing system 980. The remote computing system 980 can be a personal computer, a hand-held device, a server, a router, a network PC, a peer device or other common network node, and typically includes many or all of the elements described above relative to the computing system 900. The logical connections depicted in Figure 11 can include a personal area network ("PAN") 972 (e.g., Bluetooth®), a local area network ("LAN") 971 (e.g., Wi-Fi), and a wide area network ("WAN") 973 (e.g., cellular network), but may also include other networks. Such networking environments are commonplace in offices, enterprise-wide computer networks, intranets and the Internet. A browser application may be resident on the computing device and stored in the memory.

[000238] When used in a LAN networking environment, the computing system 900 is connected to the LAN 971 through a network interface or adapter 970, which can be, for example, a Bluetooth® or Wi-Fi adapter. When used in a WAN networking environment (e.g., Internet), the computing system 900 typically includes some means for establishing communications over the WAN 973.

[000239] It should be noted that the present design can be carried out on a computing system such as that described with respect to Figure 11. However, the present design can also be carried out on a server, a computing device devoted to message handling, or on a distributed system in which different portions of the present design are carried out on different parts of the distributed computing system.

[000240] In an embodiment, software used to facilitate algorithms discussed herein can be embodied onto a non-transitory machine-readable medium. A machine-readable medium includes any mechanism that stores information in a form readable by a machine (e.g., a computer). For example, a non-transitory machine-readable medium can include read only memory (ROM); random access memory (RAM); magnetic disk storage media; optical storage media; flash memory devices; Digital Versatile Discs (DVDs); EPROMs; EEPROMs; magnetic or optical cards; or any other type of media suitable for storing electronic instructions.

[000241] Note, an application described herein includes but is not limited to software applications, mobile apps, and programs that are part of an operating system application. Some portions of this description are presented in terms of algorithms and symbolic representations of operations on data bits within a computer memory. These algorithmic descriptions and representations are the means used by those skilled in the data processing arts to most effectively convey the substance of their work to others skilled in the art. An algorithm is here, and generally, conceived to be a self-consistent sequence of steps leading to a desired result. The steps are those requiring physical manipulations of physical quantities. Usually, though not necessarily, these quantities take the form of electrical or magnetic signals capable of being stored, transferred, combined, compared, and otherwise manipulated. It has proven convenient at times, principally for reasons of common usage, to refer to these signals as bits, values, elements, symbols, characters, terms, numbers, or the like. These algorithms can be written in a number of different software programming languages such as Python, Java, HTTP, C, C++, or other similar languages. Also, an algorithm can be implemented with lines of code in software, configured logic gates in software, or a combination of both. In an embodiment, the logic consists of electronic circuits that follow the rules of Boolean Logic, software that contains patterns of instructions, or any combination of both.

[000242] Many functions performed by electronic hardware components can be duplicated by software emulation. Thus, a software program written to accomplish those same functions can emulate the functionality of the hardware components in input-output circuitry.

[000243] Figures 12A through 12C provide a flow diagram illustrating a method for a hierarchical decomposition deep reinforcement learning for an Artificial Intelligence model in accordance with an embodiment. As shown, the method includes a number of steps. Note, the following steps may be performed in any order where logically possible, and not all of them need to be performed.

[000244] In step 2, the AI engine applies a hierarchical-decomposition reinforcement learning technique to train one or more AI objects as concept nodes composed in a hierarchical graph incorporated into the AI model.

[000245] In step 4, the AI engine uses the hierarchical-decomposition reinforcement learning technique to hierarchically decompose a complex task into multiple smaller, individual sub-tasks making up the complex task. Each of one or more of the individual sub-tasks corresponds to its own concept node in the hierarchical graph. The AI engine initially trains the AI objects on the individual sub-tasks and then trains them on how the individual sub-tasks need to interact with each other in the complex task in order to deliver an end solution to the complex task.

[000246] In step 6, the AI engine decomposes the complex task, which enables the use of reward functions focused on solving each individual sub-task and then one or more reward functions focused on the end solution of the complex task, and also enables training the AI objects corresponding to the individual sub-tasks in the complex task in parallel. The combined parallel training and simpler reward functions shorten the overall training duration for the complex task on one or more computing platforms, and speed up subsequent deployment of the resulting trained AI model, compared to end-to-end training with a single algorithm for all of the AI objects incorporated into the AI model.
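As an illustration only, the two-stage use of reward functions in steps 4 and 6 might be sketched as follows. The block-stacking task, the state keys, the `SubTask` class, and the `training_schedule` helper are hypothetical names chosen for this sketch and do not come from the described embodiments.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

State = Dict[str, float]  # hypothetical flat state representation


@dataclass
class SubTask:
    """One individual sub-task, paired with a reward focused only on it."""
    name: str
    reward: Callable[[State], float]


def reach_reward(state: State) -> float:
    # Hypothetical shaped reward: closer to the block is better.
    return -abs(state["gripper_to_block"])


def grasp_reward(state: State) -> float:
    # Hypothetical sparse reward: 1.0 once the block is held.
    return 1.0 if state["block_held"] > 0.5 else 0.0


def end_solution_reward(state: State) -> float:
    # Separate reward focused on the end solution of the complex task.
    return 1.0 if state["block_on_target"] > 0.5 else 0.0


def training_schedule(sub_tasks: List[SubTask],
                      integrator_name: str = "integrator") -> List[List[str]]:
    """Return training phases; names within one phase may train in parallel.

    Phase 1 holds every individual sub-task, phase 2 holds the node that
    learns how the sub-tasks interact to deliver the end solution.
    """
    return [[task.name for task in sub_tasks], [integrator_name]]
```

In this sketch the sub-task rewards never refer to the end goal, which is what allows the first phase to run in parallel.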

[000247] In step 8, the AI engine uses information supplied from a user interface to apply the hierarchical-decomposition reinforcement learning technique to train one or more AI objects.

[000248] In step 10, the AI engine automatically partitions the individual sub-tasks into the concept nodes in the AI model to be trained on. The way the partitioning is conveyed is selected from a group consisting of i) partitioning that is explicitly defined in scripted code from the user, ii) partitioning that is hinted at by general guidance in the scripted code from the user, iii) partitioning that is interpreted from guidance based on the user's responses to a presented list of questions, and iv) any combination of these three. The architect module then proposes a hierarchical structure for the graph of AI objects making up the AI model. Both i) the hinted-at guidance in the scripted code from the user and ii) the interpreted guidance from answered questions on how to automatically partition may be based on state space. Based on the guidance, the architect module identifies regions of state space that correspond to separable, individually solvable sub-tasks and creates a distinct policy for each region so identified.
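The state-space based partitioning described in step 10 could look roughly like the sketch below. It assumes, purely for illustration, that user guidance has already been reduced to named predicates over a state; the `distance` key, the region names, and the helper functions are hypothetical and not part of the described architect module.

```python
from typing import Callable, Dict, List, Tuple

State = Dict[str, float]
Region = Tuple[str, Callable[[State], bool]]  # (region name, membership test)


def make_regions() -> List[Region]:
    """Hypothetical guidance: split the state space on distance to a goal."""
    return [
        ("far", lambda s: s["distance"] > 1.0),
        ("near", lambda s: s["distance"] <= 1.0),
    ]


def partition(states: List[State], regions: List[Region]) -> Dict[str, List[State]]:
    """Assign each state to the first region whose predicate accepts it."""
    buckets: Dict[str, List[State]] = {name: [] for name, _ in regions}
    for state in states:
        for name, contains in regions:
            if contains(state):
                buckets[name].append(state)
                break
    return buckets


def make_policies(regions: List[Region]) -> Dict[str, Dict[str, object]]:
    """Create one distinct, still untrained policy placeholder per region."""
    return {name: {"region": name, "trained": False} for name, _ in regions}
```

Each placeholder policy would then be trained only on states from its own region.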

[000249] In step 12, the AI engine instantiates the AI objects corresponding to the concepts of the complex task into the graph of i) a first concept node corresponding to an integrator node and ii) one or more levels of concepts corresponding to the individual sub-tasks that hierarchically stem forth from the integrator node in the graph of the AI model. The AI engine initially trains each integrator node and the set of child nodes feeding that integrator node in the graph of nodes. Note, each child node in the set feeding that integrator node is either 1) an individual AI object or 2) another integrator node that is trained to best satisfy its own reward function for its described individual sub-task. Next, the training of integrator nodes and their child nodes, each using a reward function to best satisfy its described individual sub-task, continues up the graph of nodes until that process reaches a root integrator node, where a reward function for the root integrator node is focused on best satisfying the end solution to the complex task.
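The bottom-up progression in step 12, where children are trained before the integrator they feed and the process ends at the root integrator, corresponds to a post-order traversal of the graph. The following is a minimal sketch under that reading; `ConceptNode`, `training_order`, and the example concept names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ConceptNode:
    """A node in the hierarchical graph; a node with children is an integrator."""
    name: str
    children: List["ConceptNode"] = field(default_factory=list)

    @property
    def is_integrator(self) -> bool:
        return bool(self.children)


def training_order(root: ConceptNode) -> List[str]:
    """Post-order traversal: every child precedes the integrator it feeds."""
    order: List[str] = []
    for child in root.children:
        order.extend(training_order(child))
    order.append(root.name)
    return order


# Hypothetical graph: the root integrator "stack" is fed by one plain concept
# and by a nested integrator that has its own two child concepts.
example_graph = ConceptNode("stack", [
    ConceptNode("reach"),
    ConceptNode("grasp_and_lift", [ConceptNode("grasp"), ConceptNode("lift")]),
])
```

Calling `training_order(example_graph)` lists the leaf concepts first and the root integrator last. Sibling nodes have no ordering constraint between them, so they remain candidates for parallel training.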

[000250] In step 14, the AI engine cooperates with one or more data sources to obtain data for training and to conduct the training of the two or more AI objects corresponding to concept nodes in the hierarchical graph, in parallel at the same time.

[000251] In step 16, the AI engine i) initially trains each individual AI object to solve its individual sub-task, using its corresponding one or more reward functions focused on solving that sub-task, and then ii) uses an integrator node to train the set of individual sub-tasks in the complex task so that the concepts of the individual sub-tasks cooperate and work with each other to achieve the complex task. The integrator node then has a reward function focused on the end solution to the complex task.

[000252] In step 18, an integrator is configured to select a first set of concept nodes from two or more sets in the graph of the complex task to be trained on and computed.

[000253] In step 20, both i) the hinted-at guidance in the scripted code from the user and ii) the interpreted guidance from answered questions on how to automatically partition are based on state space. Based on the guidance, an architect module identifies regions of state space that correspond to separable, individually solvable sub-tasks and creates a distinct policy for each region so identified.

[000254] In step 22, each integrator node in the graph of nodes is configured to evaluate in turn, based on data from the simulator, which of the nodes in the graph are currently appropriate to solving its sub-task. Only the nodes so chosen are further provided with data from the simulator and evaluated, which saves computing power and cycles compared to computing all of the nodes making up the AI model in each evaluation cycle.
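The selective evaluation in step 22 can be illustrated with the sketch below. In the described design the integrator's choice would itself be learned; here, as a loudly stated simplification, each child carries a hand-written `applies` predicate. The `Concept` and `Integrator` classes and the state keys are hypothetical.

```python
from typing import Callable, Dict, List, Tuple

State = Dict[str, float]


class Concept:
    """A child node that counts how often it is actually evaluated."""

    def __init__(self, name: str,
                 applies: Callable[[State], bool],
                 act: Callable[[State], str]) -> None:
        self.name = name
        self.applies = applies  # assumption: fixed predicate, not learned
        self.act = act
        self.evaluations = 0

    def evaluate(self, state: State) -> str:
        self.evaluations += 1
        return self.act(state)


class Integrator:
    """Chooses the currently appropriate child and evaluates only that child."""

    def __init__(self, children: List[Concept]) -> None:
        self.children = children

    def step(self, state: State) -> Tuple[str, str]:
        for child in self.children:
            if child.applies(state):
                return child.name, child.evaluate(state)
        raise ValueError("no concept applies to this state")
```

Children that are not chosen receive no state data in that cycle, which is where the saving in computation comes from.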

[000255] In step 24, the AI objects of the AI model trained by the AI engine include a blend of at least a first set and a second set of AI objects being trained by the first module via reinforcement learning and a third set of AI objects that are configured to operate in two ways: 1) as control nodes, where one or more actions are produced by the code in the node, and/or 2) as nodes that just implement a data transformation step. The first module of the AI engine is configured to manage multiple simulations from the data sources in parallel at the same time to train the first and second sets of AI objects with the reinforcement learning.

[000256] In step 26, the AI engine's decomposition of the complex task allows each concept making up the complex task in the graph to use the most appropriate training approach for that individual sub-task, whether that be a classical motion controller, a preexisting learned model, or a neural network that needs to be trained, rather than the whole AI model being trained with one of these training approaches.

[000257] In step 28, the AI engine uses a conductor service to handle scaling of running multiple training sessions for the two or more AI objects in parallel at the same time by dynamically calling in additional computing devices to load on and run additional instances of processes for the first module for each training session occurring in parallel.
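A single-machine analogue of the scaling behaviour in step 28 is sketched below. A thread pool sized to the number of sessions stands in for the additional computing devices, and `train_concept` is a placeholder that returns immediately; both are assumptions of this sketch, not the conductor service itself.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List


def train_concept(name: str, episodes: int) -> Dict[str, object]:
    """Placeholder for one training session; a real one would run simulations."""
    return {"concept": name, "episodes": episodes, "status": "trained"}


def run_sessions(concepts: List[str], episodes: int = 100) -> Dict[str, Dict[str, object]]:
    """Run one training session per concept, with one worker per session."""
    workers = max(1, len(concepts))  # scale the pool with the workload
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = {name: pool.submit(train_concept, name, episodes)
                   for name in concepts}
        # Collect results; result() blocks until that session has finished.
        return {name: future.result() for name, future in futures.items()}
```

A distributed deployment would replace the thread pool with remote workers, but the pattern of submitting one session per concept and collecting the results stays the same.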

[000258] In step 30, the AI engine's decomposition of the complex task allows replacing one or more concepts making up the complex task without retraining each concept in the graph making up that AI model.

[000259] In step 32, a combination of parallel training of concepts and the use of reward functions focused on solving each individual sub-task shortens the overall training duration for the complex task on one or more computing platforms, and speeds up subsequent deployment of the resulting trained AI model, compared to end-to-end training with a single algorithm for all of the AI objects incorporated into the AI model.
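The replaceability described in step 30 can be pictured with a small sketch in which the model is a mapping from concept names to policies. The mapping, the `replace_concept` helper, and the toy policies are hypothetical; whether an integrator above the replaced concept needs further training is left open here.

```python
from typing import Callable, Dict

Policy = Callable[[float], str]  # hypothetical: maps an observation to an action


def replace_concept(model: Dict[str, Policy], name: str,
                    new_policy: Policy) -> Dict[str, Policy]:
    """Return a copy of the model with one concept swapped out.

    Every other entry is carried over as the same object, so the
    already-trained concepts are reused rather than retrained.
    """
    if name not in model:
        raise KeyError(name)
    updated = dict(model)
    updated[name] = new_policy
    return updated


def old_grasp(observation: float) -> str:
    return "close_slowly"


def new_grasp(observation: float) -> str:
    return "close_quickly"


def reach(observation: float) -> str:
    return "move"
```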

[000260] While the foregoing design and embodiments thereof have been provided in considerable detail, it is not the intention of the applicant(s) for the design and embodiments provided herein to be limiting. Additional adaptations and/or modifications are possible, and, in broader aspects, these adaptations and/or modifications are also encompassed. Accordingly, departures may be made from the foregoing design and embodiments without departing from the scope afforded by the following claims, which scope is only limited by the claims when appropriately construed.