Title:
TRAINING ARCHITECTURE USING GAME CONSOLES
Document Type and Number:
WIPO Patent Application WO/2023/154128
Kind Code:
A1
Abstract:
An artificial intelligent agent can act as a player in a video game, such as a racing video game. The agent can race against, and often beat, the best players in the world. The game can be completely external to the agent and can run in real time. In this way, the training system is much more like a real world system. The consoles on which the game runs for training the agent are provided in a cloud computing environment. The agents and the trainers can run on other computing devices in the cloud, where the system can choose the trainers and agent compute based on proximity to console, for example. Users can choose the game they want to run and submit code which can be built and deployed to the cloud system. Metrics and logs and artifacts from the game can be sent to cloud storage.

Inventors:
WURMAN PETER (US)
BARRETT LEON (US)
KHANDELWAL PIYUSH (US)
WHITEHEAD DION (US)
DOUGLAS RORY (US)
AGHABOZORGI HOUMEHR (US)
BELTRAN JUSTIN V (US)
AHAD RABIH ABDUL (US)
AZZAM BANDALY (US)
Application Number:
PCT/US2022/073700
Publication Date:
August 17, 2023
Filing Date:
July 13, 2022
Assignee:
SONY GROUP CORP (JP)
SONY CORP AMERICA (US)
SONY INTERACTIVE ENTERTAINMENT LLC (US)
International Classes:
A63F13/67; A63F13/355; A63F13/48; A63F13/77
Domestic Patent References:
WO2019113502A2, 2019-06-13
Foreign References:
US20200179808A1, 2020-06-11
US20190354759A1, 2019-11-21
Attorney, Agent or Firm:
LIN, Vic (US)
Claims:
CLAIMS

1. A training system computing architecture comprising: data gatherers configured to interact with a game on a cloud-based game console; a trainer configured to review experiences from the data gatherers and improve policies for the data gatherers for interacting with the game; an experiment manager component configured to monitor a state of an experiment and to determine whether to run the experiment once the experiment is in a scheduling state, the experiment manager component starting the experiment on a predetermined number of the cloud-based game consoles, with a predetermined number of data gatherers; and a monitoring service permitting a user to monitor the experiment.

2. The training system of claim 1, further comprising: a development source code control service for managing code for the data gatherers, the trainers and the experiment definition program and creating a docker image thereof; and a production source code control service mirroring the development source code control service and building a docker image for an experiment.

3. The training system computing architecture of claim 1, wherein the experiment definition program defines how many of the data gatherers to use, how much computing power is needed for the data gatherers and the trainers, what algorithms the trainer should use, and a set of tasks for the trainers to put the data gatherers through in the game.

4. The training system computing architecture of claim 1, wherein the cloud-based consoles are shared with human users playing the game.

5. The training system computing architecture of claim 1, wherein the data gatherers and the trainers are deployed at one or more environments.

6. The training system computing architecture of claim 1, wherein the experiment manager component reviews a priority level of the experiment, an age of the experiment, and/or whether resources the experiment requires are available in any acceptable environment to determine whether to run the experiment.

7. The training system computing architecture of claim 1, wherein the trainers are programmed with one or more tasks for a respective one of the data gatherers.

8. The training system computing architecture of claim 7, wherein the experiment is completed when each of the one or more tasks is completed for each of the data gatherers.

9. The training system computing architecture of claim 1, further comprising a runs and metrics database for storing information about the game played for the experiment, the information including at least one of artifacts, neural networks, replay buffers and algorithm state.

10. The training system computing architecture of claim 1, further comprising one or more artificial intelligence learning algorithms, used by the trainers, receiving experiences from the data gatherers to update a game playing policy of the data gatherers.

11. A method for training an artificial intelligent agent to play a video game on a cloud-based game console, comprising: programming the artificial intelligent agent to interact in the video game; configuring trainers to review experiences from the artificial intelligent agents and improve policies for the artificial intelligent agents for interacting with the video game; monitoring a state of the experiment with an experiment manager component and determining whether to run the experiment once the experiment is in a scheduling state; starting the experiment on a predetermined number of the cloud-based game consoles, with a predetermined number of the data gatherers; receiving experiences from the data gatherers with respect to playing the video game; and executing one or more learning algorithms to update a game playing policy of the data gatherers.

12. The method of claim 11, further comprising: managing code for the artificial intelligent agents, the trainers and an experiment definition program with a development source code control service and creating a docker image thereof; and mirroring the development source code control service with a production source code control service within a game console system build environment and building a docker image for an experiment.

13. The method of claim 11, further comprising defining, by the experiment definition program, how many of the data gatherers to use, how much computing power is needed for the data gatherers and the trainers, what algorithms the trainer should use, and a set of tasks and type of curriculum for the trainers to put the data gatherers through in the video game.

14. The method of claim 11, wherein the cloud-based game consoles are shared with human users playing the game.

15. The method of claim 11, further comprising deploying the data gatherers and the trainers at one or more environments.

16. The method of claim 11, further comprising reviewing, by the experiment manager component, a priority level of the experiment, an age of the experiment, an experiment quota, and/or whether resources the experiment requires are available in any acceptable environment to determine whether to run the experiment.

17. The method of claim 11, further comprising programming the trainers with one or more tasks for a respective one of the data gatherers.

18. The method of claim 11, further comprising storing information about the game played for the experiment in a runs and metrics database, the information including at least one of artifacts, neural networks, replay buffers and algorithm state.

19. An artificial intelligent agent configured to compete in a video game, the artificial intelligent agent trained on a cloud-based game console shared with human players, the artificial intelligent agent trained by a method comprising: programming the artificial intelligent agent to interact in the video game; configuring trainers to review experiences from the artificial intelligent agents and improve policies for the artificial intelligent agents for interacting with the video game; monitoring a state of the experiment with an experiment manager component and determining whether to run the experiment once the experiment is in a scheduling state; starting the experiment on a predetermined number of the cloud-based game consoles, with a predetermined number of the data gatherers; receiving experiences from the data gatherers with respect to playing the video game; and executing one or more learning algorithms to update a game playing policy of the data gatherers.

20. The artificial intelligent agent of claim 19, trained by the method further comprising reviewing, by the experiment manager component, a priority level of the experiment, an age of the experiment, and/or whether resources the experiment requires are available in any acceptable environment to determine whether to run the experiment.

Description:
TRAINING ARCHITECTURE USING GAME CONSOLES

BACKGROUND OF THE INVENTION

1. Field of the Invention

[0001] Embodiments of the invention relate generally to artificial intelligence training. More particularly, the invention relates to systems for training an artificial agent using game consoles.

2. Description of Prior Art and Related Information

[0002] The following background information may present examples of specific aspects of the prior art (e.g., without limitation, approaches, facts, or common wisdom) that, while expected to be helpful to further educate the reader as to additional aspects of the prior art, is not to be construed as limiting the present invention, or any embodiments thereof, to anything stated or implied therein or inferred thereupon.

[0003] Video game players often desire to improve their game through practice and playing against other players. However, once a game player develops exceptional skills in a given game, the availability of suitable challengers greatly declines. While such players may be able to improve their game by playing against less skilled players, it is usually more helpful to play against a player that can provide a significant challenge.

[0004] Many games provide built-in players that can participate. However, these players may simply be following certain programming that a skillful player can figure out and defeat.

[0005] In view of the foregoing, there is a need for a system and method for training an artificial intelligent agent to have the ability to challenge even the best skilled video game players.

SUMMARY OF THE INVENTION

[0006] Embodiments of the present invention provide a training system computing architecture comprising a build environment permitting a user to build data gatherers, trainers and an experiment definition program, the data gatherers being configured to interact with a game on a cloud-based game console, the trainer configured to review experiences from the data gatherers and improve policies for the data gatherers for interacting with the game; a development source code control service for managing code for the data gatherers, the trainers and the experiment definition program and creating a docker image thereof; a production source code control service managing the development source code control service and building a docker image for an experiment; an experiment manager component configured to monitor a state of the experiment and to determine whether to run the experiment once the experiment is in a scheduling state, the experiment manager component starting the experiment on a predetermined number of the cloud-based game consoles, with a predetermined number of data gatherers; and a monitoring service permitting a user to monitor the experiment.

[0007] Embodiments of the present invention further provide a method for training an artificial intelligent agent to play a video game on a cloud-based game console comprising programming the artificial intelligent agent to interact in the video game; configuring trainers to review experiences from the artificial intelligent agents and improve policies for the artificial intelligent agents for interacting with the video game; storing and sharing code for the artificial intelligent agents, the trainers and an experiment definition program with a development source code control service and creating a docker image thereof; and managing the development source code control service with a production source code control service within a game console system build environment. In some embodiments, the development source code control service and the production source code control service may be one and the same. The method can further include building a docker image for an experiment; monitoring a state of the experiment with an experiment manager component and determining whether to run the experiment once the experiment is in a scheduling state; starting the experiment on a predetermined number of the cloud-based game consoles, with a predetermined number of the data gatherers; receiving experiences from the data gatherers with respect to playing the video game; and executing one or more artificial intelligence learning algorithms to update a game playing policy of the data gatherers.
[0008] Embodiments of the present invention also provide an artificial intelligent agent configured to compete in a video game, the artificial intelligent agent trained on a cloud-based game console, the artificial intelligent agent trained by a method comprising programming the artificial intelligent agent to interact in the video game; configuring trainers to review experiences from the artificial intelligent agents and improve policies for the artificial intelligent agents for interacting with the video game; reviewing code for the artificial intelligent agents, the trainers and an experiment definition program with a development source code control service and creating a docker image thereof; mirroring the development source code control service with a production source code control service within a game console system build environment and building a docker image for an experiment; monitoring a state of the experiment with an experiment manager component and determining whether to run the experiment once the experiment is in a scheduling state; starting the experiment on a predetermined number of the cloud-based game consoles, with a predetermined number of the data gatherers; receiving experiences from the data gatherers with respect to playing the video game; and executing one or more artificial intelligence algorithms to update a game playing policy of the data gatherers.

[0009] These and other features, aspects and advantages of the present invention will become better understood with reference to the following drawings, description and claims.

BRIEF DESCRIPTION OF THE DRAWINGS

[0010] Some embodiments of the present invention are illustrated as an example and are not limited by the figures of the accompanying drawings, in which like references may indicate similar elements.

[0011] FIG. 1 illustrates an exemplary system architecture for training agents using game consoles according to an embodiment of the present invention;

[0012] FIG. 2 illustrates resources used in the system architecture of FIG. 1;

[0013] FIG. 3 illustrates a schematic representation of a user computing device used in the architecture and methods according to exemplary embodiments of the present invention; and

[0014] FIG. 4 illustrates services provided by a landlord service for controlling resource use in the architecture and methods according to exemplary embodiments of the present invention.

[0015] Unless otherwise indicated, illustrations in the figures are not necessarily drawn to scale.

[0016] The invention and its various embodiments can now be better understood by turning to the following detailed description wherein illustrated embodiments are described. It is to be expressly understood that the illustrated embodiments are set forth as examples and not by way of limitations on the invention as ultimately defined in the claims.

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS AND BEST MODE OF INVENTION

[0017] The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention. As used herein, the term "and/or" includes any and all combinations of one or more of the associated listed items. As used herein, the singular forms "a," "an," and "the" are intended to include the plural forms as well as the singular forms, unless the context clearly indicates otherwise. It will be further understood that the terms "comprises" and/or "comprising," when used in this specification, specify the presence of stated features, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, steps, operations, elements, components, and/or groups thereof.

[0018] Unless otherwise defined, all terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one having ordinary skill in the art to which this invention belongs. It will be further understood that terms, such as those defined in commonly used dictionaries, should be interpreted as having a meaning that is consistent with their meaning in the context of the relevant art and the present disclosure and will not be interpreted in an idealized or overly formal sense unless expressly so defined herein.

[0019] In describing the invention, it will be understood that a number of techniques and steps are disclosed. Each of these has individual benefit and each can also be used in conjunction with one or more, or in some cases all, of the other disclosed techniques. Accordingly, for the sake of clarity, this description will refrain from repeating every possible combination of the individual steps in an unnecessary fashion. Nevertheless, the specification and claims should be read with the understanding that such combinations are entirely within the scope of the invention and the claims.

[0020] In the following description, for purposes of explanation, numerous specific details are set forth in order to provide a thorough understanding of the present invention. It will be evident, however, to one skilled in the art that the present invention may be practiced without these specific details.

[0021] The present disclosure is to be considered as an exemplification of the invention and is not intended to limit the invention to the specific embodiments illustrated by the figures or description below.

[0022] Devices or system modules that are in at least general communication with each other need not be in continuous communication with each other, unless expressly specified otherwise. In addition, devices or system modules that are in at least general communication with each other may communicate directly or indirectly through one or more intermediaries.

[0023] A description of an embodiment with several components in communication with each other does not imply that all such components are required. On the contrary, a variety of optional components are described to illustrate the wide variety of possible embodiments of the present invention.

[0024] A "computer" or “computing device” may refer to one or more apparatus and/or one or more systems that are capable of accepting a structured input, processing the structured input according to prescribed rules, and producing results of the processing as output. Examples of a computer or computing device may include: a computer; a stationary and/or portable computer; a computer having a single processor, multiple processors, or multi-core processors, which may operate in parallel and/or not in parallel; a supercomputer; a mainframe; a super mini-computer; a mini-computer; a workstation; a micro-computer; a server; a client; an interactive television; a web appliance; a telecommunications device with internet access; a hybrid combination of a computer and an interactive television; a portable computer; a tablet personal computer (PC); a personal digital assistant (PDA); a portable telephone; application-specific hardware to emulate a computer and/or software, such as, for example, a digital signal processor (DSP), a field programmable gate array (FPGA), an application specific integrated circuit (ASIC), an application specific instruction-set processor (ASIP), a chip, chips, a system on a chip, or a chip set; a data acquisition device; an optical computer; a quantum computer; a biological computer; and generally, an apparatus that may accept data, process data according to one or more stored software programs, generate results, and typically include input, output, storage, arithmetic, logic, and control units.

[0025] "Software" or “application” may refer to prescribed rules to operate a computer. Examples of software or applications may include code segments in one or more computer-readable languages; graphical and/or textual instructions; applets; precompiled code; interpreted code; compiled code; and computer programs.

[0026] These computer program instructions may also be stored in a computer readable medium that can direct a computer, other programmable data processing apparatus, or other devices to function in a particular manner, such that the instructions stored in the computer readable medium produce an article of manufacture including instructions which implement the function/act specified in the flowchart and/or block diagram block or blocks.

[0027] Further, although process steps, method steps, algorithms or the like may be described in a sequential order, such processes, methods and algorithms may be configured to work in alternate orders. In other words, any sequence or order of steps that may be described does not necessarily indicate a requirement that the steps be performed in that order. The steps of processes described herein may be performed in any order practical. Further, some steps may be performed simultaneously.

[0028] It will be readily apparent that the various methods and algorithms described herein may be implemented by, e.g., appropriately programmed general purpose computers and computing devices. Typically, a processor (e.g., a microprocessor) will receive instructions from a memory or like device, and execute those instructions, thereby performing a process defined by those instructions. Further, programs that implement such methods and algorithms may be stored and transmitted using a variety of known media.

[0029] The term "computer-readable medium" as used herein refers to any medium that participates in providing data (e.g., instructions) which may be read by a computer, a processor or a like device. Such a medium may take many forms, including but not limited to, non-volatile media, volatile media, and transmission media. Non-volatile media include, for example, optical or magnetic disks and other persistent memory. Volatile media include dynamic random access memory (DRAM), which typically constitutes the main memory. Transmission media include coaxial cables, copper wire and fiber optics, including the wires that comprise a system bus coupled to the processor. Transmission media may include or convey acoustic waves, light waves and electromagnetic emissions, such as those generated during radio frequency (RF) and infrared (IR) data communications. Common forms of computer-readable media include, for example, a floppy disk, a flexible disk, hard disk, magnetic tape, any other magnetic medium, a CD-ROM, DVD, any other optical medium, punch cards, paper tape, any other physical medium with patterns of holes, a RAM, a PROM, an EPROM, a FLASH-EEPROM, any other memory chip or cartridge, a carrier wave as described hereinafter, or any other medium from which a computer can read.

[0030] Various forms of computer readable media may be involved in carrying sequences of instructions to a processor. For example, sequences of instruction (i) may be delivered from RAM to a processor, (ii) may be carried over a wireless transmission medium, and/or (iii) may be formatted according to numerous formats, standards or protocols, such as Bluetooth, TDMA, CDMA, 3G, 4G, 5G and the like.

[0031] Embodiments of the present invention may include apparatuses for performing the operations disclosed herein. An apparatus may be specially constructed for the desired purposes, or it may comprise a general-purpose device selectively activated or reconfigured by a program stored in the device.

[0032] Unless specifically stated otherwise, and as may be apparent from the following description and claims, it should be appreciated that throughout the specification descriptions utilizing terms such as "processing," "computing," "calculating," "determining," or the like, refer to the action and/or processes of a computer or computing system, or similar electronic computing device, that manipulate and/or transform data represented as physical, such as electronic, quantities within the computing system's registers and/or memories into other data similarly represented as physical quantities within the computing system's memories, registers or other such information storage, transmission or display devices.

[0033] In a similar manner, the term "processor" may refer to any device or portion of a device that processes electronic data from registers and/or memory to transform that electronic data into other electronic data that may be stored in registers and/or memory or may be communicated to an external device so as to cause physical changes or actuation of the external device.

[0034] The term "agent" or "intelligent agent" or "artificial agent" or "artificial intelligent agent" is meant to refer to any man-made entity that chooses actions in response to observations. "Agent" may refer without limitation to a robot, to a simulated robot, to a software agent or "bot", an adaptive agent, an internet or web bot.

[0035] Broadly, embodiments of the present invention provide an artificial intelligent agent that can act as a player in a video game, such as a racing video game. The game can be completely external to the agent and can run in real time. In this way, the training system is much more like a real world system. The consoles on which the game runs for training the agent are provided in a cloud computing environment. The agents and the trainers can run on other computing devices in the cloud, where the system can choose the trainers and agent compute based on proximity to console, for example. Users can choose the game they want to run and submit code which can be built and deployed to the cloud system. A resource management service can balance game console resources between human users and research usage, and can identify experiments for suspension to ensure enough game consoles remain for human users.

[0036] Referring to FIGS. 1 through 4, the basic workflow can be envisioned as follows.

On a user’s local machine

[0037] On the user’s local machine 10, the user can write a computer program (in Python, for example) for the agent. This program is called a “data gatherer” 12 and such an agent can be programmed to know how to interact with and control a game. The user can further write a computer program (in Python, for example) for the “trainer” 14. The trainer 14 can be programmed to know how to take experiences from data gatherers 12 and use them to improve policies for the agent (data gatherer 12). The trainer 14 may use any number of algorithms and neural network structures as may be present in an artificial intelligence (AI) library 16. The user can write a third program which defines the experiment 18. This program is typically in the form of a configuration file, written in, for example, a human-readable data-serialization language, such as YAML, that can define how many data gatherers 12 to use, how much computing power is needed for the data gatherers 12 and trainers 14, what algorithms the trainer 14 should use, the set of tasks for the trainer 14 to put the data gatherers 12 through, and the like.
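
For illustration only, the following minimal Python sketch shows what such an experiment definition might look like and how it could be parsed; the YAML schema, field names and values are hypothetical and are not taken from the actual system described here.

    import yaml  # PyYAML

    # Hypothetical experiment definition; the fields are illustrative only.
    EXPERIMENT_YAML = """
    name: racing-baseline
    data_gatherers:
      count: 20
      cpu: 4
      memory_gb: 8
    trainer:
      algorithm: soft_actor_critic
      gpu: 1
      replay_buffer_size: 1000000
    tasks:
      - name: time_trial
        laps: 5
      - name: grid_start
        opponents: 3
    """

    definition = yaml.safe_load(EXPERIMENT_YAML)
    print(definition["data_gatherers"]["count"])        # -> 20
    print([t["name"] for t in definition["tasks"]])     # -> ['time_trial', 'grid_start']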

[0038] The user can check in their code (data gatherer 12, trainer 14 and experiment definition 18) to a source code repository, such as GitHub 22. The user can run a command line program, via a command line interface 23, that submits a request to the build system 26 in build environment 20 to build the experiment if no existing docker image can be reused. The user then contacts server 52, in the monitoring environment 57, via data query interface 25, asking it to run the experiment identified by its source code check-in reference hash. The system server 52 can store information about the requested experiment in a database 56 with the state <submitted>. In some embodiments, there may also be a web interface 24 that lets a user request a run. As shown in FIG. 1, the web interface 24 and command line interface 23 can interact with a data query and manipulation interface 25, such as Hasura/GraphQL, to permit the user to review experiments during or after their execution, as discussed below. Of course, other query interfaces may be utilized for the review of data by the user.
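
As a rough sketch of this submission step, the snippet below shows how a command line program might record an experiment in the <submitted> state through a GraphQL-style data query interface; the endpoint URL, table name and mutation fields are invented for illustration and do not reflect the actual interface.

    import requests

    # Hypothetical Hasura-style endpoint and mutation; not the real schema.
    GRAPHQL_URL = "https://experiments.example.internal/v1/graphql"

    SUBMIT_MUTATION = """
    mutation SubmitExperiment($ref: String!, $definition: String!) {
      insert_experiments_one(object: {git_ref: $ref, definition: $definition, state: "submitted"}) {
        id
        state
      }
    }
    """

    def submit_experiment(git_ref: str, definition_path: str) -> dict:
        """Ask the system server to record a new experiment in the <submitted> state."""
        with open(definition_path) as f:
            definition = f.read()
        response = requests.post(
            GRAPHQL_URL,
            json={"query": SUBMIT_MUTATION,
                  "variables": {"ref": git_ref, "definition": definition}},
            timeout=30,
        )
        response.raise_for_status()
        return response.json()

    # Example: submit the experiment identified by its source-code check-in hash.
    # print(submit_experiment("a1b2c3d", "experiment.yaml"))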

In the cloud

[0039] In the cloud computing environment, a build system 26 can build the user’s code into a docker image 28. The build system 26 can be any virtual machine imaging system, such as CircleCI, for example. If the experiment requires resources from the cloud game system 30 (also referred to as production build environment 30), the production build environment 30 can pull code from the development build environment 20, where its build system can run a variety of secondary security evaluations with a source code repository 32, such as GitLab, and then also build the user’s code with a docker build 34 into a docker image 36. The system can set the experiment state to <building> and record which environments (such as data center DC-1 (environment 38) and data center DC-2 (environment 40), as shown in FIG. 2) are building it. While the description herein describes using a docker image and FIG. 1 illustrates Kubernetes 44 as a container orchestration system for interfacing with the docker image 36, it should be understood that other types of architecture may be used to achieve the same purpose. For example, the docker runtime may be replaced by a runtime that is compliant with the container runtime interface of Kubernetes. Similarly, the container orchestration system Kubernetes can be replaced with other orchestration systems, such as Slurm.

[0040] Periodically, the resource control service 42 in each environment 38, 40 can look at the build system 30 in its view and look for building experiments. When one completes, the resource control service 42 can transition the experiment state in its environment to a <built> state. The system can watch for the transitions to <built> in each environment and, once all required environments are done, can change the overall experiment state to <built>. The system can watch for experiments in the <built> state and transfer them to a <queued> state.
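
The per-environment state handling described above can be pictured with a small sketch; the state names follow the description, but the roll-up logic is a simplified assumption rather than the actual implementation.

    from enum import Enum

    class State(Enum):
        SUBMITTED = "submitted"
        BUILDING = "building"
        BUILT = "built"
        QUEUED = "queued"

    def overall_state(per_environment_states: dict) -> State:
        """Roll up per-environment build states into one overall experiment state.
        The overall experiment is <built> only when every required environment has
        finished building; it is then advanced to <queued> for scheduling."""
        if all(s == State.BUILT for s in per_environment_states.values()):
            return State.QUEUED
        if any(s == State.BUILDING for s in per_environment_states.values()):
            return State.BUILDING
        return State.SUBMITTED

    # Example: DC-1 has finished its build, DC-2 is still building.
    print(overall_state({"DC-1": State.BUILT, "DC-2": State.BUILDING}))  # State.BUILDING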

[0041] Periodically, the system can evaluate the experiments in the <queued> state and can make decisions about whether an experiment should be started. When deciding whether an experiment should start, the system can consider the priority level of the experiment, the age of the experiment, whether the resources the experiment requires are available in any acceptable environment, and other such criteria for scheduling the experiment, such as quota limits by user or project, and the like.
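
A simplified sketch of such a scheduling decision is shown below; the specific fields, weighting and threshold are illustrative assumptions, not the actual scheduling policy.

    from dataclasses import dataclass, field
    import time

    @dataclass
    class QueuedExperiment:
        # Illustrative fields only; the real system may track different attributes.
        name: str
        priority: int                      # higher number = more important
        submitted_at: float                # epoch seconds
        required_consoles: int
        acceptable_environments: list = field(default_factory=list)

    def should_start(exp: QueuedExperiment,
                     free_consoles_by_env: dict,
                     user_quota_remaining: int) -> bool:
        """Decide whether a <queued> experiment may start, weighing priority, age,
        resource availability in any acceptable environment, and quota limits."""
        if user_quota_remaining <= 0:
            return False
        age_hours = (time.time() - exp.submitted_at) / 3600.0
        # Older experiments get a small boost so low-priority jobs are not starved.
        effective_priority = exp.priority + min(age_hours, 24) / 24.0
        has_capacity = any(free_consoles_by_env.get(env, 0) >= exp.required_consoles
                           for env in exp.acceptable_environments)
        return has_capacity and effective_priority >= 1.0

    # Example: a day-old, medium-priority experiment needing 20 consoles.
    exp = QueuedExperiment("lap-time-study", priority=1,
                           submitted_at=time.time() - 86_400,
                           required_consoles=20,
                           acceptable_environments=["DC-1", "DC-2"])
    print(should_start(exp, {"DC-1": 5, "DC-2": 100}, user_quota_remaining=3))  # True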

[0042] If the system decides to start an experiment, it can mark the experiment as <scheduling> and can tag the experiment with identifiers for the resources it should consume. For example, the system may decide that a particular experiment should be run with game consoles 46 (such as PS4s, for example) and with data gatherers 12 in a particular environment 38, 40. The experiment can be run using GPUs (such as V100 GPUs 48) for the trainers 14 in the same or a different environment, and the system will add annotations to the experiment to record those decisions.

[0043] Periodically, the resource control service 42 in each target environment can look at whether there are experiments in the <scheduling> state that are tagged to start in its environment 38, 40. If so, it can start the required resources.

When a data gatherer starts

[0044] Technically, a data gatherer can be any program. In the context of embodiments of the present invention, the data gatherer 12 can be one that is playing a game (such as a PlayStation® game) within the network of the cloud game system production environment 50.

[0045] The data gatherer 12 can find the trainer 14 it is working with as specified by the system server 52 and connect to it. The data gatherer 12 can request a game system user ID from a service that manages user IDs for training agents. The data gatherer 12 requests an available console 46 in the cloud gaming system 50 and also requests a particular game be loaded.

[0046] The data gatherer 12 can then request a task from the trainer 14. Tasks are essentially configurations of the game that it should play. For example, in a racing game, one task might have the data gatherer 12 start clusters of five cars spaced evenly around the track, in which each cluster contains one car controlled by the agent and three cars controlled by the game’s built-in AI.

[0047] The data gatherer 12 can start the game, communicate the scenario configuration to the game, and then start playing. As the agent plays the game, it can send its experiences to the trainer 14.

[0048] Periodically, the data gatherer 12 can fetch updated models from the trainer 14. Optionally, the data gatherer 12 may also send metrics to the database 56 via data query interface 25 during or after the scenario. For example, the data gatherer 12 may report its best lap time. Optionally, the data gatherer 12 may store other data, such as complete race data, in a remote data store 58, such as S3. Optionally, the data gatherer 12 may configure the video output of the cloud game console to stream to S3 so it can be viewed later by the experimenter.

[0049] When the task termination criteria are met, the data gatherer 12 can terminate the scenario on the cloud game console 46 and can request a new task from the trainer 14.
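
The data gatherer lifecycle described in paragraphs [0044] through [0049] can be summarized in a skeleton like the one below; the trainer, console and metrics clients are stand-ins for whatever interfaces the real system exposes, and every method name here is hypothetical.

    class DataGatherer:
        """Illustrative data gatherer skeleton: requests a console and game,
        asks the trainer for tasks, streams experiences, and refreshes models."""

        def __init__(self, trainer_client, console_client, metrics_client):
            self.trainer = trainer_client      # stand-in for the trainer connection
            self.console = console_client      # stand-in for the cloud game console
            self.metrics = metrics_client      # stand-in for the metrics interface
            self.policy = None

        def run(self):
            self.console.load_game("racing_game")        # request a console and a game
            while True:
                task = self.trainer.request_task()        # e.g. a grid of cars on a track
                if task is None:                           # no more tasks: experiment done
                    break
                self.console.configure_scenario(task)
                while not self.console.task_terminated():
                    observation = self.console.observe()
                    action = self.policy(observation) if self.policy else self.console.noop()
                    self.console.act(action)
                    # Stream experiences to the trainer as the agent plays.
                    self.trainer.send_experience(observation, action)
                self.metrics.report({"best_lap_time": self.console.best_lap_time()})
                self.policy = self.trainer.fetch_latest_model()   # periodic model refresh
                self.console.terminate_scenario()                  # then request a new task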

When a trainer starts

[0050] The trainer 14 can initialize a buffer where it can store experiences reported by the data gatherers 12. Optionally, a buffer from a previous run can be loaded. The trainer 14 can maintain a list of tasks from which it hands out new tasks to data gatherers 12 when they request one.

[0051] Periodically, the trainer 14 loads experiences from the buffer and uses learning algorithms to update the neural network models. Optionally, the trainer 14 will report metrics to the system, where such metrics are stored in the metrics database 56. Updated neural network models can be sent to the data gatherer 12.
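
A corresponding trainer skeleton might look as follows; the replay buffer, task hand-out and update step are simplified placeholders for the algorithms provided by the AI library, not the actual training code.

    import random
    from collections import deque

    class Trainer:
        """Illustrative trainer skeleton: buffers experiences from data gatherers,
        hands out tasks, and periodically updates the neural network models."""

        def __init__(self, tasks, model, buffer_capacity=1_000_000):
            self.tasks = list(tasks)                      # tasks to hand out on request
            self.model = model
            self.replay_buffer = deque(maxlen=buffer_capacity)

        def request_task(self):
            return self.tasks.pop(0) if self.tasks else None

        def send_experience(self, observation, action, reward=None):
            self.replay_buffer.append((observation, action, reward))

        def fetch_latest_model(self):
            return self.model

        def update(self, batch_size=256):
            """One learning step: sample a batch and update the models."""
            if len(self.replay_buffer) < batch_size:
                return
            batch = random.sample(list(self.replay_buffer), batch_size)
            self.model = self._learn(self.model, batch)   # delegate to the AI library

        def _learn(self, model, batch):
            # Placeholder: a real implementation would run e.g. a gradient step here.
            return model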

On the user’s machine

[0052] While an experiment is building and running, the user can monitor it using, for example, a web browser 24 connected to the system server 52 via data query interface 25. The system interface can show the progress through the experiment building and deployment stages. Once the experiment is running, the system interface can allow the user to inspect metrics and create dashboards displaying various graphs of performance. The system interface can also be used to graph metrics across multiple runs at the same time to allow users to compare the performance of different experiments.

Suspend and resume

[0053] The resource management service 60, also referred to simply as resource manager 60, is the service that the cloud game system uses to coordinate resources with external services. Because the training is performed on actual game consoles, the training system shares the game system (such as the PlayStation® network) with humans. When more humans want to play games, the resource management service 60 tells the training system 50 to scale back usage. When humans stop playing, the resource management service 60 gives the training system 50 more resources. System server 52 makes use of the resource control service 42, also referred to as experiment manager 42, to make adjustments in resource use based on targets set by the resource management service 60.

[0054] As discussed in greater detail below, some key features of the integration of the training system with the cloud game system are as follows: (1) a module 62 that measures load due to human activity; (2) a module 64 that predicts future load; and (3) a module 66 that determines how many of those resources can be given to researchers. The resource control service 42 can provide the following features, including (4) a module 68 that reads the number of resources available; (5) a module 70 that starts and stops experiments according to the resource constraints and the priorities/age/quotas of the job; and (6) a module 72 that restarts jobs in environments where resources are available. Modules 70 and 72 may be part of the system server 52. In some embodiments, an experiment can be run in multiple environments, while the resource control service 42 (the experiment manager 42) only controls resources in one environment. In some embodiments, for example, if the system server 52 does not act, or does not act quickly enough, the resource management service 60 may end experiments according to a pre-programmed protocol, such as first-in, first-out, for example.
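
The division of labor among modules 62, 64 and 66 can be sketched as three small functions; the load-prediction heuristic and the reserve value below are illustrative assumptions only.

    def measure_human_load(active_human_sessions: int) -> int:
        """Module 62 analogue: consoles currently consumed by human players."""
        return active_human_sessions

    def predict_future_load(recent_loads: list, horizon_factor: float = 1.2) -> int:
        """Module 64 analogue: naive forecast = recent peak plus a safety margin."""
        return int(max(recent_loads) * horizon_factor) if recent_loads else 0

    def consoles_for_research(total_consoles: int, predicted_human_load: int,
                              reserve: int = 50) -> int:
        """Module 66 analogue: whatever remains after humans and a fixed reserve."""
        return max(0, total_consoles - predicted_human_load - reserve)

    # Example: 10,000 consoles, human peak around 7,800 sessions recently.
    available = consoles_for_research(10_000, predict_future_load([7_500, 7_800, 7_600]))
    print(available)  # -> 590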

[0055] The training system can monitor the cloud game system’s resource management service 60. When the system notices that the resources (especially cloud game consoles) allocated to the training system have decreased below the system’s current usage, the system can identify one or more experiments to suspend. When deciding which experiments to suspend, the system may consider the location of the resources in use by the experiment, the priority level of the experiment, the age of the experiment, the user ID of the experiment and/or other attributes of the experiment. The system can move the selected experiment into a <suspending> state.
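
One possible way to pick experiments for suspension, consistent with the criteria listed above but not necessarily the actual policy, is sketched below.

    def choose_experiments_to_suspend(running_experiments: list, consoles_to_free: int) -> list:
        """Pick experiments to suspend until enough consoles have been freed.
        Lowest priority first, then youngest first, as one plausible ordering."""
        candidates = sorted(running_experiments,
                            key=lambda e: (e["priority"], e["age_hours"]))
        selected, freed = [], 0
        for exp in candidates:
            if freed >= consoles_to_free:
                break
            selected.append(exp["name"])
            freed += exp["consoles_in_use"]
        return selected

    running = [
        {"name": "exp-a", "priority": 1, "age_hours": 2,  "consoles_in_use": 40},
        {"name": "exp-b", "priority": 5, "age_hours": 30, "consoles_in_use": 120},
        {"name": "exp-c", "priority": 1, "age_hours": 20, "consoles_in_use": 60},
    ]
    print(choose_experiments_to_suspend(running, 80))  # -> ['exp-a', 'exp-c']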

[0056] Each resource control service 42 in each environment (such as locations 38, 40) can periodically check the system server to see if an experiment it is running has moved into a <suspending> state. If so, the resource control service can terminate the processes under its control. When a trainer is asked to suspend, it can save state information (particularly its experience buffer) to remote storage before gracefully shutting down, so that the state can be reloaded later.
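
A minimal checkpointing sketch for this suspend step, building on the illustrative trainer skeleton above, might look as follows; the file path simply stands in for a remote object store location such as S3.

    import pickle

    def save_trainer_checkpoint(trainer, path):
        """On suspend: persist the experience buffer (and any other state worth
        keeping) so the trainer can be reloaded later. `path` stands in for a
        remote storage location such as S3."""
        with open(path, "wb") as f:
            pickle.dump({"replay_buffer": list(trainer.replay_buffer)}, f)

    def load_trainer_checkpoint(trainer, path):
        """On resume: reload the previously saved experience buffer."""
        with open(path, "rb") as f:
            state = pickle.load(f)
        trainer.replay_buffer.extend(state["replay_buffer"])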

[0057] Once all of the processes under its control are terminated, the resource control service 42 will change its portion of the experiment to a <suspended> state. When all of the relevant resource control services 42 have transitioned their portions to <suspended>, the system can transition the whole experiment to a <suspended> state.

[0058] When the system sees in the resource control service 42 that the number of available resources is greater than the number of resources in use, the system can look at the list of runs that are suspended. The system may restart some of these experiments. The choice about which experiments to restart may consider the location of the resources in use by the experiment, the priority level of the experiment, the age of the experiment, the user ID of the experiment and/or other attributes of the experiment.

[0059] To avoid thrashing, the system server may smooth the signals about resource availability that it receives from the resource management service 60. It may smooth these signals by applying any number of standard algorithms, like low-pass filters, min-max time windows, or the like. Optionally, the user can click a button to suspend an experiment that is running. This experiment will move to the <manually suspended> state. The user may choose to reactivate a manually suspended experiment by pressing a button in the user interface. The system will move the experiment to <suspended>. The system will then reactivate the experiment when resources are available, subject to the same conditions as above.
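
For example, a simple exponential low-pass filter (one of the standard smoothing choices mentioned above) could be applied to the availability signal as sketched below; the smoothing factor is an illustrative assumption.

    def smooth(previous: float, observed: float, alpha: float = 0.1) -> float:
        """Simple exponential low-pass filter: a small alpha means the training
        system reacts slowly to momentary spikes or dips in reported console
        availability, which helps avoid suspend/resume thrashing."""
        return alpha * observed + (1 - alpha) * previous

    # Example: availability briefly dips from 600 to 200 consoles for one reading.
    estimate = 600.0
    for reported in [600, 200, 600, 600]:
        estimate = smooth(estimate, reported)
    print(round(estimate))  # -> 568, staying near 600 despite the momentary dip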

Completion

[0060] A user may write termination conditions into their trainer script so that, when certain conditions are met, it will report to the system that it has completed and then terminate. The system will change the experiment state to <success>. Alternatively, the user may use the system’s interface to click the “Cancel” button. The system will shut down the experiment immediately, following a process similar to the suspend process discussed above, but without saving the current experiment state. The system will set the experiment state to <canceled>.
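
As an illustration of a user-written termination condition, the sketch below stops training once a target lap time is reached or a time budget is exhausted; the metric and thresholds are hypothetical.

    def check_termination(metrics_history: list, target_lap_time: float,
                          max_training_hours: float, elapsed_hours: float) -> bool:
        """Example user-written termination condition: stop once the agent's best
        lap time beats a target, or after a wall-clock budget is exhausted."""
        best = min(metrics_history) if metrics_history else float("inf")
        return best <= target_lap_time or elapsed_hours >= max_training_hours

    # The trainer script would call this after each update and, when it returns
    # True, report completion so the experiment moves to <success>.
    print(check_termination([95.2, 93.8, 92.1], target_lap_time=92.5,
                            max_training_hours=72, elapsed_hours=10))  # -> True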

[0061] Many alterations and modifications may be made by those having ordinary skill in the art without departing from the spirit and scope of the invention. Therefore, it must be understood that the illustrated embodiments have been set forth only for the purposes of examples and that they should not be taken as limiting the invention as defined by the following claims. For example, notwithstanding the fact that the elements of a claim are set forth below in a certain combination, it must be expressly understood that the invention includes other combinations of fewer, more or different ones of the disclosed elements.

[0062] The words used in this specification to describe the invention and its various embodiments are to be understood not only in the sense of their commonly defined meanings, but to include by special definition in this specification the generic structure, material or acts of which they represent a single species.

[0063] The definitions of the words or elements of the following claims are, therefore, defined in this specification to not only include the combination of elements which are literally set forth. In this sense it is therefore contemplated that an equivalent substitution of two or more elements may be made for any one of the elements in the claims below or that a single element may be substituted for two or more elements in a claim. Although elements may be described above as acting in certain combinations and even initially claimed as such, it is to be expressly understood that one or more elements from a claimed combination can in some cases be excised from the combination and that the claimed combination may be directed to a subcombination or variation of a subcombination.

[0064] Insubstantial changes from the claimed subject matter as viewed by a person with ordinary skill in the art, now known or later devised, are expressly contemplated as being equivalently within the scope of the claims. Therefore, obvious substitutions now or later known to one with ordinary skill in the art are defined to be within the scope of the defined elements.

[0065] The claims are thus to be understood to include what is specifically illustrated and described above, what is conceptually equivalent, what can be obviously substituted and also what incorporates the essential idea of the invention.