Title:
STOCHASTICITY MITIGATION IN DEPLOYED AI AGENTS
Document Type and Number:
WIPO Patent Application WO/2024/063897
Kind Code:
A1
Abstract:
The techniques disclosed herein mitigate stochasticity when controlling a mechanical system with artificial intelligence (AI) agents. In some configurations, AI agents are created using data generated by a machine learning model. Stochasticity is segmented temporally into near term and long term, and different strategies are used to address stochasticity in the different timeframes. For example, long term stochasticity may be addressed with changes to the reward function used to train the model. Short term stochasticity may be addressed by applying a margin to the output of an AI agent. Example margins include window averaging, clamps, and statistical process control bounds. In one configuration, AI agents are regression brains that are generated from setpoints inferred by the model from environmental states. The limitations inherent to fitting a regression line to this data may result in some predicted setpoints being outside of an allowed range.

Inventors:
MAITRA KINGSUK (US)
BRYANT BRENDAN LEE (US)
PREMOE CHRIS ALLEN (US)
ANDERSON KENCE (US)
Application Number:
PCT/US2023/030882
Publication Date:
March 28, 2024
Filing Date:
August 23, 2023
Assignee:
MICROSOFT TECHNOLOGY LICENSING LLC (US)
International Classes:
G06N3/006; G06N3/092
Other References:
LI RUNZE ET AL: "Learning the Aerodynamic Design of Supercritical Airfoils Through Deep Reinforcement Learning", AIAA JOURNAL, vol. 59, no. 10, 12 March 2021 (2021-03-12), US, pages 3988 - 4001, XP093099936, ISSN: 0001-1452, DOI: 10.2514/1.J060189
WANG RUIHANG ET AL: "Toward Physics-Guided Safe Deep Reinforcement Learning for Green Data Center Cooling Control", 2022 ACM/IEEE 13TH INTERNATIONAL CONFERENCE ON CYBER-PHYSICAL SYSTEMS (ICCPS), IEEE, 4 May 2022 (2022-05-04), pages 159 - 169, XP034138892, DOI: 10.1109/ICCPS54341.2022.00021
Attorney, Agent or Firm:
CHATTERJEE, Aaron C. et al. (US)
Claims:
CLAIMS

1. A method comprising: training a machine learning model based on states collected from a mechanical system; using the machine learning model to infer a plurality of actions from a plurality of sets of states; generating an AI agent based on the plurality of sets of states and the plurality of actions; deploying the AI agent with a stochasticity mitigation engine configured to modify an output of the AI agent; wherein the AI agent computes an individual action from an individual state, and wherein the stochasticity mitigation engine generates a modified individual action derived from the individual action that controls the mechanical system.

2. The method of claim 1, wherein the stochasticity mitigation engine applies a clamp to a value included in the individual action.

3. The method of claim 1, wherein the individual action controls a setpoint of the mechanical system.

4. The method of claim 1, wherein the machine learning model is trained with a delta action that is generated by combining a fixed state included in the states with a fluctuating state included in the states.

5. The method of claim 4, wherein the plurality of actions is inferred in part by applying a method of moments computation to convert the delta action into an absolute action that can be used directly to control a setpoint of the mechanical system.

6. The method of claim 1, wherein the AI agent comprises a regression brain that maps the individual state to the individual action, wherein the regression brain is generated by: selecting one or more types of states; and computing a regression line that maps states of the one or more types of states to the individual action.

7. A computing device comprising: one or more processors; a memory in communication with the one or more processors, the memory having computer-readable instructions stored thereupon which, when executed by the one or more processors, cause the computing device to: receive an AI agent that maps a state of a mechanical system to an action; receive a stochasticity mitigation engine; provide the action to the stochasticity mitigation engine, wherein the stochasticity mitigation engine generates a modified action by modifying the action according to a defined set of parameters of the mechanical system; and apply the modified action to control an aspect of the mechanical system.

8. The computing device of claim 7, wherein the stochasticity mitigation engine modifies the action by applying a window averaging technique that smooths out a stream of values.

9. The computing device of claim 7, wherein the AI agent is generated based on a plurality of states and a corresponding plurality of actions, wherein the plurality of actions are inferred from the plurality of states by a machine learning model, wherein the AI agent comprises a regression brain that maps substates of an individual state to an individual action, and wherein the regression brain is generated by fitting a regression line to the substates.

10. The computing device of claim 7, wherein the action includes a value that exceeds the defined set of parameters of the mechanical system because the regression brain extrapolated beyond a threshold value.

11. The computing device of claim 7, wherein the mechanical system comprises a cooling system, wherein the state of the mechanical system comprises a return water temperature, and wherein the action sets a chiller water temperature setpoint.

12. A computer-readable storage medium comprising instructions that, when executed by a processor, cause the processor to: train a machine learning model based on states collected from a mechanical system; use the machine learning model to infer a plurality of actions from a plurality of sets of states; generate an AI agent based on the plurality of sets of states and the plurality of chiller water setpoint actions; deploy the AI agent with a stochasticity mitigation engine configured to modify an output of the AI agent; wherein the AI agent computes an individual action from a subset of substates of an individual state, and wherein the stochasticity mitigation engine generates a modified individual action derived from the individual action that controls the mechanical system.

13. The computer-readable storage medium of claim 12, wherein the mechanical system comprises a cooler system, wherein the subset of substates comprises an outside air temperature, a wet bulb temperature, and a load, and wherein the individual action sets a supervisory setpoint that comprises a condenser water setpoint, a condenser water pump flow rate, a chilled water setpoint, or a chilled water pump flow rate.

14. The computer-readable storage medium of claim 12, wherein the individual action sets a supervisory setpoint that comprises a condenser water setpoint, a condenser water pump flow rate, a chilled water setpoint, or a chilled water pump flow rate.

15. The computer-readable storage medium of claim 12, wherein the individual state originates from sensor data, and wherein the individual action controls the mechanical system by setting a supervisory setpoint.
Description:
STOCHASTICITY MITIGATION IN DEPLOYED AI AGENTS

BACKGROUND

Control systems are used for controlling supervisory setpoints of mechanical systems, such as flow rates of a manufacturing process or cooling levels of a heating, ventilation, and air-conditioning (HVAC) system. Supervisory setpoints – also referred to herein as “setpoints” – are often calibrated and set by technicians. However, setpoints set by technicians are often left unchanged for long periods of time, and as such may not result in efficient operation of the mechanical system. For example, variable demands on the system being controlled or unexpected changes in the operating environment may result in an inefficient outcome. Improper setpoints may also result in excessive wear, reduced operating effectiveness, or even damage to the system.

Machine-learning based techniques have been used for controlling setpoints of mechanical systems. However, it has proved challenging to obtain training data representative of the allowable range of setpoint values. For example, most HVAC chillers operate with a fixed chilled water setpoint (CHW SWS), e.g. 42 degrees Fahrenheit, even though the range of allowable values is 40-55 degrees Fahrenheit. Operators are reluctant to experiment with different setpoint values without knowing that the system will continue to function effectively. As a result, there is a paucity of collected data in which a full range of setpoint values is represented. This limited range of real-world input values results in poor exploration of possible outputs when training a machine learning model. As a result, it is difficult if not impossible to train a machine learning model with fixed setpoint inputs to generate setpoint outputs that yield optimal results. It is with respect to these and other considerations that the disclosure made herein is presented.

SUMMARY

The techniques disclosed herein mitigate stochasticity when controlling a mechanical system with artificial intelligence (AI) agents. In some configurations, AI agents are created using data generated by a machine learning model. Stochasticity is segmented temporally into near term and long term, and different strategies are used to address stochasticity in the different timeframes. For example, long term stochasticity may be addressed with changes to the reward function used to train the model. Short term stochasticity may be addressed by applying a margin to the output of an AI agent. Example margins include window averaging, clamps, and statistical process control bounds. In one configuration, AI agents are regression brains that are generated from setpoints inferred by the model from environmental states. The limitations inherent to fitting a regression line to this data may result in some predicted setpoints being outside of an allowed range.

Features and technical benefits other than those explicitly described above will be apparent from a reading of the following Detailed Description and a review of the associated drawings. This Summary is provided to introduce a selection of concepts in a simplified form that are further described below in the Detailed Description. This Summary is not intended to identify key or essential features of the claimed subject matter, nor is it intended to be used as an aid in determining the scope of the claimed subject matter.
The term “techniques,” for instance, may refer to system(s), method(s), computer-readable instructions, module(s), algorithms, hardware logic, and/or operation(s) as permitted by the context described above and throughout the document.

BRIEF DESCRIPTION OF THE DRAWINGS

The Detailed Description is described with reference to the accompanying figures. In the figures, the left-most digit(s) of a reference number identifies the figure in which the reference number first appears. The same reference numbers in different figures indicate similar or identical items. References made to individual items of a plurality of items can use a reference number with a letter of a sequence of letters to refer to each individual item. Generic references to the items may use the specific reference number without the sequence of letters.

FIG. 1 illustrates a chiller-based cooling system.

FIG. 2 illustrates computing delta actions from state data collected during operation of the cooling system.

FIG. 3 illustrates training a machine learning model using state data and delta actions.

FIG. 4 illustrates using the trained machine learning model to generate delta action predictions.

FIG. 5 illustrates using a method of moments technique to convert delta actions into absolute actions.

FIG. 6A illustrates generating an AI agent, such as a regression brain, based on sets of cooling system states and corresponding absolute chiller water setpoint actions.

FIG. 6B illustrates using the regression brain to constrain an AI agent.

FIG. 7 illustrates changes in the power usage by the cooling system in accordance with one example.

FIG. 8 illustrates temporal segmentation of margins.

FIG. 9 is a flow diagram of an example method for using a machine learning model to mitigate stochasticity in a deployed AI agent.

FIG. 10 is a computer architecture diagram illustrating a computer hardware and software architecture for a computing system capable of implementing aspects of the techniques and technologies presented herein.

FIG. 11 is a diagram illustrating a distributed computing environment capable of implementing aspects of the techniques and technologies presented herein.

DETAILED DESCRIPTION

When applied in a real-life system, an AI agent may generate noisy, random, and non-ideal actions. In order to correct the output of an AI agent, margins – also referred to as stochasticity mitigation engines – are constructed based on a worst-case scenario of the mechanical system as it is controlled by the deployed AI agent. Then, the deployed AI agent is augmented with margins that account for, control, or counteract the predicted possibilities.

One form of stochasticity – noise – is inherent to complex, interconnected mechanical systems. When an AI agent is integrated with software that controls a mechanical system – e.g. a building automation system in the case of an HVAC system or a manufacturing execution system in the case of a factory – and the AI agent controls the equipment, the noise in communication between the piece of software and the equipment it drives is a function of the proximity of the software to the equipment. There is noise in the commands coming from the software to the equipment, and there is noise in the feedback sent from the equipment to the software.
This noise exists because communication between software and components of a mechanical system happens through a series of electronics – digital to analog converters (DACs) and analog to digital converters (ADCs) – and these electronics, despite best efforts, cannot be rid of stochasticity. Another example of stochasticity is randomness introduced by human interaction. For example, changes in temperature preferences throughout the day may have an effect on the load requested of an HVAC system. When a mechanical system is operated by AI agents, the values provided by an AI agent may also be a source of stochasticity.

In some configurations, these instances of stochasticity are mitigated by putting in a margin. A margin may slow a change in a setpoint, establish a maximum value of a setpoint, or otherwise modify an action to accommodate a requirement of the mechanical system. For example, if the AI agent instructs the building automation system to increase a setpoint temperature to 45 degrees Fahrenheit, a margin may intercept this command and limit the increase to 44.5 degrees. The margin may prevent moving the setpoint to 45 degrees because it is known that there will be some scenarios in real-life operations where 45 degrees will become sub-optimal. A recommendation by the AI agent to increase the setpoint to 45 degrees could be sub-optimal due to inherent randomness, even if the model’s predictions were made based on accurate and complete data that reflects all of the possibilities that the AI agent is expected to encounter. For example, the model may be trained on complete and accurate data, but the limitations inherent to how an AI agent is encoded may cause some actions to be sub-optimal.

In the example above of limiting the increase in a setpoint to 44.5 degrees, instead of allowing the setpoint to go all the way up to 45 degrees as indicated by the AI agent, some efficiency appears to be lost. Instead of allowing the chiller water temperature to rise to 45 degrees, energy must be expended to keep the temperature at 44.5 degrees. This restriction may be applied if the AI agent was allowing the temperature to rise too quickly – a quick rise to 45 degrees could have been damaging to the equipment. Allowing the chiller temperature setpoint to rise to 45 degrees over the course of a week may recapture some of the lost efficiency without the downsides of a quick temperature adjustment. In this example, the quick rise in setpoint is an example of stochasticity caused by using AI agents to control the mechanical system. Margin in the form of a stochasticity mitigation engine is applied to prevent damage to the mechanical system. The stochasticity mitigation engine acts as a check on the AI agent, overriding the agent even if the model that generated the data used to create the agent was trained on all of the possibilities that the agent could encounter.

When applying AI to real-life technologies, there is a tendency from a software perspective to deploy the solution and call it done. But this misses the effects of wear and tear, maintenance, and other sources of drift that occur over the lifetime of the system. By analogy to purchasing an automobile, the purchaser does not expect a new vehicle to only make it off the lot before experiencing a serious problem – long margins are built into the car so that it lasts at least the warranty period before encountering significant problems.
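To make the intercepting margin described above concrete, the following is a minimal sketch of a rate-limiting margin: it intercepts a requested setpoint and bounds how far the applied value may move in a single step, slowing a quick rise to 45 degrees in the manner described. The class name and the 0.5-degree step size are illustrative assumptions, not details from this disclosure.

```python
class RateLimitMargin:
    """Sketch of a margin that intercepts setpoint commands and slows changes."""

    def __init__(self, max_step: float = 0.5):
        self.max_step = max_step   # largest allowed change per command (assumed)
        self.current = None        # last setpoint actually applied

    def intercept(self, requested: float) -> float:
        if self.current is None:
            self.current = requested               # first command passes through
        elif requested > self.current + self.max_step:
            self.current += self.max_step          # e.g. 45.0 requested -> 44.5 applied
        elif requested < self.current - self.max_step:
            self.current -= self.max_step
        else:
            self.current = requested
        return self.current

margin = RateLimitMargin(max_step=0.5)
print(margin.intercept(44.0))   # 44.0: first value passes through
print(margin.intercept(45.0))   # 44.5: the rise toward 45 degrees is slowed
```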
Margins may be segmented temporally. This reflects a reality in which long term drift in the performance of the mechanical system differs from short term spikes and other fluctuations. For example, on an automobile, an engine or a transmission may age over a number of years. This tendency is referred to as “drift” – a change that will slowly begin to appear over time and that affects how the equipment operates. Since drift is often the accumulation of many individual events, some of which may cancel each other out, the effects of drift tend to be less random than individual stochastic events. As such, long margins may account for drift by altering the reinforcement learning function used to train the model. Specifically, the reinforcement learning function may be adjusted to anticipate and account for the small changes predicted to happen over time.

In contrast to drift, a driver pressing the brake pedal, the gas pedal, and moving the steering wheel causes a more instantaneous form of stochasticity. Taking into account how the car ages over time does not account for changes in how a driver uses the car throughout the day. For example, a driver may press the brake differently at 9am than at 9:15am. This variation in how the brake pedal is pressed is an example of non-ideality/stochasticity. It is challenging to mitigate, with a single technique, stochastic drift that appears over the course of years together with stochasticity that appears over the course of minutes. Accordingly, temporal segmentation of margins is critical to correct for the different types of stochasticity.

Short margins may be implemented by adjusting the output of an AI agent, e.g. by averaging outputs. Other types of short margins include statistical process control bounds. Short margins may also be implemented by applying an upper or lower bound on a value, applying a clamp to a value, or adding a safety margin around an AI agent that is drifting. The point of interception is between the AI agent and the equipment that the agent controls. Placing short margins – e.g. clamps and mitigation strategies – at the interception point is one way of handling the messiness of the real world. Once an AI agent is created, even if long margins are accounted for while training the model, stochasticity still emanates at the point of contact between the regression brain and the equipment that it controls. Without accounting for this stochasticity, the mechanical system may experience an earlier than expected end of life, increased wear and tear, increased operating costs, reduced reliability, etc.

Without short margins, AI agents may only operate a mechanical system for seconds before causing it to fail. This has been shown empirically for multiple AI agents. Typically, the AI agent may operate the mechanical system for a few seconds, but then it tells a machine to exceed a limit; the AI agent does not realize that a limit is being exceeded, so the machine fails to perform the requested operation and shuts down. Without long margins, mechanical systems controlled by AI agents would experience drift over a period of months or years. Typically the effect would be a loss of efficiency, but an early end of life would not be unexpected.

The techniques disclosed herein are often used in the context of techniques that enable utilizing a full range of setpoint values to control a mechanical system. A machine learning model is trained with states collected from the mechanical system.
Some of the states may have little to no variation, limiting exploration of possible setpoint values when training the model. To enable a more thorough exploration of possible setpoint values, the states are augmented with a fluctuating delta value that is derived from a fixed setpoint value. For example, a delta outside air temperature may be computed by subtracting outside air temperature, which fluctuates, from a fixed chilled water setpoint. A method of moments computation converts delta values inferred by the model back into absolute values. The absolute values are used to compute a regression equation that is usable by the mechanical system to compute a setpoint action for a given set of input states.

The model may be trained with real-world state and action data collected during the operation of the mechanical system. In order to improve exploration of possible model outputs, states collected from the mechanical system that are fixed or limited in value may be augmented or replaced with related states that fluctuate in value. For example, a delta state that is related to the fixed state may be computed by subtracting a fluctuating state from the fixed state. In some configurations, a reinforcement learning technique is used to train the model with the additional/replacement states. The model may be trained, for example, to generate setpoint actions that are usable to control temperature regulators, pumps, valves, and other components of the mechanical system. There may be some initial relationship between states and actions encoded by the model, but before training these relationships are likely to be sub-optimal. During training, the relationships between states and actions are improved as guided by the reward function. For example, the relationship between an outside air temperature state and a chilled water setpoint action may be improved as guided by the reward function.

Since the model has been trained on delta values, the setpoint actions it infers are delta actions. Delta actions do not represent real-world values, and as such are not directly applicable to control a real-world system. Method of moments is one technique for converting a delta action inferred by the model into an absolute action that can be used to control the mechanical system. In this way, method of moments is a powerful empirical technique that enables more exhaustive exploration when training on data with fixed, single-value setpoints. This more exhaustive exploration avoids the need to conduct expensive experiments that vary the setpoint value. These experiments may be disruptive to carry out on a live production line and/or may result in unexpected or undesirable behavior in the mechanical system. These experiments may also take more time than is practical to carry out, particularly when the disclosed techniques enable training on existing data.

In some configurations, instead of being used to directly infer setpoint actions from a real-time state of the mechanical system, the trained model is used to generate a regression brain. As referred to herein, a regression brain refers to a procedure or equation that maps states of a mechanical system to a setpoint action. The regression brain is typically much simpler than a machine learning model, enabling it to be deployed for use on-site. For example, a regression brain may be a linear equation that maps one or more states to a setpoint action.
Typically, the number of states used as input to a regression brain is less than the number of states used to train the model. Also, regression brain input states are typically selected so as not to be overdetermined. For example, an input state that represents a feedback, such as a return water temperature, is often excluded from a regression brain because it encodes information that is already encoded in other states used by the regression brain.

To create an AI agent such as a regression brain, the machine learning model is used to generate a corpus of data points by inferring setpoint actions from sets of states of the mechanical system. A regression analysis is performed on the sets of states and corresponding setpoint actions to generate coefficients of the regression brain. For example, a least squares calculation may be applied to the sets of states and corresponding setpoint actions to identify a regression line. The regression brain uses the coefficients of the regression line when generating a setpoint action. Specifically, to apply the regression brain, states observed in the mechanical system in real time may be multiplied by the appropriate regression line coefficients to produce a setpoint action.

Examples described in this disclosure relate to autonomous control of supervisory setpoints using artificial intelligence. Certain examples relate to autonomously controlling supervisory setpoints using deep reinforcement learning and machine teaching as applied to, but not limited to, “HVAC-like” systems for smart building operations. The described examples are directed to supervisory control and thus are compute-lite. In addition, unlike some traditional artificial intelligence (AI) systems that take direct and intrusive control, the examples described herein are not disruptive to the existing operations and product lines of entities that may deploy the supervisory control systems and methods described herein.

Often, operators of mechanical systems leave supervisory setpoints unchanged. As a result, states and actions collected from the mechanical system may represent a narrow range of possible values. For example, a flow rate in a manufacturing process may be set to a single fixed value. This limitation of the input data prevents a thorough exploration of possible actions when training the model, and as a result, the model may miss optimal solutions. This limitation of the training data is made more acute when there are no simulations, even data-driven simulations, that could be used to simulate flow ranges in the manufacturing process.

In order to allow the model to explore a wider range of possible values, values that were fixed or otherwise constrained during data collection may be replaced by delta values that fluctuate throughout training. As referred to herein, a delta value – also referred to as a delta action – refers to a synthetic value that is generated from a setpoint and an additional state. For example, a delta flow rate may be computed by subtracting a pressure measurement – the additional state – from a flow rate setpoint. In some configurations, the additional state is selected because it provides feedback to the setpoint. Continuing the example, the pressure measurement may be selected as the additional state because it provides feedback about the flow rate. In this example, the flow rate is an absolute value – a value that is obtained directly from a sensor, and which can be used to control a component of the mechanical system.
In some configurations, two delta values are provided as additional inputs to the model. These delta actions may replace the setpoint they are based on. In other configurations, these delta actions are provided to the model in addition to the setpoint they are based on. In general, for each setpoint being controlled, two additional delta actions are provided as inputs to the model. E.g., if there are 12 flow rate setpoints, at least 24 delta actions may be used. Adapting almost any number of setpoints in this way enables these techniques to be generally applied, e.g., they are not restricted to controlling systems within a building, nor to energy management. For example, the cooling of a datacenter with multiple buildings by multiple chiller-based HVAC systems, each with a temperature setpoint, may be optimized by adding twice as many delta values as the number of temperature setpoints. Other applications, such as manufacturing, may also benefit from training the model to optimize multiple supervisory setpoints.

When the machine learning model is trained with delta values it has no notion of the absolute values the delta values were derived from, and so the model is trained to generate delta actions. Before they can be used to control the mechanical system or train a regression brain, delta actions must be converted back to absolute actions that can be applied to real-world operations. This process may be referred to as “undeltaing the delta”. In some configurations, a method of moments technique leverages the underlying physics and thermodynamics of the mechanical system to “undelta” the deltas. Once this has been accomplished, the undelta-ed setpoints and the sets of states used to infer them are used to generate regression brains.

In some configurations, some of the states that are available from the mechanical system, and which were used to train the model, are left out when creating the regression brain. This is done for at least three reasons: to reduce complexity, because it is not necessary to include all of the states to achieve the goal embodied by the reinforcement learning function, and in order to simplify use of regression brains in the field. States may be selected to reduce complexity by selecting states that represent environmental conditions while omitting states that represent feedback within the mechanical system. For example, return water temperature may be omitted from the regression brain because it encodes feedback. In order to simplify the use of regression brains in the field, states that are intuitive and readily available to an on-site operator of the mechanical system are selected. In the example of chiller-based HVAC systems, these states may include outside air temperature, humidity (as measured by a wet bulb temperature), and the building demand (in tonnage). Limiting the states to a small number of easily accessible, intuitive values improves the experience of a technician using the regression brain.

After “undeltaing” the delta actions and selecting a subset of states, the regression brain may be constructed. For example, to create a regression brain, a simple regression may be applied to construct a linear function which returns a setpoint action for a subset of environmental states, as shown in the sketch below. A regression brain may be created for each setpoint. Regression brains may be linear equations, quadratic equations, or more complicated equations.
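As a concrete illustration of constructing a regression brain by simple regression, the sketch below fits a linear function to a corpus of (states, setpoint action) pairs by least squares. In practice the pairs would come from the trained model’s inferences; here the corpus is synthetic, and the state columns (outside air temperature, wet bulb temperature, load) follow the example states named above. All numeric values are fabricated for illustration only.

```python
import numpy as np

# Synthetic stand-in for the corpus of (states, absolute setpoint action) pairs
# that the trained model would generate. Columns: outside air temperature,
# wet bulb temperature, load.
rng = np.random.default_rng(0)
states = rng.uniform([60.0, 50.0, 100.0], [100.0, 80.0, 500.0], size=(200, 3))
setpoints = (48.0 - 0.05 * states[:, 0] - 0.02 * states[:, 1]
             - 0.001 * states[:, 2] + rng.normal(0.0, 0.1, 200))

# Least squares fit; the trailing column of ones yields the constant term.
design = np.hstack([states, np.ones((len(states), 1))])
coefficients, *_ = np.linalg.lstsq(design, setpoints, rcond=None)

def regression_brain(oat: float, wet_bulb: float, load: float) -> float:
    """Multiply real-time states by the regression coefficients, add the
    constant, and return a chilled water setpoint action."""
    return float(coefficients @ np.array([oat, wet_bulb, load, 1.0]))

print(regression_brain(85.0, 65.0, 300.0))
```

Because the result is nothing more than a set of coefficients for a linear equation, the fitted brain could equally be evaluated in a spreadsheet, consistent with the deployment benefit noted next.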
One benefit of providing a regression brain with a simple mathematical equation is enabling an operator of the mechanical system to compute actions quickly, easily, and repeatedly without having to understand and use a complicated equation. Providing the regression brain as a simple equation has the added benefit of allowing it to be utilized without having to leverage or buy an expensive piece of software – a simple spreadsheet may suffice.

FIG. 1 illustrates a chiller-based cooling system 100, e.g. an HVAC system, that cools building 102. Cooling system 100 is comprised of cooling tower 104, chiller 106, and condenser water pump (CWP) 110 that circulates water between them. Cooling system 100 also includes chilled water pump (CHP) 120 that circulates water from chiller 106 to building 102. Air handler 130 of building 102 may use chilled water provided by chiller 106 to regulate the temperature of building 102. Load 108 illustrates how much heat is taken out of building 102 by air handler 130 and returned to chiller 106. In some configurations, load 108 may be measured by comparing the temperature of water leaving chiller 106 and chiller return water temperature 124. Load 108 is often measured in tonnage or British Thermal Units (BTUs).

Chilled water setpoint 122 is a supervisory setpoint of cooling system 100 that allows a technician or autonomous system to control the temperature of the water provided by chiller 106 to building 102. A typical chiller-based cooling system may have a range of allowable chilled water setpoints of 40-55 degrees Fahrenheit. In practice, when manually setting the chilled water setpoint 122, technicians tend to err on the side of overcooling, in order to ensure that building temperature remains comfortable. Analogously, condenser water setpoint 112 is a supervisory setpoint that controls the temperature of water provided by cooling tower 104 to chiller 106. Condenser return water temperature 114 is a measure of the temperature of the water returned from chiller 106 to cooling tower 104.

FIG. 1 also illustrates outside air temperature 132 and wet bulb temperature 134. Outside air temperature 132 is a measure of the ambient air temperature around cooling system 100. Wet bulb temperature 134 is a measure of the humidity of the environment surrounding cooling system 100. Chiller return water temperature 124, condenser return water temperature 114, outside air temperature 132, and wet bulb temperature 134 are examples of states of cooling system 100. States are typically measured with instruments, meters, or other physical devices, reflecting real-world conditions of cooling system 100.

FIG. 2 illustrates computing delta actions from states 210 collected during operation of the cooling system 100. States 210 include condenser water temperature setpoint 112, condenser return water temperature 114, chiller water temperature setpoint 122, chiller return water temperature 124, outside air temperature 132, wet bulb temperature 134, and load 108, for example. States 210 may include setpoints – settings that an aspect or component of the cooling system 100 is given, or goals that the cooling system 100 is given. An example of a goal setpoint is chilled water setpoint 122. States 210 may also include measured setpoint values – e.g., the actual measured temperature of water leaving chiller 106 that chiller water setpoint 122 controls.
The list of states 210 is non-limiting – other states measured from cooling system 100 are similarly contemplated, such as differential pressure measurements within pipes, measures of electricity consumed by cooling tower 104, chiller 106, pumps 110 and 120, and/or air handler 130, and the like. The amount and timeframe of the collected data may vary based on the type of mechanical system being controlled. For example, when the mechanical system is affected by changes in environmental factors or occupancy levels, the model may benefit from training on data collected over one or more seasons. For instance, a model used to control a chiller-based HVAC system may benefit from a year’s worth of data that reflects seasonal weather patterns, occupancy signatures, and other effects on a demand profile.

Training data collected from real-world operation of mechanical systems is limited to the actions that were used while the data was collected. In particular, it is very common for supervisory setpoints, such as temperature setpoints, to be single setpoint operations – i.e., the setpoint is fixed to a single value for all of the training data. As discussed above, a state that is fixed or that exhibits a limited range may constrain the possible setpoint actions that the model may discover. In order to address this issue, a state or action that is fixed throughout the training data may be replaced with a related “delta” value that fluctuates. These fluctuations increase the range of possible states and setpoint actions that are explored by the model. As a result, more optimal mappings between states and setpoint actions can be learned.

For example, if a setpoint is fixed throughout training data, it may be replaced with a “delta” value that is computed by combining the setpoint with a value that fluctuates across the training data. In the context of a chiller-based HVAC system, a temperature setpoint controlled by a technician may be fixed at a particular value, e.g. 42 degrees Fahrenheit. To compensate for this, a “delta” value may be computed based on the temperature setpoint and some other state, e.g. by subtracting an outside air temperature from the temperature setpoint. Outside air temperature fluctuates, and so “temperature setpoint − outside air temperature” will also fluctuate throughout the training data. This “delta” value may be referred to herein as “ΔOAT”. Adapting existing states in this way enables training the model without a simulator, simplifying the process and expanding the reach of the disclosed embodiments.

In some configurations, a fixed or limited range state is compensated for with two delta values. In the example of an HVAC system that utilizes a chiller, a delta return water temperature (ΔRWT) value is defined as:

    ΔRWT = chiller water setpoint − return water temperature

Return water temperature is a type of feedback that indicates the heating load from the building, i.e., how much heat has to be removed. This informs how much chilled water needs to be provided as efficiently as possible. Delta outside air temperature (ΔOAT) is defined as:

    ΔOAT = chiller water setpoint − outside air temperature

ΔOAT represents the difference between outside air temperature and the temperature of water that is entering the building, indicating an environmental effect on the load. Each of these delta values encodes feedback from the mechanical system, which incorporates feedback into the chiller water setpoint action.
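As a concrete illustration of the two delta values just defined, the minimal sketch below assumes the fixed 42-degree chilled water setpoint from the earlier example; the sample temperature readings are illustrative. The setpoint never changes, yet the derived training inputs fluctuate with the measured states.

```python
CHW_SETPOINT = 42.0  # fixed chilled water setpoint, degrees Fahrenheit (example above)

def delta_rwt(return_water_temp: float) -> float:
    """ΔRWT = chiller water setpoint − return water temperature."""
    return CHW_SETPOINT - return_water_temp

def delta_oat(outside_air_temp: float) -> float:
    """ΔOAT = chiller water setpoint − outside air temperature."""
    return CHW_SETPOINT - outside_air_temp

# Illustrative readings: the setpoint is constant, but the deltas vary.
for rwt, oat in [(52.0, 75.0), (55.0, 88.0), (50.0, 64.0)]:
    print(delta_rwt(rwt), delta_oat(oat))
```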
Specifically, return water temperature encodes information from environmental states and everything that goes on within the building – occupancy, global demand, what is and is not running, time of day, etc. In essence, when training the model, there is redundancy in states – e.g., ΔOAT and ΔRWT overlap some of the information they encode. Neural networks are known to be robust in the face of overparameterization, and so neural networks still perform well when multiple inputs encode the same information. In contrast, linear model regression brains are not effective when dealing with overparameterization. This is another reason that ΔRWT and other feedback states are not included as parameters when constructing a regression brain, as discussed below in conjunction with FIG. 6. Instead, when training a regression brain, states that are not overparameterized are selected – i.e., states that do not both encode the same information. For instance, environmental states that are being measured, such as outside air temperature and wet bulb temperature, do not encode the same information, and so both of these states may be selected when training a regression brain. An incidental benefit of omitting overparameterized states from the regression brain is that it simplifies use of the regression brain by the operator.

Absolute to relative adaptation engine 220 generates delta values 224 and 232 from two or more entries included in states 210. For example, absolute to relative adaptation engine 220 may apply an equation to compute ΔRWT from a chilled water temperature setpoint 122 and a return water temperature 124. As illustrated, ΔRWT = chilled water temperature setpoint 122 − chiller return water temperature 124. Chiller return water temperature 124 naturally varies based on load 108 of building 102. Therefore, even if chiller water temperature setpoint 122 remains fixed while collecting states 210, incorporating chiller return water temperature 124 produces a delta return water temperature 224 that varies across the training data. This variation allows a more complete exploration of the possible outputs of a machine learning model. Similarly, absolute to relative adaptation engine 220 combines chiller water temperature setpoint 122 and outside air temperature 132 to generate delta outside air temperature 232. As illustrated, ΔOAT = chilled water temperature setpoint 122 − outside air temperature 132. Since outside air temperature 132 also tends to vary over time, delta outside air temperature 232 will tend to vary even if chilled water temperature setpoint 122 remains fixed.

FIG. 3 illustrates training a machine learning model using states 210 and delta actions 224 and 232. Model training engine 310 trains machine learning model 330 using reinforcement learning function 320. Specifically, one or more of states 210, delta return water temperature 224, and delta outside air temperature 232 are provided as inputs to machine learning model 330 during training. If additional setpoints are being controlled, then additional delta actions may be provided. Reinforcement learning function 320 may train machine learning model 330 to generate a delta action based on a goal, such as minimizing the power consumption required to satisfy load 108.
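As a rough sketch of a goal of the kind reinforcement learning function 320 might encode, the reward below favors lower power consumption while penalizing any failure to satisfy load 108. The function shape and the penalty magnitude are illustrative assumptions, not the disclosed reward function.

```python
def reward(power_kw: float, load_satisfied: bool) -> float:
    """Hedged sketch of a training reward: meet the load, then save power."""
    if not load_satisfied:
        return -100.0       # failing to satisfy load 108 dominates any savings (assumed penalty)
    return -power_kw        # otherwise, drawing less power earns more reward
```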
In some configurations, since machine learning model 330 has not been provided with absolute setpoints, it cannot be trained to produce absolute setpoints, but instead is trained to produce delta setpoint actions, which are not immediately usable to control cooling system 100.

FIG. 4 illustrates using the trained machine learning model 330 to generate delta action predictions 420. Specifically, machine learning model 330 is used to infer delta chilled water setpoint actions 420 from sets of cooling system states 410. In some configurations, machine learning model 330 is used to generate a regression brain that may be employed on-site to set supervisory setpoints of cooling system 100. In order to generate a regression brain, a corpus of sets of cooling system state data 410 is used to infer corresponding delta chilled water setpoint actions 420. As discussed above in conjunction with FIG. 3, the delta chilled water setpoint actions 420 are not immediately usable to control an aspect of cooling system 100.

FIG. 5 illustrates using a method of moments technique to convert delta actions 420 into absolute actions 522. The trained model 330 is a characterization of the states and actions 210. A subset of the actions generated by trained model 330 are delta actions 420, e.g., ΔOAT. These delta actions 420 are typically the result of some computation, such as “temperature setpoint − outside air temperature”, and as such do not represent a real-world value that can be acted upon. For example, if a temperature setpoint has been replaced with ΔOAT, the model will learn to predict ΔOAT values that best satisfy the reinforcement learning function. But it is meaningless and potentially harmful to tell an HVAC system to set a temperature setpoint to a ΔOAT value – the HVAC system instead understands absolute setpoint temperature values. As such, before they can be used to control the mechanical system, delta actions must be converted back to absolute actions. This process may be referred to as “undeltaing the delta”.

In some configurations, the method of moments technique leverages the underlying physics and thermodynamics of the system to “undelta” the deltas. In the example of a chiller-based HVAC system, method of moments balances ΔRWT and ΔOAT, arriving at the following expression for converting a delta action into an absolute action:

    Recommended setpoint = ΔRWT * (setpoint action range) / (ΔRWT + ΔOAT)

As discussed below in conjunction with FIG. 6, a regression brain may be generated using a corpus of delta actions that have been converted into absolute actions with this equation. As discussed above, the regression brain may be deployed to the mechanical system in order to set supervisory setpoints. For example, the regression brain may map states, such as outside air temperature, load, pump pressure, etc., to actions usable to modify supervisory setpoints.

FIG. 6A illustrates generating an AI agent, such as regression brain 620, based on sets of cooling system states 410 and corresponding absolute chiller water setpoint actions 522. In some configurations, trained model 330 maps states to actions. However, it is not always practical or even possible to deploy trained model 330 into the field. Inadequate computing power, an inability to integrate with existing mechanical systems, unacceptably high computational latency, and other limitations may prevent trained model 330 from being deployed on-site to control a mechanical system.
Instead, trained model 330 may be used to create an AI agent, such as a “regression brain” – a linear equation that, given a subset of substates 212 of the mechanical system 100, returns an action with which to control the system. In some configurations, each of the sets of cooling system states 410 was used to infer, with machine learning model 330, one of the delta actions 420. This delta action was then “de-delta-ed” with a method of moments technique to produce one of the absolute chiller water setpoint actions 522. Regression brain engine 610 may perform a regression analysis on sets of cooling system states 410 and corresponding absolute chiller water setpoint actions 522 to determine regression brain 620 that, for a subset of substates 212, yields a stochastic chiller water setpoint action.

In some configurations, not all of the substates 212 stored in each of the sets of cooling system states 410 are used to generate the regression brain 620. For example, substates may be omitted due to redundancy with other states, or for representing information that is already encoded in other states – also referred to as overparameterization. In one configuration, substates that are not obtained directly from a sensor or other measuring device are excluded when performing the regression analysis. As illustrated, select substates 612 are used to determine regression brain 620. Specifically, Outside Air Temperature state 612A, Chiller Return Water Temperature state 612B, and Load state 612C are analyzed to generate a regression brain 620 with Outside Air Temperature coefficient 624A, Chiller Return Water Temperature coefficient 624B, Load coefficient 624C, and constant 626. Other substates 212 that were used to train machine learning model 330, such as condenser return water temperature 114, are not included when generating regression brain 620.

One real-world example of the resulting regression equation of regression brain 620 is “2005.6018066 - .181145571 * state.OAT - .015137211 * state.RWT - .005379316 * state.Load”, where 2005.6018066 is constant 626, -.181145571 is Outside Air Temperature coefficient 624A, -.015137211 is Return Water Temperature coefficient 624B, and -.005379316 is Load coefficient 624C, and where “state.OAT” refers to an individual outside air temperature 132, often sampled in real time from a sensor near mechanical system 100. Similarly, “state.RWT” refers to a measure of return water temperature 124, and “state.Load” refers to a measure of load 108. In some configurations, in order to compute a setpoint action 622, regression brain 620 multiplies each coefficient 624 by a corresponding input parameter, adds the results, and optionally adds constant value 626. The resulting value is stochastic setpoint action 640 that may be used to configure an aspect of cooling system 100.

FIG. 6B illustrates using the regression brain to constrain an AI agent. For example, outside air temperature 632 is a real-world reading of the temperature around cooling system 100. Outside air temperature 632 is multiplied by OAT coefficient 624A. Similarly, return water temperature 634 is multiplied by return water temperature coefficient 624B, and load 636 is multiplied by load coefficient 624C. The results may be added to constant 626, and the final value is a setpoint action 640 that can control an aspect of cooling system 100. Under certain conditions, setpoint action 640 is a stochastic action – an action that is subject to uncontrollable randomness.
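Written out as code, the real-world regression equation quoted above becomes the following function. The coefficients and constant are taken verbatim from the text; the sample inputs in the usage line are illustrative only.

```python
def regression_brain_620(oat: float, rwt: float, load: float) -> float:
    """Regression brain 620 from the real-world example above."""
    return (2005.6018066             # constant 626
            - 0.181145571 * oat      # Outside Air Temperature coefficient 624A
            - 0.015137211 * rwt      # Return Water Temperature coefficient 624B
            - 0.005379316 * load)    # Load coefficient 624C

# Illustrative real-time samples for state.OAT, state.RWT, and state.Load.
print(regression_brain_620(85.0, 52.0, 300.0))   # a stochastic setpoint action 640
```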
In some configurations, stochasticity mitigation engine 650 smooths out random actions and prevents actions from putting mechanical system 100 into an invalid state. Stochasticity mitigation engine 650 may be deployed with regression brain 620 on-site, where it may intercept and modify stochastic action 640. As discussed above, a regression brain 620 is generated for each action that may be used to configure mechanical system 100. While absolute chiller water temperature setpoint actions 522 are depicted in FIG. 6B, other setpoint actions, such as condenser water temperature setpoint actions, differential pressure setpoints, pump gallons-per-minute setpoints, and the like, may each be predicted by a different regression brain 620. In some configurations, a different stochasticity mitigation engine 650 is created to manage the outputs of each of these different regression brains 620. However, regression brains 620 may also share the same stochasticity mitigation engine 650.

Stochasticity mitigation engine 650 is provided with stochastic action 640 that was generated by regression brain 620. The “stochastic” label indicates that the action may contain a value that is sub-optimal if not deleterious to the functioning of mechanical system 100. Stochasticity mitigation engine 650 implements a “margin” for stochastic action 640, in that it effectively provides a buffer that prevents inefficient, invalid, or otherwise harmful operation of mechanical system 100. Stochasticity mitigation engine 650 may use a number of techniques based on the type of action 640 that regression brain 620 generates. For example, if regression brain 620 is capable of producing a stochastic action 640 with an invalid value, e.g. a value that could cause the mechanical system to stop functioning or damage itself, clamp 652 may be applied to ensure the value encoded by stochastic action 640 falls within an acceptable range. For example, cooling system 100 may only support chiller water temperatures between 40° and 55°, and so stochasticity mitigation engine 650 may apply two clamps – one that returns the maximum of the encoded value and 40°, and another that returns the minimum of the encoded value and 55°. This has the effect of limiting the setpoint to a range of valid values.

Averaging 654 is a technique for smoothing out erratic or unpredictable data. For example, a person in an office may adjust their thermostat up and down based on the time of day, the clothes they are wearing, or any number of other factors. Averaging these values reduces the unpredictability, improving efficiency by reducing the rate of change and the amplitude of the change. Window averaging 656 applies a similar concept, but averaging is only performed over a defined lookback period, i.e., the window. Window averaging 656 smooths out values, but avoids converging on an average value that is based on an entire lifetime’s worth of values. Statistical process control 658 is a method of quality control which employs statistical methods to monitor and control a process. This helps to ensure that the process operates efficiently, producing more specification-conforming products with less waste.

After applying the appropriate mitigation techniques, stochasticity mitigation engine 650 may provide modified action 652 to mechanical system 100 or the appropriate component thereof. For example, modified action 652 may be provided to chiller 106 in order to set the chiller water temperature setpoint 122.
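The following minimal sketch combines the three short-margin techniques just described: clamp 652 using the 40°-55° range given for cooling system 100, window averaging 656 over a fixed lookback, and a simple statistical process control bound 658. The window length and the 3-sigma control limits are illustrative assumptions rather than parameters from this disclosure.

```python
import statistics
from collections import deque

class StochasticityMitigationEngine:
    """Sketch of stochasticity mitigation engine 650 for chiller water
    temperature setpoint actions."""

    def __init__(self, low: float = 40.0, high: float = 55.0, window: int = 12):
        self.low, self.high = low, high        # allowable setpoint range (clamp 652)
        self.history = deque(maxlen=window)    # lookback period for window averaging 656

    def mitigate(self, stochastic_action: float) -> float:
        # Clamp 652: maximum of the value and 40, then minimum of that and 55.
        value = min(max(stochastic_action, self.low), self.high)
        # Statistical process control 658: pull outliers back inside control
        # bounds around the recent mean (3-sigma limits assumed here).
        if len(self.history) >= 2:
            mean = statistics.fmean(self.history)
            sigma = statistics.stdev(self.history)
            if sigma > 0.0:
                value = min(max(value, mean - 3 * sigma), mean + 3 * sigma)
        self.history.append(value)
        # Window averaging 656: smooth over the lookback window only.
        return statistics.fmean(self.history)

engine = StochasticityMitigationEngine()
for stochastic_action in [44.0, 44.3, 61.0, 43.9]:   # 61.0 is out of the valid range
    print(engine.mitigate(stochastic_action))         # modified actions sent onward
```

Per the text, a deployment might attach one such engine to each regression brain 620 or share a single engine across several brains.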
While FIG. 6B illustrates applying stochasticity mitigation engine 650 in the context of a regression brain that was created with data generated by a machine learning model and by applying a method of moments calculation, mitigation engine 650 may be applied to any type of action that controls setpoints or other configuration points of mechanical system 100.

FIG. 7 shows changes in the power usage by the cooling system 100 in accordance with one example. As explained earlier, the various components of the cooling system 100 may consume power based on the cooling load 108 being serviced by the cooling system 100. The power usage 702 may be based on the setpoints 710, 720, 730, 740 recommended by regression brain 620. Portion 710 corresponds to any increases or decreases in the power usage by the cooling tower (e.g., cooling tower 104). In this example, arrow 714 shows that the power usage by the cooling tower 104 is decreased when the condenser water temperature setpoint (CDW SWS) is increased. Portion 720 corresponds to the power usage by the cooling pump (e.g., CWP 110). In this example, arrow 724 shows that the power usage by the cooling pump is increased when the chilled water flow GPM STPT is increased. Portion 730 corresponds to the power usage by the chiller (e.g., chiller 106). In this example, arrow 734 shows that the power usage by chiller 106 is decreased as a result of the increase in the chiller water temperature setpoint (CHW SWS). Portion 740 corresponds to the power usage by the chilled water pressure differential. In this example, arrow 744 shows that the power usage by the chilled water pump (e.g., CHP 120) is increased when the differential pressure (DPSP) is increased. These changes in the power usage by the various components of the cooling system 100, however, result in a decrease in the net power usage of the cooling system 100 while still meeting the cooling load 108 requirements. Thus, the setpoints predicted by the autonomous supervisory control system result in efficiency gains.

FIG. 8 illustrates temporal segmentation of margins. Mechanical systems often behave in unpredictable ways. For example, over time, wear and tear on mechanical system 100 – also referred to as “drift” – may lead to a change in performance characteristics. On a shorter time horizon, variation in user input and the randomness inherent to any computing device may also affect the operation of mechanical system 100. Whatever the cause and whatever the time frame, stochasticity limits efficiency and can lead to an untimely failure of mechanical system 100. Existing techniques anticipate stochasticity by building mechanical systems in anticipation of wear and tear, and in anticipation of short-term variation in usage. However, the additional cost to design and implement these precautions may be significant.

In some configurations, stochasticity is segmented temporally into long term and short term stochasticity. Long term stochasticity may occur on a time frame 802 of years 806. For example, normal wear and tear on aging equipment 822 may cause an expected degradation of performance over time. For example, an automobile engine may experience reduced compression as cylinders and piston rings wear out. Long margins 820 may be developed to mitigate this damage. Because of the long time horizon, individual random events tend to have a cumulative effect that is less random. As such, long margins 820 may be implemented by adjusting the reinforcement learning function used to train model 330.
Short term stochasticity may be observed on a time scale of minutes, seconds, or less. As such it is fundamentally different from long term stochasticity in that the impact of the randomness is felt immediately, without the accumulation over time of other random events that may have the effect of canceling each other out. For example, short term stochasticity may be the result of AI agent artifacts 812. For example, a regression brain 620 that has been created based on data collected from mechanical system 100 may use a linear equation to map a defined set of substates 212 of state 210 to an action. By its nature, a linear regression will be closer to an optimal value for some inputs than for others. In this way, a linear regression may produce values that are incompatible with the system requirements of mechanical system 100.

Signal randomness 814 is another example of short term stochasticity that may affect how an AI agent 620 controls setpoints 122 of mechanical system 100. Signal randomness 814 may be caused by hardware or software delays, and it is not always possible to eliminate these delays completely. For example, signal randomness 814 may affect the order of operations performed by mechanical system 100. Short margins 810 may be employed to mitigate the effects of short term stochasticity 812-814. As discussed above in conjunction with FIG. 6, short margins may apply a clamp 652, window average 656, or other strategy for preventing stochastic action 640 from eroding the efficiency of mechanical system 100 and/or causing damage to mechanical system 100.

Turning now to FIG. 9, aspects of a routine for mitigating stochasticity in AI agents that are used to control a mechanical system are shown and described. For ease of understanding, the processes discussed in this disclosure are delineated as separate operations represented as independent blocks. However, these separately delineated operations should not be construed as necessarily order dependent in their performance. The order in which the process is described is not intended to be construed as a limitation, and any number of the described process blocks may be combined in any order to implement the process or an alternate process. Moreover, it is also possible that one or more of the provided operations is modified or omitted.

With reference to FIG. 9, routine 900 begins at operation 902, where a machine learning model 330 is trained based on states 210 collected from mechanical system 100 during operation of mechanical system 100. In some configurations, the machine learning model 330 is trained with delta actions 420 that are generated from states 210 to ensure model 330 is trained with values that fluctuate throughout the training data. Machine learning model 330 is trained to predict an action – e.g., to set a setpoint of mechanical system 100 – from a plurality of types of states 410.

Next at operation 904, the machine learning model 330 is used to infer a plurality of actions 522 from a plurality of sets of states 410. Next at operation 906, an AI agent 620 is generated based on the plurality of sets of states 410 and the plurality of actions 522. For example, a regression brain is generated by fitting a regression line to a subset of the types of states contained in states 210. For example, chiller water temperature setpoint 122 may be included in the subset of types of states while condenser return water temperature 114 may be omitted.
Next at operation 908, the AI agent 620 is deployed with stochasticity mitigation engine 650. AI agent 620 may be the regression brain discussed above. Once deployed, AI agent 620 produces an individual action for an individual state, and stochasticity mitigation engine 650 modifies that action before it is applied to mechanical system 100.

It should be appreciated that more or fewer operations can be performed than shown in the figures and described herein. These operations can also be performed in a different order than those described herein. It also should be understood that the illustrated methods can end at any time and need not be performed in their entireties. Some or all operations of the methods, and/or substantially equivalent operations, can be performed by execution of computer-readable instructions included on a computer-storage media, as defined below. The term "computer-readable instructions," and variants thereof, as used in the description and claims, is used expansively herein to include routines, applications, application modules, program modules, programs, components, data structures, algorithms, and the like. Computer-readable instructions can be implemented on various system configurations, including single-processor or multiprocessor systems, minicomputers, mainframe computers, personal computers, hand-held computing devices, microprocessor-based or programmable consumer electronics, combinations thereof, and the like.

Thus, it should be appreciated that the logical operations described herein are implemented (1) as a sequence of computer implemented acts or program modules running on a computing system and/or (2) as interconnected machine logic circuits or circuit modules within the computing system. The implementation is a matter of choice dependent on the performance and other requirements of the computing system. Accordingly, the logical operations described herein are referred to variously as states, operations, structural devices, acts, or modules. These operations, structural devices, acts, and modules may be implemented in software, in firmware, in special purpose digital logic, and any combination thereof.

For example, the operations of the routine 900 are described herein as being implemented, at least in part, by modules running the features disclosed herein. A module can be a dynamically linked library (DLL), a statically linked library, functionality produced by an application programming interface (API), a compiled program, an interpreted program, a script, or any other executable set of instructions. Data can be stored in a data structure in one or more memory components. Data can be retrieved from the data structure by addressing links or references to the data structure. Although the following illustration refers to the components of the figures, it should be appreciated that the operations of the routine 900 may be also implemented in many other ways.
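For instance, under the simplifying assumptions of the sketches above, operations 902 through 908 might be exercised end to end as follows. The delta-action encoding and the method of moments conversion shown here (a simple matching of the first two moments to a target range) are illustrative stand-ins, since the disclosure does not specify those computations, and recorded values take the place of an actual trained model.

```python
# Minimal end-to-end sketch of routine 900, reusing the hypothetical
# RegressionBrain and MitigationEngine classes sketched earlier.
import numpy as np

def to_delta_actions(fixed_state, fluctuating_states):
    # Operation 902 (in part): combine a fixed state with fluctuating
    # states so the training targets fluctuate throughout the data.
    return np.asarray(fluctuating_states, dtype=float) - fixed_state

def method_of_moments(delta_actions, target_mean, target_std):
    # Operation 904 (in part): convert delta actions to absolute actions
    # by matching their first two moments to an allowed range. This is
    # an assumed computation, not the disclosed one.
    d = np.asarray(delta_actions, dtype=float)
    std = d.std() or 1.0
    return (d - d.mean()) / std * target_std + target_mean

# Operations 902-904: a trained model 330 would infer delta actions
# here; recorded values keep the sketch self-contained.
fixed = 44.0                                 # e.g., a baseline setpoint
observed = [44.5, 45.2, 43.8, 46.1, 44.9]    # fluctuating states
deltas = to_delta_actions(fixed, observed)
actions = method_of_moments(deltas, target_mean=45.0, target_std=0.8)

# Operation 906: fit the regression brain on the selected substates.
states = [[t, 0.5] for t in observed]        # two substate columns
brain = RegressionBrain(substate_indices=[0])
brain.fit(states, actions)

# Operation 908: deploy the brain behind the mitigation engine.
engine = MitigationEngine(lo=42.0, hi=48.0, window=3)
setpoint = engine.mitigate(brain.predict([45.0, 0.5]))
```

The exact model, action encoding, and moment computation would of course depend on the particular mechanical system 100 and its defined parameters.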
As further examples, the routine 900 may be implemented, at least in part, by a processor of another remote computer or a local circuit. In addition, one or more of the operations of the routine 900 may alternatively or additionally be implemented, at least in part, by a chipset working alone or in conjunction with other software modules. In the example described below, one or more modules of a computing system can receive and/or process the data disclosed herein. Any service, circuit or application suitable for providing the techniques disclosed herein can be used in operations described herein.

FIG. 10 shows additional details of an example computer architecture 1000 for a device, such as a computer or a server configured as part of the systems described herein, capable of executing computer instructions (e.g., a module or a program component described herein). The computer architecture 1000 illustrated in FIG. 10 includes processing unit(s) 1002, a system memory 1004, including a random-access memory 1006 ("RAM") and a read-only memory ("ROM") 1008, and a system bus 1010 that couples the memory 1004 to the processing unit(s) 1002. Processing unit(s), such as processing unit(s) 1002, can represent, for example, a CPU-type processing unit, a GPU-type processing unit, a field-programmable gate array (FPGA), another class of digital signal processor (DSP), or other hardware logic components that may, in some instances, be driven by a CPU. For example, and without limitation, illustrative types of hardware logic components that can be used include Application-Specific Integrated Circuits (ASICs), Application-Specific Standard Products (ASSPs), System-on-a-Chip Systems (SOCs), Complex Programmable Logic Devices (CPLDs), etc.

A basic input/output system containing the basic routines that help to transfer information between elements within the computer architecture 1000, such as during startup, is stored in the ROM 1008. The computer architecture 1000 further includes a mass storage device 1012 for storing an operating system 1014, application(s) 1016, modules 1018, and other data described herein. The mass storage device 1012 is connected to processing unit(s) 1002 through a mass storage controller connected to the bus 1010. The mass storage device 1012 and its associated computer-readable media provide non-volatile storage for the computer architecture 1000. Although the description of computer-readable media contained herein refers to a mass storage device, it should be appreciated by those skilled in the art that computer-readable media can be any available computer-readable storage media or communication media that can be accessed by the computer architecture 1000.

Computer-readable media can include computer-readable storage media and/or communication media. Computer-readable storage media can include one or more of volatile memory, nonvolatile memory, and/or other persistent and/or auxiliary computer storage media, removable and non-removable computer storage media implemented in any method or technology for storage of information such as computer-readable instructions, data structures, program modules, or other data.
Thus, computer storage media includes tangible and/or physical forms of media included in a device and/or hardware component that is part of a device or external to a device, including but not limited to random access memory (RAM), static random-access memory (SRAM), dynamic random-access memory (DRAM), phase change memory (PCM), read-only memory (ROM), erasable programmable read-only memory (EPROM), electrically erasable programmable read-only memory (EEPROM), flash memory, compact disc read-only memory (CD-ROM), digital versatile disks (DVDs), optical cards or other optical storage media, magnetic cassettes, magnetic tape, magnetic disk storage, magnetic cards or other magnetic storage devices or media, solid-state memory devices, storage arrays, network attached storage, storage area networks, hosted computer storage or any other storage memory, storage device, and/or storage medium that can be used to store and maintain information for access by a computing device.

In contrast to computer-readable storage media, communication media can embody computer-readable instructions, data structures, program modules, or other data in a modulated data signal, such as a carrier wave, or other transmission mechanism. As defined herein, computer storage media does not include communication media. That is, computer-readable storage media does not include communications media consisting solely of a modulated data signal, a carrier wave, or a propagated signal, per se.

According to various configurations, the computer architecture 1000 may operate in a networked environment using logical connections to remote computers through the network 1020. The computer architecture 1000 may connect to the network 1020 through a network interface unit 1022 connected to the bus 1010. The computer architecture 1000 also may include an input/output controller 1024 for receiving and processing input from a number of other devices, including a keyboard, mouse, touch, or electronic stylus or pen. Similarly, the input/output controller 1024 may provide output to a display screen, a printer, or other type of output device.

It should be appreciated that the software components described herein may, when loaded into the processing unit(s) 1002 and executed, transform the processing unit(s) 1002 and the overall computer architecture 1000 from a general-purpose computing system into a special-purpose computing system customized to facilitate the functionality presented herein. The processing unit(s) 1002 may be constructed from any number of transistors or other discrete circuit elements, which may individually or collectively assume any number of states. More specifically, the processing unit(s) 1002 may operate as a finite-state machine, in response to executable instructions contained within the software modules disclosed herein. These computer-executable instructions may transform the processing unit(s) 1002 by specifying how the processing unit(s) 1002 transition between states, thereby transforming the transistors or other discrete hardware elements constituting the processing unit(s) 1002.

FIG. 11 depicts an illustrative distributed computing environment 1100 capable of executing the software components described herein. Thus, the distributed computing environment 1100 illustrated in FIG. 11 can be utilized to execute any aspects of the software components presented herein.
Accordingly, the distributed computing environment 1100 can include a computing environment 1102 operating on, in communication with, or as part of the network 1104. The network 1104 can include various access networks. One or more client devices 1106A-1106N (hereinafter referred to collectively and/or generically as "clients 1106" and also referred to herein as computing devices 1106) can communicate with the computing environment 1102 via the network 1104. In one illustrated configuration, the clients 1106 include a computing device 1106A such as a laptop computer, a desktop computer, or other computing device; a slate or tablet computing device ("tablet computing device") 1106B; a mobile computing device 1106C such as a mobile telephone, a smart phone, or other mobile computing device; a server computer 1106D; and/or other devices 1106N. It should be understood that any number of clients 1106 can communicate with the computing environment 1102.

In various examples, the computing environment 1102 includes servers 1108, data storage 1110, and one or more network interfaces 1112. The servers 1108 can host various services, virtual machines, portals, and/or other resources. In the illustrated configuration, the servers 1108 host virtual machines 1114, Web portals 1116, mailbox services 1118, storage services 1120, and/or social networking services 1122. As shown in FIG. 11, the servers 1108 also can host other services, applications, portals, and/or other resources ("other resources") 1124.

As mentioned above, the computing environment 1102 can include the data storage 1110. According to various implementations, the functionality of the data storage 1110 is provided by one or more databases operating on, or in communication with, the network 1104. The functionality of the data storage 1110 also can be provided by one or more servers configured to host data for the computing environment 1102. The data storage 1110 can include, host, or provide one or more real or virtual datastores 1126A-1126N (hereinafter referred to collectively and/or generically as "datastores 1126"). The datastores 1126 are configured to host data used or created by the servers 1108 and/or other data. That is, the datastores 1126 also can host or store web page documents, word documents, presentation documents, data structures, algorithms for execution by a recommendation engine, and/or other data utilized by any application program. Aspects of the datastores 1126 may be associated with a service for storing files.

The computing environment 1102 can communicate with, or be accessed by, the network interfaces 1112. The network interfaces 1112 can include various types of network hardware and software for supporting communications between two or more computing devices including, but not limited to, the computing devices and the servers. It should be appreciated that the network interfaces 1112 also may be utilized to connect to other types of networks and/or computer systems.

It should be understood that the distributed computing environment 1100 described herein can provide any aspects of the software elements described herein with any number of virtual computing resources and/or other distributed computing functionality that can be configured to execute any aspects of the software components disclosed herein.
According to various implementations of the concepts and technologies disclosed herein, the distributed computing environment 1100 provides the software functionality described herein as a service to the computing devices. It should be understood that the computing devices can include real or virtual machines including, but not limited to, server computers, web servers, personal computers, mobile computing devices, smart phones, and/or other devices. As such, various configurations of the concepts and technologies disclosed herein enable any device configured to access the distributed computing environment 1100 to utilize the functionality described herein for providing the techniques disclosed herein, among other aspects.

The present disclosure is supplemented by the following example clauses.

Example 1: A method comprising: training a machine learning model based on states collected from a mechanical system; using the machine learning model to infer a plurality of actions from a plurality of sets of states; generating an AI agent based on the plurality of sets of states and the plurality of actions; deploying the AI agent with a stochasticity mitigation engine configured to modify an output of the AI agent; wherein the AI agent computes an individual action from an individual state, and wherein the stochasticity mitigation engine generates a modified individual action derived from the individual action that controls the mechanical system.

Example 2: The method of example 1, wherein the stochasticity mitigation engine applies a clamp to a value included in the individual action.

Example 3: The method of example 1, wherein the individual action controls a setpoint of the mechanical system.

Example 4: The method of example 1, wherein the machine learning model is trained with a delta action that is generated by combining a fixed state included in the states with a fluctuating state included in the states.

Example 5: The method of example 4, wherein the plurality of actions is inferred in part by applying a method of moments computation to convert the delta action into an absolute action that can be used directly to control a setpoint of the mechanical system.

Example 6: The method of example 1, wherein the AI agent comprises a regression brain that maps the individual state to the individual action, wherein the regression brain is generated by: selecting one or more types of states; and computing a regression line that maps states of the one or more types of states to the individual action.

Example 7: The method of example 1, wherein the stochasticity mitigation engine addresses near term stochasticity.

Example 8: The method of example 7, wherein long term stochasticity is addressed by adjusting a loss function or a reinforcement learning function of the model to account for expected wear over time of the mechanical system.
Example 9: A computing device comprising: one or more processors; a memory in communication with the one or more processors, the memory having computer-readable instructions stored thereupon which, when executed by the one or more processors, cause the computing device to: receive an AI agent that maps a state of a mechanical system to an action; receive a stochasticity mitigation engine; provide the action to the stochasticity mitigation engine, wherein the stochasticity mitigation engine generates a modified action by modifying the action according to a defined set of parameters of the mechanical system; and apply the modified action to control an aspect of the mechanical system.

Example 10: The computing device of example 9, wherein the stochasticity mitigation engine modifies the action by applying a window averaging technique that smooths out a stream of values.

Example 11: The computing device of example 9, wherein the AI agent is generated based on a plurality of states and a corresponding plurality of actions, wherein the plurality of actions are inferred from the plurality of states by a machine learning model.

Example 12: The computing device of example 11, wherein the AI agent comprises a regression brain that maps substates of an individual state to an individual action, wherein the regression brain is generated by fitting a regression line to the substates.

Example 13: The computing device of example 12, wherein the action includes a value that exceeds the defined set of parameters of the mechanical system because the regression brain extrapolated beyond a threshold value.

Example 14: The computing device of example 9, wherein the modified action is applied to set a setpoint of the mechanical system.

Example 15: The computing device of example 9, wherein the mechanical system comprises a cooling system, wherein the state of the mechanical system comprises a return water temperature, and wherein the action sets a chiller water temperature setpoint.

Example 16: A computer-readable storage medium comprising instructions that, when executed by a processor, cause the processor to: train a machine learning model based on states collected from a mechanical system; use the machine learning model to infer a plurality of actions from a plurality of sets of states; generate an AI agent based on the plurality of sets of states and the plurality of actions; deploy the AI agent with a stochasticity mitigation engine configured to modify an output of the AI agent; wherein the AI agent computes an individual action from a subset of substates of an individual state, and wherein the stochasticity mitigation engine generates a modified individual action derived from the individual action that controls the mechanical system.

Example 17: The computer-readable storage medium of example 16, wherein the mechanical system comprises a cooling system.

Example 18: The computer-readable storage medium of example 17, wherein the subset of substates comprises an outside air temperature, a wet bulb temperature, and a load.

Example 19: The computer-readable storage medium of example 17, wherein the individual action sets a supervisory setpoint that comprises a condenser water setpoint, a condenser water pump flow rate, a chilled water setpoint, or a chilled water pump flow rate.
Example 20: The computer-readable storage medium of example 16, wherein the individual state originates from sensor data, and wherein the individual action controls the mechanical system by setting a supervisory setpoint.

While certain example embodiments have been described, these embodiments have been presented by way of example only and are not intended to limit the scope of the inventions disclosed herein. Thus, nothing in the foregoing description is intended to imply that any particular feature, characteristic, step, module, or block is necessary or indispensable. Indeed, the novel methods and systems described herein may be embodied in a variety of other forms; furthermore, various omissions, substitutions and changes in the form of the methods and systems described herein may be made without departing from the spirit of the inventions disclosed herein. The accompanying claims and their equivalents are intended to cover such forms or modifications as would fall within the scope and spirit of certain of the inventions disclosed herein.

It should be appreciated that any reference to "first," "second," etc. elements within the Summary and/or Detailed Description is not intended to and should not be construed to necessarily correspond to any reference of "first," "second," etc. elements of the claims. Rather, any use of "first" and "second" within the Summary, Detailed Description, and/or claims may be used to distinguish between two different instances of the same element.

In closing, although the various techniques have been described in language specific to structural features and/or methodological acts, it is to be understood that the subject matter defined in the appended representations is not necessarily limited to the specific features or acts described. Rather, the specific features and acts are disclosed as example forms of implementing the claimed subject matter.