

Title:
EXPLICIT RULE-BASED CONTROL OF COMPLEX DYNAMICAL SYSTEMS
Document Type and Number:
WIPO Patent Application WO/2023/022712
Kind Code:
A1
Abstract:
A method for configuring a controller of a dynamical system includes obtaining a control data manifold formed by a plurality of stored control points, each representative of a state signal specifying a state of the dynamical system and an assigned control signal. Each state signal is mapped to a multi-dimensional state space. The assigned control signal is generated by a first control algorithm as a function of the state signal. The method includes detecting patches on the control data manifold by identifying control points on the control data manifold that belong to a common local approximation function, and training a classifier to classify control points into different patches. The method further includes training a respective regression model for each detected patch for approximating a relationship between state signals and the control signals in that patch, to create an explicit rule-based control algorithm.

Inventors:
HARTMANN DIRK (DE)
PANDEY AMIT (US)
GUMUSSOY SUAT (US)
MUENZ ULRICH (US)
Application Number:
PCT/US2021/046463
Publication Date:
February 23, 2023
Filing Date:
August 18, 2021
Assignee:
SIEMENS AG (DE)
SIEMENS CORP (US)
International Classes:
G05B19/042; G05B13/04
Foreign References:
US20150234779A12015-08-20
US20120010757A12012-01-12
US20040249483A12004-12-09
Attorney, Agent or Firm:
BASU, Rana (US)
Claims:
CLAIMS

What is claimed is:

1. A method for configuring a controller of a dynamical system, comprising:
reading a plurality of state signals, each state signal specifying a state of the dynamical system and being mapped to a multi-dimensional state space,
using a first control algorithm to determine, for each state signal, a control signal that is assigned to that state signal, wherein each state signal and the assigned control signal represents a respective control point in a control data manifold pertaining to the dynamical system,
detecting patches on the control data manifold by identifying control points on the control data manifold that belong to a common local approximation function,
training a classifier to classify control points into different patches from among the detected patches,
training a respective regression model for each detected patch for approximating a relationship between the state signals and the control signals in that patch, and
using the trained classifier and regression models to create an explicit rule-based control algorithm, which is configured to convert a measured state signal obtained from the dynamical system into a control action by identifying an active patch as a function of the measured state signal and evaluating the respective regression model for the identified active patch.

2. The method according to claim 1, wherein the state signals and the control signals represent time series data, wherein for each time step, a respective control signal is generated by the first control algorithm based on an updated state signal for that time step, and wherein time series data pertaining to state signals and the control signals are generated for a variety of initial states and scenario parameters of the dynamical system.

3. The method according to any of claims 1 and 2, comprising simulatively generating state signals and the control signals based on an interaction of the first control algorithm with a simulation model of the dynamical system.

4. The method according to claim 3, wherein the first control algorithm comprises a model predictive control (MPC) algorithm, wherein the method comprises:
reading state signals from the simulation model,
using the MPC algorithm to determine, for each state signal, a plurality of variants of a control signal,
using the simulation model to simulate a behavior of the dynamical system for each of the variants of the control signal over a defined prediction horizon, and
assigning, to the respective state signal, the one of the variants of the control signal that results in an optimized behavior of the dynamical system.

5. The method according to any of claims 1 to 3, wherein the first control algorithm comprises a neural network based policy trained to map state signals to assigned control signals.

6. The method according to any of claims 1 to 5, comprising generating state signals and the control signals by performing a plurality of experiments involving interaction of an on-field controller executing the first control algorithm with the dynamical system.

7. The method according to any of claims 1 to 6, wherein detecting patches on the control data manifold comprises an unsupervised process of:
sampling control points in the control data manifold,
for each sampled control point, determining a local approximation function describing a patch formed by a neighborhood of control points associated with that sampled control point, and
labeling the determined local approximation function with an existing patch label or a new patch label based on a similarity with stored local approximation functions representative of other patches.

8. The method according to any of claims 1 to 7, wherein each patch comprises a hyperplane in a space where the control data manifold is embedded.

9. The method according to claim 8, wherein the respective regression models each comprise a linear regression model.

10. The method according to any of claims 8 and 9, wherein the control data manifold comprises a non-linear region, wherein detecting patches on the control data manifold comprises fitting a plurality of hyperplanes on the non-linear region.

11. The method according to any of claims 1 to 7, wherein the control data manifold comprises a non-linear region, wherein detecting patches on the control data manifold comprises fitting the non-linear region with one or more patches defined by polynomial local approximation functions.

12. The method according to claim 11, wherein the regression models trained for the one or more patches comprise respective polynomial regression models.

13. The method according to any of claims 1 to 12, comprising training a corresponding support vector machine (SVM) for each patch to classify control points into different patches, wherein the active patch is identifiable by using the measured state signal to evaluate an SVM indicator function associated with individual patches.

14. The method according to any of claims 1 to 13, wherein the explicit rule-based control algorithm is created in a remote computing environment with respect to the controller of the dynamical system where the explicit rule-based control algorithm is subsequently deployed.

15. A non-transitory computer-readable storage medium including instructions that, when processed by a computing device, configure the computing device to perform the method according to any of claims 1 to 14.


16. A controller for a dynamical system, comprising:
a processor; and
a memory storing a computer program incorporating an explicit rule-based control algorithm, which, when executed by the processor, configures the controller to:
receive a measured state signal from the dynamical system,
identify, as a function of the measured state signal, an active patch out of multiple patches on a control data manifold pertaining to the dynamical system, using a trained control point classifier, the control data manifold being formed by a plurality of stored control points, each control point representative of a state signal specifying a state of the dynamical system and a control signal assigned to that state signal, each state signal being mapped to a multidimensional state space, each patch being defined by control points on the control data manifold that belong to a common local approximation function, and
execute a control action as a function of the measured state signal by evaluating a trained regression model associated with the identified patch, the regression model being trained to approximate a relationship between the state signals and the control signals in the identified patch.

17. A method for controlling a dynamical system, comprising:
creating an explicit rule-based control algorithm by:
reading a plurality of state signals, each state signal specifying a state of the dynamical system and being mapped to a multi-dimensional state space,
using a first control algorithm to determine, for each state signal, a control signal that is assigned to that state signal, wherein each state signal and the assigned control signal represents a respective control point in a control data manifold pertaining to the dynamical system,
detecting patches on the control data manifold by identifying control points on the control data manifold that belong to a common local approximation function,
training a classifier to classify control points into different patches from among the detected patches,
training a respective regression model for each detected patch for approximating a relationship between the state signals and the control signals in that patch, and
using the explicit rule-based control algorithm to control the dynamical system by:
receiving a measured state signal from the dynamical system,
identifying an active patch as a function of the measured state signal, and
executing a control action as a function of the measured state signal by evaluating the respective regression model for the identified active patch.

18. The method according to claim 17, wherein a corresponding support vector machine (SVM) is trained for each patch to classify control points into different patches, the method comprising identifying the active patch by using the measured state signal to evaluate an SVM indicator associated with individual patches.

19. The method according to any of claims 17 and 18, comprising creating the explicit rule-based control algorithm in an offline process and subsequently transferring the explicit rule-based control algorithm to a memory of a controller to control the dynamical system.

20. The method according to claim 19, wherein the creating of the explicit rule-based control algorithm is executed in a cloud computing environment.


Description:
EXPLICIT RULE-BASED CONTROL OF COMPLEX DYNAMICAL SYSTEMS

TECHNICAL FIELD

[0001] The present disclosure relates generally to controllers for controlling complex dynamical systems, and in particular, to a technique for generating an explicit rule-based control for a dynamical system.

BACKGROUND

[0002] For the control of complex systems, such as electric grids, building systems, plants, automotive vehicles, factory robots, etc., complex controls are necessary. While it is possible to synthesize sophisticated controls for controlling such complex systems, such as model predictive control (MPC), their implementation is often limited by hardware restrictions.

[0003] With an MPC controller, depending on the input variables, a future behavior of a system to be controlled is simulated in order to determine an output signal (control action) that optimizes the behavior of the system, often subject to defined constraints. MPC yields optimal solutions in a mathematical sense. However, such a control requires solving a mathematical program online to compute the control action. Linear MPC controllers are state of the art in industry. For highly non-linear problems with many constraints or even discrete variables/decisions, the realization of MPC can be quite challenging: MPC requires high computational effort, and the corresponding non-linear optimization technologies pose mathematical challenges, e.g., a lack of guaranteed convergence.

[0004] A known approach to address the above-described challenge involves explicit model predictive control (explicit MPC). By solving the optimization problem off-line for a given range of operating conditions of interest and exploiting multiparametric programming techniques, explicit MPC computes the optimal control action off-line as an “explicit” function of the state and reference vectors, so that on-line operations reduce to a simple function evaluation. However, the ‘size’ of these explicit functions increases rapidly with the number of system states and constraints, so that they may become intractable to compute for large, complex systems.

[0005] Alternatively, or additionally, a data-driven machine learning model can be trained on the basis of simulations, which then supplements or replaces the online MPC computation. However, a control characteristic conveyed by a trained machine learning model, such as a neural network, is usually hardly comprehensible or analytically interpretable for a user, such as a technician, or an agency tasked to certify the controller, among others. Furthermore, for a neural network to accurately mimic an MPC solution, a large number of hidden nodes may be required, which can demand greater computational resources.

SUMMARY

[0006] Briefly, aspects of the present disclosure provide a technique for generating an explicit rule-based control for a dynamical system that addresses at least some of the above-described technical problems.

[0007] A first aspect of the disclosure provides a method for configuring a controller of a dynamical system. The method comprises reading a plurality of state signals, each state signal specifying a state of the dynamical system and being mapped to a multi-dimensional state space. The method further comprises using a first control algorithm to determine, for each state signal, a control signal that is assigned to that state signal. Each state signal and the assigned control signal represents a respective control point in a control data manifold pertaining to the dynamical system. The method further comprises detecting patches on the control data manifold by identifying control points on the control data manifold that belong to a common local approximation function. The method further comprises training a classifier to classify control points into different patches from among the detected patches. The method further comprises training a respective regression model for each detected patch for approximating a relationship between the state signals and the control signals in that patch. The trained classifier and regression models are used to create an explicit rule-based control algorithm, which is configured to convert a measured state signal obtained from the dynamical system into a control action by identifying an active patch as a function of the measured state signal and evaluating the respective regression model for the identified active patch.

[0008] A second aspect of the disclosure provides a method for controlling a dynamical system. The method comprises creating an explicit rule-based control algorithm according to a method described above. The method further comprises using the explicit rule-based control algorithm to control the dynamical system by: receiving a measured state signal from the dynamical system, identifying an active patch as a function of the measured state signal, and executing a control action as a function of the measured state signal by evaluating the respective regression model for the identified active patch.

[0009] Other aspects of the present disclosure implement features of the above-described methods in controllers and computer program products.

[0010] Additional technical features and benefits may be realized through the techniques of the present disclosure. Embodiments and aspects of the disclosure are described in detail herein and are considered a part of the claimed subject matter. For a better understanding, refer to the detailed description and to the drawings.

BRIEF DESCRIPTION OF THE DRAWINGS

[0011] The foregoing and other aspects of the present disclosure are best understood from the following detailed description when read in connection with the accompanying drawings. To easily identify the discussion of any element or act, the most significant digit or digits in a reference number refer to the figure number in which the element or act is first introduced.

[0012] FIG. 1 is a schematic diagram illustrating an example embodiment of a platform for creating an explicit rule-based control algorithm which may be used by a controller for controlling a dynamical system.

[0013] FIG. 2 illustrates a representative visualization of a control data manifold produced by model predictive control of a second order linear system with a constrained input.

[0014] FIG. 3 illustrates a visualization of approximated hyperplanes on the control data manifold shown in FIG. 2.

[0015] FIG. 4 illustrates a controller for controlling a dynamical system using an explicit rule-based control algorithm created in accordance with disclosed embodiments.

[0016] FIG. 5 illustrates an example embodiment of a method for determining a control action by the explicit rule-based control algorithm based on a measured state signal using support vector machines.

[0017] FIGS. 6A, 6B and 6C depict operating parameters in connection with an illustrative use case of utilizing the disclosed embodiments for providing an optimal economic dispatch of electricity from a grid.

[0018] FIG. 7 depicts a control signal generated by a model predictive control algorithm for the illustrative use case, with a prediction horizon m=2.

[0019] FIG. 8 depicts a visualization of a control data manifold with approximated hyperplanes for the illustrative use case.

DETAILED DESCRIPTION

[0020] Various technologies that pertain to systems and methods will now be described with reference to the drawings, where like reference numerals represent like elements throughout. The drawings discussed below, and the various embodiments used to describe the principles of the present disclosure in this patent document are by way of illustration only and should not be construed in any way to limit the scope of the disclosure. Those skilled in the art will understand that the principles of the present disclosure may be implemented in any suitably arranged apparatus. It is to be understood that functionality that is described as being carried out by certain system elements may be performed by multiple elements. Similarly, for instance, an element may be configured to perform functionality that is described as being carried out by multiple elements. The numerous innovative teachings of the present application will be described with reference to exemplary non-limiting embodiments.

[0021] It is recognized that many modern control algorithms may be synthesized to work well in simulation environments but are challenging to implement on controllers for online control of a complex dynamical system, due to restrictions and limitations in controller hardware. The disclosed methodology may be used to extract rules from advanced control algorithms which can be implemented in existing controllers, such as programmable logic controllers (PLC) or another type of computing device. Furthermore, the resulting control algorithm is transparent and readily interpretable due to its explicit character, allowing personnel, such as a technician or an official authority, to better understand relationships between controller inputs and outputs.

[0022] As mentioned above, in the case of traditional explicit MPC, the ‘size’ of the explicit functions increases rapidly with the number of system states and constraints, so that they may become intractable to compute for large, complex systems. Furthermore, it may often not be feasible to reduce the input state space into lower dimensions. The disclosed methodology provides a solution that can be scalable to high-dimensional system states by implementing a technique for identifying patches (such as hyperplanes) in the control data manifold and grouping the control data points according to the patches. Control rules are then extracted by training a regression model for each patch, to create an explicit rule-based control algorithm. When deployed on controller hardware for online control, the explicit rule-based control algorithm may use a simple indicator function to identify an active patch as a function of a measured input state signal and evaluate a corresponding regressor to determine a control action.
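For illustration only (not part of the disclosed embodiments), the offline extraction and online evaluation outlined above may be sketched as follows, using a hand-constructed saturated linear control law whose control points lie on three hyperplane patches; the gains, thresholds, classifier rule and data set are illustrative assumptions:

```python
import numpy as np

# Illustrative control points from a saturated linear law
# u = clip(k.x, -1, 1), which lies on three hyperplane patches
# (lower saturation, linear region, upper saturation).
rng = np.random.default_rng(0)
X = rng.uniform(-3.0, 3.0, size=(500, 2))
k = np.array([0.8, -0.5])
U = np.clip(X @ k, -1.0, 1.0)

# Patch labels here follow the known structure of the law, standing
# in for the unsupervised patch detector described later.
labels = np.where(X @ k <= -1.0, 0, np.where(X @ k >= 1.0, 2, 1))

# Offline: train one linear regression model per patch
# (least squares with an intercept term).
models = {}
for p in np.unique(labels):
    A = np.hstack([X[labels == p], np.ones((np.sum(labels == p), 1))])
    models[p], *_ = np.linalg.lstsq(A, U[labels == p], rcond=None)

def classify(x):
    # Stand-in for the trained classifier (here the known patch rule).
    s = x @ k
    return 0 if s <= -1.0 else (2 if s >= 1.0 else 1)

def explicit_control(x):
    """Online step: identify the active patch, evaluate its regressor."""
    return np.append(x, 1.0) @ models[classify(x)]
```

In this sketch, a state in the upper saturation region, e.g., x = (2, 0), is classified into the constant patch, and the evaluated control is approximately 1, matching the underlying law.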

[0023] The disclosed methodology may thus provide an efficient mechanism by which hardware controllers can apply control rules and determine control actions for high-dimensional system states, without the stringent computational requirements of MPC, neural networks, or other advanced control algorithms. Through indicator functions that encapsulate patches in control data manifolds and regression models, the technology described herein may provide a computationally elegant methodology to evaluate control actions for complex system states, all the while maintaining requisite accuracy and effectiveness of generated control actions.

[0024] The disclosed methodology does not depend on the type of control algorithm for which rules are to be extracted or the type of controller hardware where the resulting explicit rule-based control algorithm is to be executed. The embodiments described herein suitably address the problem of generating explicit control rules based on model predictive control (MPC), which contains sophisticated optimization problems that can place a substantial computational burden particularly on old or under-powered field controllers. It will however be appreciated that the underlying technique is not limited to MPC or any particular type of control algorithm.

[0025] Referring now to the drawings, FIG. 1 illustrates a control rule generation platform 100 according to an example embodiment of the disclosure. The platform 100 comprises computational modules that can be executed in an off-line phase to create an explicit rule-based control algorithm 118. The resulting explicit rule-based control algorithm 118 can be used for online control of a dynamical system (see FIG. 4) in a manner that may mimic an advanced control algorithm 106, such as an MPC algorithm, without having to solve a computationally demanding optimization problem online. To that end, the platform 100 may be implemented in a remote computing environment with respect to the controller of the dynamical system where it is subsequently deployed. In one embodiment, the platform 100 may be implemented in a cloud computing environment, which may allow using almost unlimited computational resources in the off-line phase. The dynamical system to be controlled may comprise, for example, an electric grid, a building system, a production or process plant, an automotive vehicle, a factory robot, or any other physical system where a system state varies with time as a function of a control action produced by a controller.

[0026] The platform 100 includes a simulator 102 to simulate the dynamical system or one or more of its components. The simulator 102 serves the purpose of simulatively generating a large number of state signals X and a control signal U for each state signal X using a simulation model 104 of the dynamical system that interacts with the control algorithm 106. Each state signal X may specify a discrete state of the dynamical system. In the shown embodiment, the state signals are read from the simulation model 104. For each state signal X, a control signal U is generated by the control algorithm 106 and assigned to the respective state signal X. The control signal U may be determined as a function of the state signal X such that, when applied, it optimizes behavior of the dynamical system, as simulated by the simulation model 104. In some embodiments, the simulation model 104 may comprise a high-fidelity physics-based model, which may be part of a digital twin of the dynamical system.
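A minimal sketch of this closed-loop data generation may look as follows, with an illustrative linear plant standing in for the simulation model 104 and a clipped state feedback standing in for the control algorithm 106; all matrices, gains and initial states are assumptions for illustration:

```python
import numpy as np

# Illustrative discrete-time simulation model x' = A x + B u and a
# stand-in "first control algorithm" (clipped state feedback); in the
# platform, an MPC algorithm or neural policy would take its place.
A = np.array([[1.0, 0.1],
              [0.0, 0.9]])
B = np.array([0.0, 0.1])
K = np.array([0.5, 1.2])            # feedback gain, illustrative

def first_control_algorithm(x):
    return float(np.clip(-K @ x, -1.0, 1.0))

def generate_control_points(x0, n_steps=50):
    """Roll out the closed loop, recording (state, control) pairs."""
    points, x = [], np.asarray(x0, dtype=float)
    for _ in range(n_steps):
        u = first_control_algorithm(x)
        points.append((x.copy(), u))
        x = A @ x + B * u           # simulation model update
    return points

# Runs from a variety of initial states broaden state-space coverage.
control_data = []
for x0 in ([2.0, 0.0], [-1.0, 1.5], [0.5, -2.0]):
    control_data.extend(generate_control_points(x0))
```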

[0027] The state signals X may include, for instance, physical, chemical, design-related operating parameters, property data, performance data, environmental data, monitoring data, forecast data, analysis data and / or other data arising in the operation of the dynamical system and / or describing an operating state of the dynamical system. For example, if the dynamical system comprises a vehicle, the state signals X may include positioning data, speed, temperature, pressure, rotational speed, emissions, vibrations, fuel consumption, etc. The control signal U may be determined on the basis of solving an optimization problem by the control algorithm 106, to optimize the behavior of the dynamical system. The optimization problem may include minimizing a cost function, for example, associated with energy/power consumption, wear, distance, time, price, etc.

[0028] The state signals X can be represented as numerical data vectors mapped to a multi-dimensional state space. The control signals U may also be represented as numerical data vectors mapped to a multi-dimensional control action space, or may be represented as scalar values (one-dimensional control action space). In one embodiment, the state signals X and the control signals U represent time series data, where, for each time step, a respective control signal U is generated by the control algorithm 106 based on an updated state signal X for that time step. The state signal X is then updated for the next time step as induced by the action resulting from the control signal U. The time series data pertaining to the state signals X and the control signals U are preferably generated for a variety of initial states and operating scenarios of the dynamical system. The initial states and operating scenarios may be obtained from a database 108.

[0029] For this purpose, the simulation model 104 may simulate a behavior of the dynamical system for a variety of operating scenarios. The latter may include a variety of operating conditions and/or operating states that may occur during the operation of the dynamical system. Such operating scenarios may be extracted from operating data of the dynamical system and / or from the database 108. In one embodiment, a variety of operating scenarios and initial states of the dynamical system may be generated by a scenario generator of the simulator 102 (not shown). The scenario generator may generate, in the operation of the dynamical system, possibly occurring state signals, trajectories, time series, external influences, operating events and / or constraints to be satisfied. To vary the generated operating scenarios, the scenario generation can also be random. The generation of the operating scenarios may be based on basic data or model data for the dynamical system, which may be stored in the database 108 and fed into the scenario generator for the purpose of generation of operating scenarios.

[0030] In the described embodiment, the control algorithm 106 is an MPC algorithm. The MPC algorithm 106 typically generates, for each state signal X, a plurality of variants of a control signal. The behavior of the dynamical system is simulated for each of the variants of the control signal over a defined number of time steps, referred to as the prediction horizon. Based on the simulated behavior, the variant that leads to an optimized behavior of the dynamical system, possibly with specified constraint(s), can be selected and assigned as the control signal U for the given state signal X. For example, the variant providing an optimized behavior of the dynamical system may be determined as the one that results in the lowest value of a cost function subject to the constraint(s), from among the plurality of variants. The assigned control signal U is applied for a single time step, after which the above-described optimization is solved again over a receding prediction horizon, with an updated state signal X which is induced by the assigned control signal U determined at the previous time step.

[0031] As an alternative example, the control algorithm 106 may comprise a neural network based policy. The policy may be trained to map state signals to assigned control signals, for example, based on reinforcement learning (RL) or any other method.
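The receding-horizon selection described above may be sketched, in highly simplified form, by enumerating control variants on a grid and simulating each variant over the prediction horizon; a practical MPC implementation would use a numerical optimizer rather than enumeration, and the plant matrices, cost weights and candidate grid below are illustrative assumptions:

```python
import numpy as np
from itertools import product

# Illustrative plant and a small grid of control variants.
A = np.array([[1.0, 0.1],
              [0.0, 0.9]])
B = np.array([0.0, 0.1])
CANDIDATES = np.linspace(-1.0, 1.0, 5)   # discretized control variants

def rollout_cost(x, u_seq):
    """Quadratic cost of applying a variant sequence to the model."""
    x, cost = np.asarray(x, dtype=float), 0.0
    for u in u_seq:
        x = A @ x + B * u                 # simulate one time step
        cost += x @ x + 0.1 * u * u       # state + control penalty
    return cost

def mpc_step(x, horizon=2):
    """Simulate every variant sequence over the prediction horizon and
    assign only the first move of the cheapest one; the optimization is
    re-solved at the next time step over a receding horizon."""
    best = min(product(CANDIDATES, repeat=horizon),
               key=lambda seq: rollout_cost(x, seq))
    return best[0]
```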

[0032] In various embodiments, as an alternative to the simulative signal generation illustrated in FIG. 1, the state signals X and the control signals U may be obtained (e.g., by actual measurement) by performing a large number of experiments involving a real-world interaction between an on-field controller executing the control algorithm and the dynamical system. This approach may be useful in a scenario involving an existing controller for which there is no access to the underlying controller code (e.g., controller code containing neural network based policies). The disclosed methodology can be used in this case to extract rules from the underlying controller code / control algorithm to make it more interpretable to concerned personnel. The extracted rules may also allow for the controller to be generalized and easily ported from one dynamical system to another (similar) dynamical system. In still other embodiments, simulative signal generation and real-world signal measurements may be combined or augmented with each other.

[0033] The state signals X and the assigned control signals U are read into a storage medium containing control data 110. The control data 110 comprises a large number of control points (X, U), each defined by a state signal X and a control signal U assigned to that state signal. The control points are represented in a control data space having a high dimensionality, depending on the dimensionality of the state space (input space) and the control action space (output space). In practice, the control points lie on a lower-dimensional manifold embedded in a higher-dimensional control data space. Such a manifold is referred to here as the control data manifold. For a scalar output, the control data manifold may have the dimensionality of the input space.

[0034] For the purpose of a simple visual illustration, FIG. 2 shows a representative control data manifold 200 produced by MPC of a second order linear system with a constrained input. As shown, the input space is mapped to two dimensions, represented by state parameters X1 and X2, while the output space is mapped to one dimension, represented by a single control parameter U. The control points lie on a two-dimensional control data manifold 200 embedded in a three-dimensional control data space.

[0035] Referring again to FIG. 1, the platform 100 includes a computational module 112, referred to as patch detector, for detecting patches on the control data manifold. The term “patch” as defined herein, refers to an atomic element defined by a local approximation function that best fits a region of the control data manifold. A patch thus has the same dimensionality as the control data manifold. A local approximation function may include, for example, a linear, quadratic or higher polynomial function. A patch defined by a linear local approximation function is referred to as a hyperplane.

[0036] In the patch detector 112, patches are detected in an unsupervised manner by identifying control points on the control data manifold that belong to a common local approximation function. According to an exemplary algorithm, the patch detector 112 samples control points Q in the control data manifold and determines a set Z of k nearest control points in the neighborhood of each sampled control point Q. Next, the patch detector 112 determines an equation E of a local approximation function describing a region (patch) in the control data manifold comprising the control point Q and the set Z of k neighborhood control points (i.e., an equation fitting Q+Z). If the equation E is the same as or similar to an equation of a stored local approximation function of another patch, then the local approximation function associated with Q+Z is assigned an existing patch label; if not, then the local approximation function associated with Q+Z is assigned a new patch label. In an example embodiment, “similarity” of the equation E with any of the stored local approximation functions may be determined by comparing a fitting error of the equation E to a fitting error of the respective stored local approximation function. If there exists a stored local approximation function for which the fitting error is close to the fitting error of the equation E (e.g., within a defined threshold), then the patch label of that stored local approximation function is assigned to the equation E; else a new patch label is created. After sampling the control points in the control data manifold, a set of n patch labels is detected, and each patch may be represented by a unique local approximation function.
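The above sampling-and-labeling procedure may be sketched as follows (illustrative only; the neighborhood size k, the fitting-error tolerance and the least-squares plane fit are assumptions, with similarity judged by how closely a stored function fits the new neighborhood):

```python
import numpy as np

def fit_plane(Xn, Un):
    """Local linear approximation u ~ [x, 1] @ c via least squares,
    returning the coefficients and the RMS fitting error."""
    A = np.hstack([Xn, np.ones((len(Xn), 1))])
    c, *_ = np.linalg.lstsq(A, Un, rcond=None)
    return c, np.sqrt(np.mean((A @ c - Un) ** 2))

def detect_patches(X, U, k=10, tol=1e-3):
    """Unsupervised patch detection sketch: fit a local plane around
    each sampled control point and reuse an existing patch label
    whenever a stored plane fits the neighborhood comparably well."""
    stored, labels = [], np.full(len(X), -1)
    for i, q in enumerate(X):
        # Set Z: the k nearest control points around sample Q.
        nbrs = np.argsort(np.linalg.norm(X - q, axis=1))[:k]
        c, err = fit_plane(X[nbrs], U[nbrs])
        A = np.hstack([X[nbrs], np.ones((k, 1))])
        for lab, c_stored in enumerate(stored):
            err_stored = np.sqrt(np.mean((A @ c_stored - U[nbrs]) ** 2))
            if abs(err_stored - err) < tol:
                labels[i] = lab       # similar: reuse existing label
                break
        else:
            stored.append(c)          # dissimilar: create new label
            labels[i] = len(stored) - 1
    return labels, stored
```

Applied to control points lying exactly on a single hyperplane, this sketch detects one patch; applied to a saturated (piecewise-linear) law, it detects multiple patches.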

[0037] In the case of a linear system, such as that in the example shown in FIG. 2, the patches detected on the control data manifold comprise hyperplanes. In this case, the equations E used in the above-described patch detector algorithm comprise linear equations, whose coefficients can be determined to fit a hyperplane on a neighborhood of control points associated with each sampled control point.

[0038] FIG. 3 illustrates a visualization of hyperplanes approximated on the control data manifold 200 shown in FIG. 2. In this case, the patch detector algorithm has detected, in an unsupervised process, hyperplanes H1, H2, and H3, by grouping control points that belong to a common local approximation function, for example, as explained above. The hyperplane H1 is indicative of a linear constrained control, while the hyperplanes H2 and H3 are indicative of saturation effects of the control algorithm 106.

[0039] In the shown visualization, since the control data manifold 200 is two-dimensional, the linear patches H1, H2, and H3 also belong to respective two-dimensional planes. To generalize, for any higher order linear system, the control data manifold is of P dimensions (where P > 2) and may be formed by a plurality of P-dimensional hyperplanes embedded in a higher-dimensional (>P) control data space. The example shown in FIGS. 2 and 3 can thus be said to have two-dimensional hyperplanes embedded in a three-dimensional control data space.

[0040] The patch detector algorithm is thereby able to identify regions on the control surface without prior knowledge of the controller behavior. Thus, in addition to approximating the control rules, the above-described patch detector algorithm can be extended to identify different functional elements comprising the controller - such as saturation elements, LQR controller surfaces, PID controller surfaces - without any prior knowledge of the controller.

[0041] For some controllers, the control points may define one or more regions containing sharp edges, which may potentially lead to errors in assigning control points along these edges to the appropriate hyperplane. In this case, it is still reasonable to assume that the regions containing sharp edges are smaller in measure than the patches/hyperplanes. A first solution to the above technical problem involves first classifying the control points into hyperplanes (using a control point classifier 114 as described below) and subsequently mapping mis-identified sharp edges into one of the hyperplanes. A second possible solution may involve identifying abrupt changes in local fitting and refining the neighborhood points. Sharp edges in control data are particularly typical in the case of reinforcement learning (RL)-based controllers. For such controllers, the RL training may be modified to include smaller steps or to move away from sharp edges.

[0042] In some linear controls, the hyperplanes may be distinguished by a different set of constraints being active. In such examples, if there is a-priori knowledge about constraints, the hyperplanes may be assigned classification labels by evaluating a constraint function. However, the disclosed methodology does not necessarily rely on a-priori knowledge of constraints.

[0043] For complex non-linear systems, the control data manifold may comprise one or more non-linear regions. In such a case, the patch detector 112 may be configured according to one of the below-mentioned approaches. In a first approach, the patch detector 112 may fit a plurality of hyperplanes to approximate a non-linear region on the manifold. In one embodiment, the patch detector 112 may work with a specified maximum number of hyperplanes that the control rules are to be approximated with. In a second approach, the patch detector 112 may fit a non-linear region with a single non-linear patch or multiple non-linear patches, each defined by a quadratic or a higher order polynomial local approximation function. Such a technique could be employed to reduce or optimize the number of patches used to approximate the non-linear region of the control data manifold and/or to reduce an error between the control points on the manifold and the local approximation functions. The patch detector 112 need not rely on a-priori knowledge about linearity of the controller and may appropriately determine a local approximation function based on determining a rate of change of each hyperplane and correlating the rate of change within a region to fit either a linear or a polynomial local approximation function.
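The choice between a linear and a higher-order polynomial local approximation function can be illustrated by the following sketch, which escalates the polynomial degree only when the fitting error exceeds a tolerance. It is shown over a single state variable for brevity; the degree cap and tolerance are illustrative assumptions, not taken from the disclosure.

```python
import numpy as np

def fit_local_patch(x, u, max_degree=3, tol=1e-6):
    """Fit the lowest-degree polynomial local approximation function that
    describes a region of the control surface within tolerance, escalating
    from linear to quadratic or higher order only when needed."""
    for degree in range(1, max_degree + 1):
        coeffs = np.polyfit(x, u, degree)
        err = np.mean((np.polyval(coeffs, x) - u) ** 2)
        if err < tol:
            break
    return degree, coeffs
```

A linear region is thus kept as a hyperplane (degree 1), while a curved region is promoted to a quadratic or higher-order patch.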

[0044] Still referring to FIG. 1, the platform 100 includes a computational module 114 for training a control point classifier for classifying control points into different patches from among the detected patches. The control point classifier can be trained in a supervised process using patch labels created by the patch detector 112. In one embodiment, the control point classifier may comprise support vector machines (SVMs) to classify the control points into different patches. SVMs are particularly suitable in the present application, being simple and easy to understand and interpret. Another suitable embodiment of a classifier providing easy interpretability may include decision trees. In various embodiments, other mathematical and/or machine-learned models may be used for classifying the control points into patches.

[0045] For multiclass classification using SVM, a binary classification principle can be utilized after breaking down the multiclass classification problem into multiple binary classification problems. In the disclosed embodiment, a number of SVMs are trained that equals the number (n) of patches detected in the control data manifold. Here, each SVM is trained to perform a binary classification between a given patch and the other patches (one-versus-all). After training, each patch is assigned an SVM indicator function, which may be evaluated to determine whether a control point belongs to that patch or not. For a hyperplane classification problem, the SVM indicator functions assume the form of linear equations (see FIG. 5). A total of n SVM indicator functions F1, F2, ..., Fn are thus generated. The SVM indicator functions can be evaluated to classify a measured state signal into an active patch during online operation.
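The one-versus-all training scheme can be sketched as below, assuming scikit-learn's LinearSVC as the binary SVM implementation (the disclosure does not prescribe a particular SVM library, and the regularization parameter C is an illustrative choice). Each returned indicator function yields 1 when a state belongs to its patch and 0 otherwise.

```python
import numpy as np
from sklearn.svm import LinearSVC  # assumed SVM implementation

def train_indicator_functions(states, patch_labels):
    """Train one binary SVM per detected patch (one-versus-all).  The
    indicator function for patch i returns 1 when a state signal belongs
    to patch i and 0 otherwise."""
    indicators = []
    for label in np.unique(patch_labels):
        svm = LinearSVC(C=10.0).fit(states, (patch_labels == label).astype(int))
        indicators.append(lambda x, m=svm: int(m.predict(np.atleast_2d(x))[0]))
    return indicators
```

For linearly separable patches, each trained indicator corresponds to a linear decision function, matching the linear SVM indicator functions of FIG. 5.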

[0046] Continuing with reference to FIG. 1, the platform 100 includes a computational module 116 for training a respective regression model (regressor) for each detected patch for approximating a relationship between the state signals and the control signals in that patch. A total of n regressors R1, R2, ..., Rn are trained, each regressor corresponding to a respective patch. In embodiments, depending on the patch, the associated regressor may comprise a linear regression model or a polynomial regression model. In the example illustrated herein (see FIG. 3), each hyperplane H1, H2, H3 may be associated with a respective trained linear regression model R1, R2, R3. Each regression model in this example specifies an explicit linear relationship between the input variables (state parameters X1, X2) and the output variable (control parameter U) for the respective hyperplane, that may be easily solved online and readily interpreted. In some embodiments, one or more of the regressors may be trained by assembling discrete rule elements (e.g., mathematical operators such as '+', EXP, ...) and performing symbolic regression to determine an expression that best fits the patch.
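For the linear case, the per-patch regressors can be sketched with ordinary least squares. This is a minimal illustration assuming each patch is adequately fit by an affine rule of the form U ≈ w·X + b; polynomial or symbolic regressors would replace the least-squares fit.

```python
import numpy as np

def train_patch_regressors(states, controls, patch_labels):
    """Fit one linear regressor per patch, approximating the relationship
    between state signals and control signals within that patch."""
    regressors = {}
    for label in np.unique(patch_labels):
        mask = patch_labels == label
        # augment states with a constant column to fit the intercept b
        A = np.c_[states[mask], np.ones(mask.sum())]
        w, *_ = np.linalg.lstsq(A, controls[mask], rcond=None)
        regressors[label] = lambda x, w=w: float(np.r_[x, 1.0] @ w)
    return regressors
```

Each resulting regressor is an explicit affine rule that can be evaluated and inspected directly, in line with the interpretability goal of the explicit rule-based algorithm.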

[0047] The trained control point classifier (in this case, defined by SVM indicator functions F1, F2, ..., Fn) and the regression models R1, R2, ..., Rn are used to create the explicit rule-based control algorithm 118. The explicit rule-based control algorithm 118 may be embodied in a computer program, which can be transferred to a memory of a controller for controlling the dynamical system. In embodiments, the transfer may take place electronically, from a remote location, such as via the Internet, or by way of a physical memory, such as a flash drive. Alternately, the setup of the explicit rule-based control algorithm 118 can be implemented directly on the hardware of the controller for controlling the dynamical system.

[0048] FIG. 4 illustrates a controller 404 for controlling a dynamical system 402 by executing an explicit rule-based control algorithm 118 created according to one of the disclosed embodiments. The controller 404 may comprise a PLC, an industrial PC, an edge device or other computing device including one or more processors 408 and a memory 410 storing a computer program incorporating the explicit rule-based control algorithm 118. The controller 404 may be implemented as part of the dynamical system 402 (such as a controller embedded in a vehicle or robot) or may be implemented in part or wholly external to the dynamical system 402. In the drawing, the controller 404 is shown external to the dynamical system 402 for reasons of clarity.

[0049] The dynamical system 402 includes at least one sensor 406, which continuously measures one or more operating states of the dynamical system 402 and outputs them in the form of a measured state signal Xa. The measured state signals Xa can each be represented as a numerical data vector mapped to a multi-dimensional state space. In embodiments, the measured state signals Xa are coded over time, representing time series data. The measured state signals Xa are transmitted to the controller 404. Based on the measured state signal Xa at each time step, the controller 404 determines a control action Ua to optimize a behavior of the dynamical system 402 by executing the explicit rule-based control algorithm 118. The control action Ua is determined by identifying an active patch as a function of the measured state signal Xa and evaluating the respective regression model for the identified active patch.

[0050] Continuing with the described embodiment of a linear system, FIG. 5 illustrates an exemplary method 500 for determining a control action Ua by the explicit rule-based control algorithm 118 based on a measured state signal Xa. As stated above, in the described embodiment, n SVMs are generated that are each associated with a respective hyperplane, for which the respective SVM indicator functions F1, F2, ..., Fn are codified in the explicit rule-based control algorithm 118.

[0051] Referring to FIG. 5, at each time step, the method 500 involves evaluating the SVM indicator functions F1, F2, ..., Fn for the individual hyperplanes using the measured state signal Xa, as displayed in blocks 502a, 502b, ..., 502n. An active hyperplane is detected when a respective SVM indicator function Fi yields a value 1. All other SVM indicator functions, which yield a value 0, indicate inactive hyperplanes. Having determined the active hyperplane, next, at block 504, the respective regressor Ri for the active hyperplane is evaluated using the same measured state signal Xa, to yield the control action Ua.
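The online dispatch of blocks 502a through 502n and block 504 can be sketched as a single evaluation step. Here the indicator functions and regressors are assumed to be plain callables, one of each per patch; this is an illustrative sketch of the control flow, not the disclosed implementation.

```python
def explicit_control(x, indicators, regressors):
    """One online time step of the explicit rule-based control algorithm:
    evaluate the indicator functions on the measured state Xa until the
    active patch is found, then evaluate that patch's regressor on the
    same state to obtain the control action Ua."""
    for i, F in enumerate(indicators):
        if F(x) == 1:                      # active patch detected
            return regressors[i](x)
    raise ValueError("no active patch for state %r" % (x,))
```

Because each step is a handful of indicator evaluations followed by one regression evaluation, no optimization problem is solved online, which is the source of the computational advantage noted below.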

[0052] The described approach of evaluating SVM indicator functions to identify an active patch and evaluating a regressor associated with the active patch to determine a control action may likewise be implemented for non-linear systems, by fitting non-linear regions with multiple hyperplanes or by using polynomial local approximation functions and regressors as described above.

[0053] Referring again to FIG. 4, using the explicit rule-based control algorithm 118, the controller 404 converts the state signal Xa into a control action Ua in a manner that mimics the control algorithm 106 described in FIG. 1. The control action Ua induces a change in system state of the dynamical system 402, which is reflected in the measured state signal Xa transmitted to the controller 404 in the next time step, for which a subsequent control action Ua may be determined as described above. A temporal control of the dynamical system 402 by the controller 404 is thereby established to optimize, at each time step, a future behavior of the dynamical system 402. Since the algorithm 118 is simple and explicitly rule-based (using a combination of indicator functions and regressors), it is not computationally intensive, and the controller 404 can evaluate control actions based on high-dimensional system states without having to solve complex optimization problems on the fly.

[0054] An illustrative use case is now described where the disclosed embodiments may be utilized for providing an optimal economic dispatch of electricity from a grid. Although the illustrative use case is simple, the underlying principle can be applied to complex systems with high-dimensional system states and a large number of constraints.

[0055] The dynamical system in the described use case includes a battery that is chargeable by a photovoltaic (PV) panel and dischargeable to provide power to a building, which also receives power from the grid. A controller is tasked with controlling the charging or discharging of the battery to minimize the price of transacting power from the grid. The temporally varying operational states of the dynamical system comprise: (i) building power consumption or load Lt, which is depicted in FIG. 6A (based on defined usage patterns), (ii) power produced by the PV panel St, which is depicted in FIG. 6B (typically matching daylight energy), and (iii) price of electricity pt (per unit of power), which is depicted in FIG. 6C.

[0056] The optimization problem in this use case may be formulated as minimization of a linear cost function of the power transacted from the grid over the prediction horizon:

minimize Σ_{t=1..m} pt (Lt − St + Bt − Bt−1)    (eq. 1)

subject to the constraints: 0 ≤ Bt ≤ 100; |Bt − Bt−1| ≤ d;

where m is the prediction horizon of the controller, Bt is the state of charge of the battery, St is the power produced by the PV panel, and d is the maximum amount of charging or discharging allowed in a single time step.

[0057] An MPC algorithm can be used to solve the above optimization (eq. 1) at every time step and apply the first control step before re-solving with a receding horizon. The control action generated by the MPC for m = 2 is represented in FIG. 7, where, at each time step, the MPC algorithm determines an action which may be either of: (i) charging the battery (i.e., the building uses power from the grid); (ii) discharging the battery (i.e., the building uses its own resources without power from the grid); or (iii) not changing the state of charge (i.e., the building uses power from the grid); with the objective of minimizing the price of electricity transacted from the grid.
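The receding-horizon step can be illustrated with a brute-force sketch that enumerates charge/discharge sequences over the horizon m rather than solving the linear program exactly. The three-valued action set, the billing assumption (only power purchased from the grid incurs cost), and the parameter values are illustrative assumptions, not taken from the disclosure.

```python
from itertools import product

def mpc_step(B, prices, loads, pv, d=10.0, m=2):
    """One receding-horizon MPC step: enumerate all charge/discharge
    sequences over the horizon m and return the first action of the
    cheapest feasible sequence.  A positive delta charges the battery;
    a negative delta discharges it."""
    best_cost, best_action = float("inf"), 0.0
    for deltas in product((-d, 0.0, d), repeat=m):
        b, cost, feasible = B, 0.0, True
        for t, delta in enumerate(deltas):
            b += delta
            if not 0.0 <= b <= 100.0:          # state-of-charge constraint
                feasible = False
                break
            grid = loads[t] - pv[t] + delta    # power drawn from the grid
            cost += prices[t] * max(grid, 0.0) # only purchased power is billed
        if feasible and cost < best_cost:
            best_cost, best_action = cost, deltas[0]
    return best_action
```

With a rising price forecast, the sketch discharges a charged battery immediately, and charges an empty battery during the cheap step so the stored energy can offset the expensive step, mirroring the MPC behavior depicted in FIG. 7.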

[0058] For a prediction horizon m = 2, it is recognized that the control action U at each time step may be solely determined based on the following state parameters: X1 - change in electricity price at that time step; and X2 - battery state of charge. With this, the present use case reduces to an MPC problem for a second order linear system with constrained input, similar to the representative example shown in FIGS. 2 and 3.

[0059] FIG. 8 shows a control data manifold 800 formed by sampling a large number of state signals and control signals for the present use case, for example, by simulatively generating these signals (as described in FIG. 1) and/or by performing on-field experiments. From the control data manifold, the above-described methodology may be used to detect hyperplanes 802, 804, 806, classify control points to hyperplanes and train linear regression models for each detected hyperplane, to create an explicit rule-based algorithm which may be deployed in the controller in the present use-case to mimic the MPC solution.

[0060] The above-described use case is merely illustrative, and several other applications of the disclosed methodology exist. As non-limiting examples, the disclosed methodology may be used in a building controller (e.g., to adapt room heating/cooling to weather and use), in an automotive controller (e.g., to provide energy optimal predictive power control for given speed corridor), in a process controller (e.g., to control highly dynamic reactions), in a factory robot controller (e.g., to provide energy and wear optimized robot paths), among other applications.

[0061] The embodiments of the present disclosure may be implemented with any combination of hardware and software. In addition, the embodiments of the present disclosure may be included in an article of manufacture (e.g., one or more computer program products) having, for example, a non-transitory computer-readable storage medium. The computer readable storage medium has embodied therein, for instance, computer readable program instructions for providing and facilitating the mechanisms of the embodiments of the present disclosure. The article of manufacture can be included as part of a computer system or sold separately.

[0062] The computer readable storage medium can include a tangible device that can retain and store instructions for use by an instruction execution device. The computer readable storage medium may be, for example, but is not limited to, an electronic storage device, a magnetic storage device, an optical storage device, an electromagnetic storage device, a semiconductor storage device, or any suitable combination of the foregoing. Computer readable program instructions described herein can be downloaded to respective computing/processing devices from a computer readable storage medium or to an external computer or external storage device via a network, for example, the Internet, a local area network, a wide area network and/or a wireless network.

[0063] The system and processes of the figures are not exclusive. Other systems, processes and menus may be derived in accordance with the principles of the disclosure to accomplish the same objectives. Although this disclosure has been described with reference to particular embodiments, it is to be understood that the embodiments and variations shown and described herein are for illustration purposes only. Modifications to the current design may be implemented by those skilled in the art, without departing from the scope of the disclosure.