


Title:
SYSTEM FOR BALANCING ENERGY SOURCE EXPLORATION AND OBSERVATION TIME OF AUTONOMOUS SENSING VEHICLES
Document Type and Number:
WIPO Patent Application WO/2022/133605
Kind Code:
A1
Abstract:
A multi-objective method of optimizing the time the ASV spends 'in observation' or 'sensing' and the time the ASV spends 'recharging' or 'energy harvesting' is taught herein. The method comprises: collecting data on observation points of interest, determining whether or not energy harvesting is needed, and effectively visiting the observation points of interest between the search for energy harvesting.

Inventors:
RAHBARNIA FARHAD (CA)
BOROWCZYK ALEXANDRE (CA)
LANGELAAN JACOB WILLEM (US)
Application Number:
PCT/CA2021/051871
Publication Date:
June 30, 2022
Filing Date:
December 22, 2021
Assignee:
TANDEMLAUNCH INC (CA)
International Classes:
G05D1/02; B60W60/00; B63G8/14; B64C19/00; B64C39/02
Foreign References:
US20170069214A12017-03-09
US20150127223A12015-05-07
US20200290742A12020-09-17
Attorney, Agent or Firm:
HUNTER, Christopher N. et al. (CA)
Claims:

1. A system of controlling an autonomous sensing vehicle comprising:

- an on-board system that is local to the autonomous sensing vehicle;

- an off-board system;

- at least one sensor located on the autonomous sensing vehicle;

- an autopilot command system; wherein the local on-board system obtains information from the off-board system, sensors, and autopilot command; and wherein the off-board system comprises a decision-making algorithm to choose to either visit a point of observation or to visit a point of energy harvesting based on the information received.

2. The system of claim 1, wherein the autonomous sensing vehicle is an aerial vehicle and the point of energy harvesting is a thermal.

3. The system of claim 1, wherein the autonomous sensing vehicle is an underwater vehicle and the point of energy harvesting is a wave current.

4. A method of controlling an autonomous sensing vehicle comprising:

- generating a reward map corresponding to at least one point of observation;

- generating a value function map;

- generating a probability map corresponding to at least one point of energy harvesting;

- generating a combination map by combining the value function map to the probability map;

- making a decision to visit the point of observation or to visit the point of energy harvesting based on the generated combination map; and

- sending instructions to an on-board system on the autonomous sensing vehicle to either visit the point of observation or to visit the point of energy harvesting based on the decision.

5. The method of claim 4, wherein the step of generating the value function map comprises:

- breaking a region of interest into a grid;

- adding a starting point; and

- assigning a positive value to the points of observation.

6. The method of claim 5, wherein the method further comprises the step of:

- assigning a negative value to regions of restriction.

Description:
SYSTEM FOR BALANCING ENERGY SOURCE EXPLORATION AND OBSERVATION TIME OF AUTONOMOUS SENSING VEHICLES

TECHNICAL FIELD

[0001] The following relates to autonomous sensing vehicles and more specifically relates to energy harvesting methods of such autonomous sensing vehicles.

BACKGROUND

[0002] Autonomous sensing vehicles (ASVs) such as drones, unmanned aerial vehicles, controlled balloons (e.g., hot air balloons), remotely operated vehicles and remotely operated underwater vehicles are vehicles that are typically unoccupied, usually highly maneuverable, and can be operated remotely by a user proximate to the vehicle or can be operated autonomously. Autonomously operated vehicles do not require a user to operate them. Autonomous vehicles may have the potential to greatly improve both the range and endurance of unmanned vehicles. Autonomous sensing vehicles may be used for a number of purposes including, but not limited to, remote sensing, commercial surveillance, filmmaking, disaster relief, geological exploration, agriculture, rescue operations, and the like. It can be noted that it would be ideal to increase the operation time and endurance of ASVs for these and other uses. Autonomous sensing vehicles may contain a plethora of sensors which can include, but are not limited to, accelerometers, altimeters, barometers, gyroscopes, thermal cameras, cameras, LiDAR (Light Detection and Ranging) sensors, etc. These sensors may be useful for increasing the operation time and endurance of the ASV or may be useful for the uses mentioned above. For instance, a gyroscope can be used for measuring or maintaining the orientation and angular velocity of the ASV and may improve the operational time of the ASV, while a camera may be used to take images during geological exploration.

[0003] One of the key constraints on the performance of the ASV can be energy. Energy can have a direct effect on the ASV’s endurance, range, and its payload capacity. To manage the energy levels better, an ASV may extract energy from its environment; this is referred to as ‘energy harvesting’ herein. The ASV can use any method of energy harvesting, or a combination of energy harvesting methods, to harvest energy to increase the endurance and range of the ASV. In one example, underwater ASVs may harvest energy using wave currents. In another example, land ASVs may harvest energy using solar power. In yet another example, aerial ASVs may harvest energy using thermal updrafts and ridge lifts (referred to as ‘soaring’ herein).

[0004] Soaring takes advantage of thermals to increase the flight time of an aerial ASV and has been studied and experimented with over the past two decades. For example, in 2010, Edwards & Silverberg demonstrated soaring against human-piloted competitors in the Montague Cross Country Challenge for remote-controlled sailplanes. However, there may be challenges related to soaring.

[0005] Some challenges include: sensing (an effective soaring system should be able to sense the surrounding environment and the motion of atmospheric currents); energy harvesting (the aerial ASV should be equipped to make decisions to exploit energy and avoid sinking in air); and energy level considerations (i.e. the aerial ASV should be able to consider its energy state and that of the environment as it navigates).

[0006] AutoSoar (Depenbusch, Nathan T., John J. Bird, and Jack W. Langelaan. "The AutoSOAR autonomous soaring aircraft part 2: Hardware implementation and flight results." Journal of Field Robotics 35.4 (2018): 435-458) addresses some of these issues. AutoSoar teaches a method of autonomous soaring using thermal updrafts and ridge lifts. AutoSoar aims to address all the phases of thermal soaring, such as: thermal detection, thermal latching and unlatching, thermal centering control, mapping, exploration, and flight management. AutoSoar aims to teach a method of increasing the flight time by using thermals and to ease the search for these thermals.

[0007] However, AutoSoar does not optimize the operational time of an ASV using energy harvesting while simultaneously achieving the ‘sensing’ or ‘observational’ goals of the ASV mission. There remains a need for a method which optimizes/balances the time the ASV spends ‘in observation’ or ‘sensing’ and the time the ASV spends ‘recharging’ or ‘energy harvesting’.

SUMMARY

[0008] A multi-objective method of optimizing the time the ASV spends ‘in observation’ or ‘sensing’ and the time the ASV spends ‘recharging’ or ‘energy harvesting’ is taught herein. The method comprises: collecting data on observation points of interest, determining whether or not energy harvesting is needed, and effectively visiting the observation points of interest between the search for energy harvesting.

BRIEF DESCRIPTION OF THE DRAWINGS

[0009] Embodiments will now be described with reference to the appended drawings wherein:

[0010] FIG. 1 is a schematic diagram of the off-board algorithm;

[0011] FIG. 2 is a schematic diagram of the on-board path planning system;

[0012] FIG. 3 is a schematic diagram of the time based algorithm;

[0013] FIG. 4 is a schematic diagram of a reward map;

[0014] FIG. 5 is a schematic diagram of a value function map;

[0015] FIG. 6 is a schematic diagram of a probability map of the previously-discovered thermals in a given region;

[0016] FIG. 7 is a schematic diagram of an embodiment of a combined map;

[0017] FIG. 8 is a schematic diagram of the decision-making AutoSoar system adjusted to include the greedy decision-making algorithm;

[0018] FIG. 9 is a schematic diagram of the on-board system for the Smart Decision Making algorithm; and

[0019] FIG. 10 is a schematic diagram of the on-board system for the decision-making algorithm having a reinforcement learning system.

DETAILED DESCRIPTION

[0020] A multi-objective method of optimizing the time the ASV spends ‘in observation’ or ‘sensing’ and the time the ASV spends ‘recharging’ or ‘energy harvesting’ is taught herein. The method comprises: collecting data on observation points of interest, determining whether or not energy harvesting is needed, and effectively visiting the observation points of interest between the search for energy harvesting.

[0021] The method taught herein can increase the endurance of an ASV while effectively visiting the observation points of interest. The determination of the balance between energy harvesting, exploration for energy harvesting, and visiting the observation points is taught herein. It can be noted that, by using different input signals, the ASV is directed to extend its energy levels and operational time while following the observation targets.

[0022] An optimized system of ASV observation is taught herein. The system comprises off-board computer software and a local on-board smart system. The off-board computer software program takes as input past flight data, weather forecasts, mission objectives, and the ASV’s characteristics. This program then uses this information to generate a potential value function map and potential paths. These maps and paths are planned by a weather-forecast-aware system, but the forecasts are not needed to generate the maps.

[0023] The local on-board smart system takes the information from the off-board computer, signals from sensors, and autopilot commands. It also may (or may not) have access to a localized weather system (third party). This system can choose the next waypoint based on the information presented. It uses a Smart Decision Making System to balance between exploration and exploitation of the environment. This system will update the maps of energy sources. For example, this system may choose the bank angle or speed of an aircraft to make it behave in a more optimized fashion. In an embodiment, the suggested solution allows an aerial ASV to take advantage of thermals while behaving as expected in the observation missions. In another embodiment, the suggested solution allows an underwater ASV to take advantage of wave currents while behaving as expected in the observation missions.

[0024] The local on-board smart system takes the information from the off-board computer, signals from sensors, and autopilot commands. It also may (or may not) have access to a localized and global weather system from a third party. This system can choose the next waypoint based on the information presented. It uses a Smart Decision Making System to balance between exploration and exploitation of the environment. This system will update the maps of energy sources. This system may choose and/or modify the bank angle or speed of an aircraft to make it behave in a more optimized fashion. In an embodiment, the suggested solution allows an aerial ASV to take advantage of thermals while behaving as expected in the observation missions. In another embodiment, the suggested solution allows an underwater ASV to take advantage of wave currents while behaving as expected in the observation missions.

[0025] This method enables the endurance of the ASV to increase while the observation goals are met. This method enables any ASV to effectively carry out its mission and take advantage of the free energy sources available in the environment (i.e. thermal updrafts, tidal energy, solar energy).

Off-Board Path Planning and Map Generation

[0026] FIG 1 shows a schematic diagram of the off-board algorithm. The off-board algorithm can calculate and generate desirable routes for the on-board computing agent. The off-board algorithm can create a pool of potential actions for the on-board computer to decide from. The off-board computer can take some inputs 111 and generate an output 112. Some outputs 112 include, but are not limited to: a value function map 109 or a value list of many paths 110. Some inputs 111 include, but are not limited to: start point 101, end point 101, region(s) of interest 101, no-fly zones, boundaries, past flight data 102, aircraft parameters 103, energy capabilities of the ASV 104, maps (terrain, land cover, underwater, etc.) 105, meteorological forecast 106, the importance factor for observation 107, and type(s) of energy harvesting required 100. The off-board algorithm, via the off-board path planner 108, can take these inputs 111 and generate an output 112. In a preferred embodiment, the output 112 comprises the potential value function map 109 and/or the list of paths having a value associated with them 110.

[0027] Thus, in one embodiment, using dynamic programming and available information such as the location of observation points, past flight information 102, wind and weather forecast 106, and vehicle energy states 104, the off-board algorithm 108 generates a value function grid 109 with a value function associated with each grid cell. The system can use this as an input to the on-board computer system that manages the vehicle behavior during operation and determines when changing behavior is appropriate.
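
By way of illustration only, the following Python sketch shows one way the off-board planner's inputs 111 and output 112 could be organized in code. All names (OffBoardInputs, off_board_plan) and the seeding logic are illustrative assumptions, not part of the disclosure; a real planner would run dynamic programming over the seeded grid as described in [0027].

```python
from dataclasses import dataclass

@dataclass
class OffBoardInputs:
    """Inputs 111 to the off-board path planner 108 (names are illustrative)."""
    start: tuple                 # start point 101, as (row, col)
    regions_of_interest: list    # observation points 101, e.g. [(6, 9), (10, 10)]
    no_fly_zones: list           # restricted cells / boundaries
    past_flight_data: dict       # 102, e.g. previously encountered thermals
    aircraft_parameters: dict    # 103
    energy_capabilities: dict    # 104
    weather_forecast: dict       # 106 (optional; maps can be built without it)
    importance_factor: float     # 107, weighting of observation vs. harvesting
    grid_shape: tuple = (14, 14)

def off_board_plan(inp: OffBoardInputs):
    """Output 112: a value function grid 109 seeded from the inputs.
    Only the reward seeding is shown; dynamic programming would follow."""
    rows, cols = inp.grid_shape
    grid = [[0.0] * cols for _ in range(rows)]
    for r, c in inp.regions_of_interest:
        grid[r][c] = 5.0 * inp.importance_factor   # observation reward
    for r, c in inp.no_fly_zones:
        grid[r][c] = -1.0                          # restricted region
    return grid

value_map = off_board_plan(OffBoardInputs(
    start=(0, 0), regions_of_interest=[(6, 9), (10, 10)], no_fly_zones=[],
    past_flight_data={}, aircraft_parameters={}, energy_capabilities={},
    weather_forecast={}, importance_factor=1.0))
```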

On-Board Path Planning System

[0028] FIG. 2 shows a diagram of the on-board path planning system. Once the off-board global path planner 108 is computed, the on-board controller can use the output of the off-board system for its decision making. The on-board path planner 113 can: use the value function map 109 to generate potential behavior for the ASV considering the observation points (objectives) and the fastest path to them; account for the sensor readings 114 (i.e. wind direction, energy levels, air restrictions, etc.); decide and balance between exploration and exploitation of thermals and observation point exploring; optimize behavior near thermals and observation points (for instance, a thermal sequence or circling sequence around a thermal uplift and observation point); and generate a number of local waypoints, bank angle, suggested speed of the ASV, etc.

[0029] Some inputs 111 of the on-board path planner 113 include, but are not limited to: sensor readings 114, energy capabilities of the ASV 104, energy readings 104b, autopilot commands 115, meteorological forecast 106, maps (terrain, land cover, underwater, etc.) 105, the output 112 from the off-board planner 108, the potential value function map 109, the list of paths having a value associated with them 110, waypoints 116a, type(s) of energy harvesting required 100, and the importance factor for observation 107. The inputs 111 to the on-board path planner 113 may also include end point 101, region(s) of interest 101, no-fly zones, boundaries, past flight data 102, and aircraft parameters 103. In a preferred embodiment, the output of the on-board (local) path planner 113 comprises a map with potential probabilities 117 and/or the list of new waypoints 116.

A variety of methods can be used in the local path planner 113. The methods include: the time-based algorithm; the greedy decision-making algorithm; and the Smart Decision Making System.

Time Based Algorithm

[0030] The time-based system can be used to balance the time spent on exploration for energy sources versus going on the mission. After a defined time in energy-source exploration mode, the system can directly begin observing the nearest mission objective. For example, an ASV using a time-based system could be on an observation mission for a specified amount of time. If, after that time, the ASV is still in observation mode, the system can switch to energy-source exploration for another specified amount of time. The system can switch between exploration mode and observation mode multiple times. FIG. 3 shows the time-based algorithm.

[0031] The ASV system can have access to a list of observation points. At a certain time, after completing exploration mode by climbing 118 and decision making 119, the ASV will look for a first observation point (i.e. the closest observation point 120). The ASV can then decide to go to observe 124 the first observation point 123 and update the observation list 125; or, if the ASV has never explored the area, to explore the area for thermals 126, use them to harvest energy 130, and update the list of thermals 131 as needed. In one embodiment, the balancing decision comes from a timer on board. If the timer times out during observation mode, the system will cause the ASV to switch to exploration mode 126 for a specified time. In one embodiment, the system can repeat this action until the ASV arrives at an observation point. Once the observation point is observed, the system will move that observation point to the end of the list of observation points and set the next observation point as the next goal. This sequence may be repeated. In another embodiment, if the timer times out during exploration mode, the system will cause the ASV to switch to observation mode for a specified time.
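
As a rough sketch only, the timer-driven switching between observation and exploration described above could be expressed as follows. The tick-based timer, the mode names, and the placeholder print statements are assumptions introduced for illustration; the disclosure does not specify an implementation.

```python
OBSERVE, EXPLORE = "observe", "explore"

def run_time_based(observation_points, mode_ticks=5, total_ticks=30):
    """Alternate between observation and energy-source exploration on a timer
    (simulated here as ticks). Observed points rotate to the end of the list."""
    mode, ticks_in_mode = OBSERVE, 0
    points = list(observation_points)
    for _ in range(total_ticks):
        if ticks_in_mode >= mode_ticks:          # timer timed out: switch mode
            mode = EXPLORE if mode == OBSERVE else OBSERVE
            ticks_in_mode = 0
        if mode == OBSERVE and points:
            target = points[0]                   # first / closest observation point
            print("observing", target)
            points.append(points.pop(0))         # move observed point to end of list
        else:
            print("exploring for thermals")      # stand-in for one exploration step
        ticks_in_mode += 1

run_time_based([(6, 9), (10, 10)])
```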

Value Function Map

[0032] Since the location of the first observation point can be known, a grid map, or value function map, can be created that covers a whole region of interest. FIG. 4 shows an example of a potential value function map 109 created by the on-board or off-board path planner 108, 113. A user can input latitude, longitude and/or altitude information for an observation target. The system will then assign values to each target. In one embodiment, the reward map is created by assigning large positive rewards 403 to the points of interest and a negative reward 401 to a “no fly” region (such as a region of bad weather). FIG. 4 shows an example of two points of observation 403 located at [6,9] and [10,10]; a starting point is located at [0,0]. Each point of observation is given a large positive value (5). In one embodiment, the starting point 401 is given a negative value (-1). The remainder of the regions 402 are given a value of 0.
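
For illustration, a minimal sketch of building the reward map of FIG. 4 is shown below, using the values stated above (+5 at the observation points, -1 at the starting point or a restricted cell, 0 elsewhere). The grid size and function name are assumptions.

```python
import numpy as np

def make_reward_map(shape, observation_points, start, no_fly=()):
    """Reward map as in FIG. 4: +5 at observation points, -1 at the starting
    point and restricted cells, 0 elsewhere."""
    reward = np.zeros(shape)
    for cell in observation_points:
        reward[cell] = 5.0
    reward[start] = -1.0
    for cell in no_fly:
        reward[cell] = -1.0
    return reward

reward_map = make_reward_map((14, 14),
                             observation_points=[(6, 9), (10, 10)],
                             start=(0, 0))
```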

[0033] FIG. 5 shows another example of a value function map. The value function map is made using the reward map shown in FIG. 4 and a value function equation defined below. The system can calculate relative values corresponding to each cell/location. The method of generating the value function map can comprise: breaking the region of interest into a grid, adding a starting point, and defining the potential observation points as positive-value cells. The method can further comprise the step of defining a grid state, i.e. defining each cell as a new state for the environment. It is noted that the value 404 of each cell is correlated with how ‘good’ it is to be in that cell. One approach is to define a variable that tells us how good each cell is by assigning a reward to each cell. In one embodiment, this map is created by assigning large positive rewards to the points of interest, a negative reward to a “no fly” region (such as a region of bad weather), and a small positive reward to the past locations of thermal recurrences. A reward is added for going over the observation point. In FIG. 5, the highest value 403 is given to cell [11, 10]. Cells [12,10], [8,7], etc. are assigned a medium level of reward 408, 405. Cells [13, 10] and [9, 6] are assigned a low level of reward 406, 407. Cells [4, 10] and [6, 0] are assigned a negligible level of reward 401, 402.

[0034] The notion of "how good" 404 here is defined in terms of future rewards that can be expected, or in terms of expected return. Accordingly, value functions are defined with respect to policies. A policy is a mapping from each state and action to the probability of taking that action when in that state.

[0035] The method can further comprise defining a set of possible actions. A special action set is defined by the 8 moves possible by the ASV, with all actions having the same probability of being chosen. The ASV can move in 8 directions from any cell to its neighboring cells. It can be noted that edge cases are limited versions of the actions (i.e. the ASV can only move in 3 directions from [0,0]).
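
A small sketch of this 8-move action set, including the edge-case handling described above, is given below for illustration; the helper name and grid size are assumptions.

```python
# The 8 possible moves (equally probable under the uniform policy); cells on
# the grid edge simply have fewer valid neighbours, e.g. 3 moves from [0, 0].
MOVES = [(-1, -1), (-1, 0), (-1, 1),
         ( 0, -1),          ( 0, 1),
         ( 1, -1), ( 1, 0), ( 1, 1)]

def valid_actions(cell, shape):
    r, c = cell
    rows, cols = shape
    return [(dr, dc) for dr, dc in MOVES
            if 0 <= r + dr < rows and 0 <= c + dc < cols]

print(len(valid_actions((0, 0), (14, 14))))   # 3 directions from the corner
print(len(valid_actions((5, 5), (14, 14))))   # 8 directions from an interior cell
```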

[0036] In order to define a value function equation, one can define a state s ∈ S, where s can be a point in a grid of size m × n which represents a geographical location. s can store values of weather, probability of energy, and the presence or absence of an observation point. Let us define the rewards as:

$$ r(s) = \begin{cases} \beta & \text{for states which have observations} \\ \delta & \text{for states that have energy gain potentials} \\ 0 & \text{otherwise} \end{cases} $$

[0037] where β and δ are real positive numbers. We then define an action a_t ∈ A at grid cell t.

[0038] Note that "w.p." symbolizes "with probability of". We can then define a policy π(s, a) that assigns a probability to each action at each state. For simplicity, we assume it is a uniform distribution policy from here on, but it may be anything or even be learnt.

[0039] Let us define G_t, the expected reward at location t:

$$ G_t = \sum_{k=0}^{\infty} \gamma^{k} r_{t+k+1} $$

[0040] where 0 ≤ γ < 1 is the discount factor for future rewards,

[0041] and where r_t, r_{t+1}, ... are generated by following policy π starting at state s.

[0042] The value function of each grid point can then be:

$$ V^{\pi}(s) = \mathbb{E}_{\pi}\left[ G_t \mid s_t = s \right] = \mathbb{E}_{\pi}\left[ \sum_{k=0}^{\infty} \gamma^{k} r_{t+k+1} \;\middle|\; s_t = s \right] $$

[0043] where r_t, r_{t+1}, ... are generated by following policy π starting at state s.

[0044] We use the value function to generate the maps of value function grids.
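
A minimal sketch of computing this value function grid by iterative policy evaluation under the uniform 8-move policy is given below. The discount γ = 0.9, the sweep count, and the grid size are assumptions chosen for illustration; the disclosure does not fix these values.

```python
import numpy as np

GAMMA = 0.9   # discount factor for future rewards, 0 <= gamma < 1 (assumed value)
MOVES = [(-1, -1), (-1, 0), (-1, 1), (0, -1), (0, 1), (1, -1), (1, 0), (1, 1)]

def evaluate_uniform_policy(reward, sweeps=100):
    """Iterative policy evaluation of V(s) = E[G_t | s_t = s] under a uniform
    random policy over the 8 moves, following the equations above."""
    rows, cols = reward.shape
    value = np.zeros_like(reward)
    for _ in range(sweeps):
        new_value = np.zeros_like(value)
        for r in range(rows):
            for c in range(cols):
                neighbours = [(r + dr, c + dc) for dr, dc in MOVES
                              if 0 <= r + dr < rows and 0 <= c + dc < cols]
                # expected one-step return under the uniform policy
                new_value[r, c] = np.mean(
                    [reward[nr, nc] + GAMMA * value[nr, nc] for nr, nc in neighbours])
        value = new_value
    return value

reward = np.zeros((14, 14))
reward[6, 9] = reward[10, 10] = 5.0    # observation points (FIG. 4)
reward[0, 0] = -1.0                    # starting point / restricted cell
value_map = evaluate_uniform_policy(reward)
```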

[0045] The above-noted steps can also be applied for multiple sources of input (i.e. with past flight information or wind). In one embodiment, one of the multiple sources of input may include past flight information.

[0046] FIG. 6 shows a probability map of the previously-discovered thermals in a given region. The closer the cell value is to one, the more likely it is that a thermal was previously encountered at that location. The probability map can show the likelihood of finding an energy source from past flight/weather information. For example, there is a 99% likelihood of finding an energy source at [5,1] 602; a 97% likelihood of finding an energy source at [7,2] 603; and a 38% likelihood of finding an energy source at the black squares 601 labelled 0.38.

[0047] FIG. 7 shows an embodiment of a combined map. The combined map can combine the value function map with the probability map. The combined map can be a dynamic map that changes based on the user input alpha. The greedy decision-making algorithm below explains the user-defined alpha value in more detail. The values of each cell are weighted differently based on a user-defined alpha value. For example, if the user-defined alpha value is closer to observation mode, the observation points will have higher relative weighting. This will allow the system to more likely default to observation mode. In another instance, if the user defines the alpha value as closer to exploration for energy harvesting, the energy sources will have higher relative weighting. This will allow the system to more likely default to exploration mode.
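
One possible way of forming such an alpha-weighted combined map is sketched below. The normalization and linear blend are assumptions made for illustration; as noted in the next paragraph, many other combination schemes are possible.

```python
import numpy as np

def combine_maps(value_map, thermal_probability, alpha):
    """One possible combination of the value function map and the thermal
    probability map: alpha closer to 1 weights observation points more heavily,
    alpha closer to 0 weights exploration for energy sources more heavily."""
    v = value_map / (np.abs(value_map).max() or 1.0)   # normalise value map
    p = thermal_probability                            # already in [0, 1]
    return alpha * v + (1.0 - alpha) * p

value_map = np.zeros((14, 14)); value_map[10, 10] = 5.0   # observation point
prob_map = np.zeros((14, 14))
prob_map[5, 1], prob_map[7, 2] = 0.99, 0.97               # past thermals (FIG. 6)
combined = combine_maps(value_map, prob_map, alpha=0.7)   # biased toward observation
```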

[0048] It is important to note that there are many ways of combining the value function map and the probability map information, such as adding a high reward value to the regions of high probability of thermals and a low value otherwise. In one embodiment, an importance multiplier can be introduced that balances the rewards associated with observation points and thermal updraft points. The importance multiplier value can be tuned based on different missions, where sometimes the exploration for thermals is more important than observing the observation point, and vice versa.

[0049] The observation points may also be moving. The algorithm can follow any observation point that can be fixed or moving. Moving targets may require an online connection to refresh the map.

Greedy Decision Making Algorithm

[0050] The greedy decision making algorithm can balance between the exploration for energy sources and observation mode by a greedy probability.

[0051] Once a value function map is defined, the map for the optimal behavior can be defined as follows. Define steps, or actions, that the ASV takes to travel 1 or n cells. The value function map shows the optimal behavior as the action that can be taken by the ASV from each given cell to the highest-value neighboring cell.

[0052] The greedy decision-making algorithm can then balance between observation mode and exploration mode. The algorithm can utilize various maps to decide the behavior of the ASV, such as exploration and observation modes. In one embodiment, the system will choose to visit the highest-value neighboring cell, as this is the cell that defines a path for optimal behavior. In another embodiment, to achieve more accurate decision making and behavior, a biasing map may be used in combination with the value function map. The biasing map can choose the best-valued cell matching the biasing direction. The algorithm can narrow down the potential cells to choose from by utilizing the biasing map.

[0053] The greedy decision-making algorithm can then start exploration mode to find new energy sources. In one embodiment, the algorithm can include the biasing map in combination with the exploration map and bias the map toward the observation point.

[0054] In another embodiment, the algorithm can switch between exploration mode and observation mode with a greedy or a stochastic function. An alpha (α) value can be defined. The alpha value can represent the balance between exploration and going to the observation point. The alpha value varies between 0 and 1. In one embodiment, the alpha value can be defined such that the closer it is to zero, the more it favors exploration mode; and the closer it is to 1, the more it favors observation mode. As time goes on and more exploration is conducted, the alpha value increases toward one. This enforces that the observation point is met. Once the ASV arrives at the observation point, the alpha value is set close to 0 (such as 0.0001). This allows the ASV to go back to exploration. Over time, the ASV comes back to exploration mode at a rate determined by how quickly the alpha value decays. The increase of the alpha value depends on a hyperparameter. The hyperparameter can be chosen by the user. It can range between 1% and 99%. The preferable range is approximately 5 - 15%. For instance, with a 10% hyperparameter the map is updated each time an observation point is visited. The reward for that observation point is lowered close to 0 so that another observation point is favored over the current observation point. This action may be repeated until all the observation points have been observed. In another instance, once the next observation point is achieved, the value of the reward can be restored for the earlier observation point.
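
A minimal sketch of this alpha-driven switching, including the lowering of the reward for a visited observation point, follows. The specific step size, reward values, and variable names are assumptions for illustration only.

```python
import random

def choose_mode(alpha):
    """With probability alpha go to the observation point, otherwise explore."""
    return "go_to_observe" if random.random() < alpha else "explore"

alpha, step = 0.0, 0.10                   # step: assumed 10% hyperparameter
rewards = {(6, 9): 5.0, (10, 10): 5.0}    # observation-point rewards (FIG. 4 values)
goal = (6, 9)
for _ in range(40):
    if choose_mode(alpha) == "explore":
        alpha = min(1.0, alpha + step)    # more exploration done: favour observation
    else:
        rewards[goal] = 0.01              # visited: lower its reward close to 0
        goal = max(rewards, key=rewards.get)   # favour another observation point
        alpha = 0.001                     # reset so the ASV can go back to exploring
```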

[0055] The greedy decision-making algorithm is beneficial as many observation points may be defined. It can be generalized with different observation points having different importance levels. It can include priority and wind map information to generate the value function easily and make the map smarter. Furthermore, it can be adapted to run on the ASV and provide live updates.

[0056] The following step function may be used:

$$ a_t = \begin{cases} \text{Explore} & \text{w.p. } (1 - \alpha) \\ \text{Go-to-observe} & \text{w.p. } \alpha \end{cases} $$

[0057] FIG. 8 provides a state diagram of the decision-making 819 AutoSoar system adjusted to include the greedy decision-making algorithm; a toy code sketch of this loop follows the numbered steps below. The “go to observe” tab 824 is simply a directed move to observation mode 825. The amount of time spent in exploration mode 826 may be adjusted or chosen by the user. The amount of time spent in observation mode 825 may be adjusted or chosen by the user.

1. With probability α we go to the go-to-observe state 824.

2. Once we finish with the observation points 825, we go from the observation state to the decision state 819 and set α = 0.001.

3. With probability 1 − α we go to the Explore state 826.

4. Once inside the Explore state, we perform one exploration step as defined by the AutoSoar exploration methods. We increase α = α + 0.1 as we go back to the decision state 819.

5. If we are in close proximity to the observation point, we will go to that observation point 825, as tagged in the diagram by 5 and 6.

6. We only move 1 cell and then go back to the decision-making state 819.

7. If we encounter a thermal, we latch onto it 829.

8. Once we finish latching on 829, we go back to the decision-making state 819.
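
For illustration only, the steps above can be walked through as a toy loop; the thermal-encounter probability, step counts, and counters below are assumptions and stand in for the actual AutoSoar exploration and latching behavior.

```python
import random

def decision_loop(steps=30, alpha=0.001, p_thermal=0.2):
    """Toy walk through the adjusted AutoSoar decision states of FIG. 8:
    decision 819 -> explore 826 or go-to-observe 824/825, latching 829 onto
    any thermal encountered. Probabilities and step sizes are illustrative."""
    thermals_latched = observations = 0
    for _ in range(steps):
        if random.random() < alpha:             # with probability alpha: observe
            observations += 1                   # observation state 825
            alpha = 0.001                       # back to decision 819, reset alpha
        else:                                   # with probability 1 - alpha: explore
            alpha = min(1.0, alpha + 0.1)       # one exploration step 826
            if random.random() < p_thermal:
                thermals_latched += 1           # latch onto the thermal 829
    return observations, thermals_latched

print(decision_loop())
```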

Smart Decision Making System

[0058] The Smart Decision Making System is a combination of the earlier methods. In this algorithm, the actions of the system are biased by a set of rules defined before the flight. A bias algorithm can be used to choose the best action that maximizes the chances of finding thermals and maximizes the observation behavior.

[0059] FIG. 9 is a schematic diagram of the on-board system for the Smart Decision Making algorithm 119. The algorithm combines the time algorithm, the greedy algorithm, and sensor readings to explore the environment for more desirable energy sources. The actions are optimized such that they maximize the behavior of the system towards the observation points and past known energy sources. It uses the biases defined in the earlier optimization to maximize the exploitation safely and efficiently.

[0060] The algorithm can evaluate the readings of the sensors 114 and evaluate its value function map. In one embodiment, the algorithm can be configured to trigger a new global path planner sequence if it believes the original maps are not accurate enough.

[0061] The Smart AI decision-making system is an online decision-making system. It can be placed on board or off board. The algorithm uses the input signals to decide on the next waypoints, bank angle, and speed of the ASV. The AI system 132 first checks 133 the readings from the inputs; if there is any uncertainty or the readings are different from its value function, it will recalculate 134 and update its value function of the environment.

[0062] If the readings are in the acceptable range of the value function of the system, the system will generate an observation map (such as a value function map), an uncertainty map, and energy, wind, and glide maps. The AI system 132 then uses the alpha factor that is defined before the flight to combine these maps.

[0063] The observation map can also be modified by a time factor. The time factor is a value between 0 and 1. It modifies the rewards of observation points before updating the value function map. If an observation point has been observed, its reward goes down.
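
As a rough sketch under stated assumptions, the time-factor adjustment and the alpha-weighted combination of the maps named in [0062] might look as follows. The disclosure does not specify the weighting scheme; the equal averaging of the exploration-related maps and the function names are assumptions.

```python
import numpy as np

def apply_time_factor(rewards, visited, time_factor=0.1):
    """Scale down the reward of already-observed points by a time factor in
    (0, 1) before the value function map is updated."""
    adjusted = dict(rewards)
    for point in visited:
        adjusted[point] = rewards[point] * time_factor
    return adjusted

def combine_onboard_maps(observation, uncertainty, energy, wind, glide, alpha=0.7):
    """Illustrative alpha-weighted combination of the maps named in [0062]."""
    exploration = (uncertainty + energy + wind + glide) / 4.0
    return alpha * observation + (1.0 - alpha) * exploration

rewards = apply_time_factor({(6, 9): 5.0, (10, 10): 5.0}, visited=[(6, 9)])
maps = [np.random.rand(14, 14) for _ in range(5)]   # placeholder map contents
combined = combine_onboard_maps(*maps)
```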

[0064] Since we combine the maps, the generated map 135 is biased towards energy sources, observation points, wind directions, and heading of the ASV.

[0065] The Smart AI decision-making system 132 then calculates the trajectory and direction for the next point to travel to, and generates a waypoint 116, bank angle, and speed.

Reinforcement learning (RL) agent Decision Making System

[0066] The RL system 136 is similar to the Smart Decision-Making System. FIG. 10 is a schematic diagram of the on-board system for the decision-making algorithm having a reinforcement learning system 136. In this embodiment, the system uses the information to decide whether to visit the point of observation or to visit the point of energy harvesting.

[0067] The RL system 136 can be trained or engineered to make the decision. One method of training is to let the RL agent be trained in a simulation environment. The evaluation can be done by human feedback or by comparing the results with the results of other systems.

[0068] A reward function can also be defined to train the RL agent, evaluating how much energy was used, whether observation points were visited, and the time spent on them. The RL system can use a deep neural network as well. The RL system decides the next few points based on the input signals and the processed data.
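
By way of illustration, one possible training reward of the kind described above is sketched below; the weights and the exact functional form are assumptions introduced here, not values from the disclosure.

```python
def episode_reward(energy_used, points_visited, total_points, time_on_points,
                   w_energy=1.0, w_visit=10.0, w_time=0.1):
    """Sketch of an RL training reward: penalise energy use, reward
    observation-point visits and the time spent on them (weights illustrative)."""
    return (w_visit * points_visited / max(total_points, 1)
            + w_time * time_on_points
            - w_energy * energy_used)

print(episode_reward(energy_used=3.2, points_visited=2, total_points=2,
                     time_on_points=120.0))
```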

[0069] For simplicity and clarity of illustration, where considered appropriate, reference numerals may be repeated among the figures to indicate corresponding or analogous elements. In addition, numerous specific details are set forth in order to provide a thorough understanding of the examples described herein. However, it will be understood by those of ordinary skill in the art that the examples described herein may be practiced without these specific details. In other instances, well-known methods, procedures and components have not been described in detail so as not to obscure the examples described herein. Also, the description is not to be considered as limiting the scope of the examples described herein.

[0070] It will be appreciated that the examples and corresponding diagrams used herein are for illustrative purposes only. Different configurations and terminology can be used without departing from the principles expressed herein. For instance, components and modules can be added, deleted, modified, or arranged with differing connections without departing from these principles.

[0071] It will also be appreciated that any module or component exemplified herein that executes instructions may include or otherwise have access to computer readable media such as storage media, computer storage media, or data storage devices (removable and/or non-removable) such as, for example, magnetic disks, optical disks, or tape. Computer storage media may include volatile and non-volatile, removable and non-removable media implemented in any method or technology for storage of information, such as computer readable instructions, data structures, program modules, or other data. Examples of computer storage media include RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, digital versatile disks (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium which can be used to store the desired information and which can be accessed by an application, module, or both. Any such computer storage media may be part of the system, any component of or related thereto, etc., or accessible or connectable thereto. Any application or module herein described may be implemented using computer readable/executable instructions that may be stored or otherwise held by such computer readable media.

[0072] The steps or operations in the flow charts and diagrams described herein are just for example. There may be many variations to these steps or operations without departing from the principles discussed above. For instance, the steps may be performed in a differing order, or steps may be added, deleted, or modified.

[0073] Although the above principles have been described with reference to certain specific examples, various modifications thereof will be apparent to those skilled in the art as outlined in the appended claims.