PRECISION HYGIENE USING REINFORCEMENT LEARNING

Title:

PRECISION HYGIENE USING REINFORCEMENT LEARNING

Document Type and Number:

WIPO Patent Application WO/2021/009574

Kind Code:

A1

Abstract:

Methods, systems, and apparatus, including computer programs encoded on computer storage media, for reinforcement learning in the field of hygiene. Specifically, the features described relate to selecting actions in a context to be performed by an agent that interacts with an environment by receiving observations, in response, performing actions from a set of actions, wherein the context comprises a bacterial product to be prescribed, wherein the observations comprise data from the mapping of applied bacteria, and wherein the actions comprise data for the prescription of the bacterial product.

More Like This:

WO/2013/062984	METHOD FOR TREATMENT OF MESOTHELIOMA
JP7381139	Programs, computer equipment and methods
WO/2022/023481	PRE-ANALYTIC MANAGEMENT OF SAMPLE CONTAINER PARAMETERS

Inventors:

GRABMAIER OLIVIA (DE)

Application Number:

PCT/IB2020/052985

Publication Date:

January 21, 2021

Filing Date:

March 29, 2020

Export Citation:

Click for automatic bibliography generation Help

Assignee:

GRABMAIER OLIVIA (DE)

International Classes:

G16H10/40

Domestic Patent References:

WO2017156031A1

2017-09-14

Download PDF:

View/Download PDF PDF Help

Claims:

CLAIMS

1. A system comprising one or more computers and one or more storage devices storing instructions that, when executed by the one or more computers, cause the one or more computers to perform operations for selecting actions to be performed by an agent that interacts with an environment by receiving observations, in response, performing actions from a set of actions, the operations comprising:

receiving, by an actor of one or more actors, an observation, wherein each actor

executes on a respective computing unit, wherein each actor is configured to operate on a surface part distinct from operations of each other actor, wherein each actor receives, periodically, a context, wherein the context comprises a bacterial product to be prescribed on the surface part, wherein the observation comprises data from the mapping of applied bacteria on the surface part;

selecting, by the actor, an action to be performed by the agent, wherein the action comprises data for the prescription of the bacterial product on the surface part; and

receiving, by the actor, a next observation in response to the action being performed.

2. The system of claim 1, the operations further comprising:

learning, by a learner of one or more learners, a strategy for selecting actions, wherein each learner executes on a respective computing unit, wherein each learner is configured to operate on a surface class distinct from operations of each other learner, wherein the surface class comprises one or more surface parts, wherein each learner interacts with one or more of the actors, wherein each of the one or more of the actors is configured to operate on one of the surface parts of the surface class on which the learner is configured to operate.

3. The system of claim 2, the operations further comprising:

selecting, periodically, by a p-actor of one or more p-actors, a context, wherein each p- actor executes on a respective computing unit, wherein each p-actor is configured to operate on a surface class distinct from operations of each other p-actor.

4. The system of claim 3, the operations further comprising:

learning, by a p-learner of one or more p-learners, a strategy for selecting contexts, wherein each p-learner executes on a respective computing unit, wherein each p- learner is configured to operate on a surface group distinct from operations of each other p-learner, wherein the surface group comprises one or more surface classes, wherein each p-learner interacts with one or more of the p-actors, wherein each of the one or more of the p-actors is configured to operate on one of the surface classes of the surface group on which the p-learner is configured to operate.

5. The system of claim 1, wherein one or more of the actors are further configured to perform operations, the operations comprising:

receiving an action preference; and

processing, prior to selecting the action, the action preference.

6. The system of claim 3, wherein one or more of the p-actors are further configured to perform operations, the operations comprising:

receiving a context preference; and

processing, prior to selecting the context, the context preference.

7. The system of claim 5, wherein receiving an action preference comprises:

receiving an action preference signal from one or more physical entities, the physical entities comprising computer-implemented decision support tools, wherein the computer-implemented decision support tools are one or more of a smart device, an loT device, and a computer device configured to execute one or more programs with at least one user interface for the action preference input.

8. The system of claim 6, wherein receiving a context preference comprises:

receiving a context preference signal from one or more physical entities, the physical entities comprising computer-implemented decision support tools, wherein the computer-implemented decision support tools are one or more of a smart device, an loT device, and a computer device configured to execute one or more programs with at least one user interface for the context preference input.

9. The system of claim 1, wherein receiving an observation comprises:

receiving an observation signal from one or more physical entities, the physical entities comprising mapping tools, wherein the mapping tools are one or more of an unmanned aerial vehicle, a smart device, an loT device, and a robotic device configured to be used for the mapping of applied bacteria on the surface part.

10. The system of claim 1, wherein selecting an action comprises:

sending an action signal to one or more physical entities, the physical entities

comprising prescription tools, wherein the prescription tools are one or more of an unmanned aerial vehicle, a head-mounted display, and a robotic device configured to be used for the prescription of the bacterial product on the surface part.

11. A method for selecting actions to be performed by an agent that interacts with an environment by receiving observations, in response, performing actions from a set of actions, the method comprising:

receiving, by an actor of one or more actors, an observation, wherein each actor

executes on a respective computing unit, wherein each actor is configured to operate on a surface part distinct from operations of each other actor, wherein each actor receives, periodically, a context, wherein the context comprises a bacterial product to be prescribed on the surface part, wherein the observation comprises data from the mapping of applied bacteria on the surface part;

selecting, by the actor, an action to be performed by the agent, wherein the action

comprises data for the prescription of the bacterial product on the surface part; and

receiving, by the actor, a next observation in response to the action being performed.

12. The method of claim 11, further comprising:

learning, by a learner of one or more learners, a strategy for selecting actions, wherein each learner executes on a respective computing unit, wherein each learner is configured to operate on a surface class distinct from operations of each other learner, wherein the surface class comprises one or more surface parts, wherein each learner interacts with one or more of the actors, wherein each of the one or more of the actors is configured to operate on one of the surface parts of the surface class on which the learner is configured to operate.

13. The method of claim 12, further comprising:

selecting, periodically, by a p-actor of one or more p-actors, a context, wherein each p- actor executes on a respective computing unit, wherein each p-actor is configured to operate on a surface class distinct from operations of each other p-actor.

14. The method of claim 13, further comprising:

learning, by a p-learner of one or more p-learners, a strategy for selecting contexts, wherein each p-learner executes on a respective computing unit, wherein each p- learner is configured to operate on a surface group distinct from operations of each other p-learner, wherein the surface group comprises one or more surface classes, wherein each p-learner interacts with one or more of the p-actors, wherein each of the one or more of the p-actors is configured to operate on one of the surface classes of the surface group on which the p-learner is configured to operate.

15. One or more computer storage media storing instructions that, when executed by one or more computers, cause the one or more computers to perform operations for selecting actions to be performed by an agent that interacts with an environment by receiving observations, in response, performing actions from a set of actions, the operations

comprising:

receiving, by an actor of one or more actors, an observation, wherein each actor

executes on a respective computing unit, wherein each actor is configured to operate on a surface part distinct from operations of each other actor, wherein each actor receives, periodically, a context, wherein the context comprises a bacterial product to be prescribed on the surface part, wherein the observation comprises data from the mapping of applied bacteria on the surface part;

selecting, by the actor, an action to be performed by the agent, wherein the action comprises data for the prescription of the bacterial product on the surface part; and

receiving, by the actor, a next observation in response to the action being performed.

16. The computer storage media of claim 15, the operations further comprising:

learning, by a learner of one or more learners, a strategy for selecting actions, wherein each learner executes on a respective computing unit, wherein each learner is configured to operate on a surface class distinct from operations of each other learner, wherein the surface class comprises one or more surface parts, wherein each learner interacts with one or more of the actors, wherein each of the one or more of the actors is configured to operate on one of the surface parts of the surface class on which the learner is configured to operate.

17. A method for federated learning a global strategy for selecting actions, wherein the actions comprise data for the prescription of a bacterial product, the method comprising: receiving, by a federated learner of a plurality of federated learners, a global strategy from a federator;

determining, by the federated learner, a local update to the global strategy; and sending, by the federated learner, the local update to the federator.

18. The method of claim 17, further comprising:

receiving, by the federator, one or more local updates from the plurality of federated learners, wherein each federated learner is configured to operate on a local surface class of the global surface class on which the federator is configured to operate; and

determining, by the federator, a global update to the global strategy. 19. The method of claim 18, wherein the global strategy is a neural network that is configured to receive an input comprising an observation, wherein the observation comprises data from the mapping of applied bacteria, and to generate a neural network output from the input in accordance with a set of parameters, wherein federated learning the global strategy comprises updating the values of the set of parameters of the neural network.

20. The method of claim 19, wherein updating the values of the set of parameters of the neural network comprises determining, by the federated learner, the local update to the global strategy based at least in part on a gradient vector, wherein the gradient vector is determined by performing one or more iterations of one or more gradient training techniques on the global strategy with respect to local experience data.

AMENDED CLAIMS

received by the International Bureau on 04 November 2020 (04.11.2020)