Title:
Policy improvement method, policy improvement program, and policy improvement device
Document Type and Number:
Japanese Patent JP7188194
Kind Code:
B2
Abstract:
A policy improvement method for improving a reinforcement learning policy by using a state value function is executed by a computer and includes: adding a plurality of perturbations to a plurality of components of a first parameter of the policy; estimating a gradient function of the state value function with respect to the first parameter, based on a result of an input determination performed for a control target in the reinforcement learning, the input determination being performed by using the policy with a second parameter obtained by adding the plurality of perturbations to the plurality of components; and updating the first parameter based on the estimated gradient function.
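The three steps named in the abstract (perturb several parameter components at once, estimate the gradient from the perturbed policy's behavior, update the parameter) can be illustrated with a minimal sketch. This is not the estimator claimed in the patent: it swaps in a generic simultaneous-perturbation finite-difference estimate, and it assumes a cost-type state value that is to be minimized. The system matrices `A` and `B`, the constants, and the function names `state_value`, `estimate_gradient`, and `improve_policy` are all hypothetical, chosen only for illustration.

```python
import random

# Hypothetical 2-state, 1-input linear system with quadratic cost
# (illustrative values only; none of these come from the patent).
A = [[0.5, 0.1], [0.0, 0.5]]
B = [0.0, 1.0]
GAMMA = 0.95      # discount rate
HORIZON = 50      # rollout length used to approximate the state value
X0 = [1.0, 1.0]   # state at which the value is evaluated


def state_value(F, x0=X0):
    """Discounted cost of running the linear policy u = F . x from x0."""
    x = list(x0)
    total = 0.0
    for t in range(HORIZON):
        u = F[0] * x[0] + F[1] * x[1]  # input determination by the policy
        total += (GAMMA ** t) * (x[0] ** 2 + x[1] ** 2 + u ** 2)
        x = [A[0][0] * x[0] + A[0][1] * x[1] + B[0] * u,
             A[1][0] * x[0] + A[1][1] * x[1] + B[1] * u]
    return total


def estimate_gradient(F, rng, eps=0.01):
    """Perturb every component of F at once and take a central difference."""
    delta = [rng.choice((-1.0, 1.0)) for _ in F]
    F_plus = [f + eps * d for f, d in zip(F, delta)]   # perturbed ("second") parameter
    F_minus = [f - eps * d for f, d in zip(F, delta)]
    slope = (state_value(F_plus) - state_value(F_minus)) / (2.0 * eps)
    return [slope / d for d in delta]


def improve_policy(F, steps=300, lr=0.01, seed=0):
    """Repeatedly update the ("first") parameter along the estimated gradient."""
    rng = random.Random(seed)
    for _ in range(steps):
        grad = estimate_gradient(F, rng)
        F = [f - lr * g for f, g in zip(F, grad)]  # descend, since the value is a cost
    return F


F_init = [0.0, 0.0]
F_final = improve_policy(F_init)
initial_cost = state_value(F_init)
final_cost = state_value(F_final)
print(initial_cost, final_cost)
```

In this sketch a single pair of rollouts yields an estimate for all components of the parameter, which is the practical appeal of perturbing several components simultaneously rather than one at a time.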
Inventors:
Tomotake Sasaki
Application Number:
JP2019041997A
Publication Date:
December 13, 2022
Filing Date:
March 07, 2019
Assignee:
Fujitsu Limited
International Classes:
G06N20/00; G05B13/02; G05B13/04
Domestic Patent References:
JP201953593A
Foreign References:
WO2019005206A1
US20200210575
Other References:
SASAKI, Tomotake, et al., Policy gradient reinforcement learning method for discrete-time linear quadratic regulation problem using estimated state value function, 2017 56th Annual Conference of the Society of Instrument and Control Engineers of Japan (SICE) [online], [retrieved: 2022.10.18], pp. 653-657, USA, IEEE, November 13, 2017, Internet:
Attorney, Agent or Firm:
Akinori Sakai