


Title:
Policy improvement method, policy improvement program, and policy improvement device
Document Type and Number:
Japanese Patent JP7188194
Kind Code:
B2
Abstract:
A policy improvement method that improves a policy of reinforcement learning by using a state value function is executed by a computer and includes: adding a plurality of perturbations to a plurality of components of a first parameter of the policy; estimating a gradient function of the state value function with respect to the first parameter, based on a result of an input determination performed for a control target in the reinforcement learning, the input determination being performed by using the policy with a second parameter obtained by adding the plurality of perturbations to the plurality of components; and updating the first parameter based on the estimated gradient function.
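
The three steps named in the abstract (perturb several parameter components at once, estimate the gradient from the behavior of the perturbed policy, update the parameter) can be illustrated with a minimal sketch. Everything concrete here is an assumption for illustration and is not taken from the patent: the control target is a hypothetical discretized double integrator, the policy is a linear state feedback u = F x, the state value is approximated by a finite-horizon rollout cost, and the gradient is estimated with a generic simultaneous-perturbation finite difference rather than the estimator claimed in the patent.

```python
import random

# Hypothetical control target (an assumption, not from the patent):
# a discretized double integrator x[t+1] = A x[t] + B u[t].
A = [[1.0, 0.1], [0.0, 1.0]]
B = [0.0, 0.1]
X0 = [1.0, 0.0]
HORIZON = 50


def rollout_cost(F):
    """Apply the policy u = F x from X0 and return the accumulated quadratic cost.

    The finite-horizon cost stands in for the state value at X0 under the
    policy with parameter F (lower is better).
    """
    x = list(X0)
    total = 0.0
    for _ in range(HORIZON):
        u = F[0] * x[0] + F[1] * x[1]  # input determination by the policy
        total += x[0] ** 2 + x[1] ** 2 + u ** 2
        x = [
            A[0][0] * x[0] + A[0][1] * x[1] + B[0] * u,
            A[1][0] * x[0] + A[1][1] * x[1] + B[1] * u,
        ]
    return total


def estimate_gradient(F, rng, eps=1e-3):
    """Perturb all components of F at once and estimate the gradient of the cost."""
    delta = [rng.choice((-1.0, 1.0)) for _ in F]      # one perturbation per component
    F_plus = [f + eps * d for f, d in zip(F, delta)]  # perturbed ("second") parameter
    F_minus = [f - eps * d for f, d in zip(F, delta)]
    diff = rollout_cost(F_plus) - rollout_cost(F_minus)
    return [diff / (2.0 * eps * d) for d in delta]


def improve_policy(F, iterations=200, step=0.002, seed=0):
    """Repeatedly update the ("first") parameter F along the estimated gradient."""
    rng = random.Random(seed)
    F = list(F)
    for _ in range(iterations):
        grad = estimate_gradient(F, rng)
        F = [f - step * g for f, g in zip(F, grad)]  # descend, since cost is minimized
    return F


F_initial = [-1.0, -1.0]
F_improved = improve_policy(F_initial)
initial_cost = rollout_cost(F_initial)
final_cost = rollout_cost(F_improved)
print("cost before:", initial_cost, "cost after:", final_cost)
```

Perturbing all components in one trial needs only two rollouts per gradient estimate regardless of how many components the parameter has, which is the practical appeal of this family of estimators. The step size, perturbation size, and horizon above are illustrative choices only.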

Inventors:
Tomotake Sasaki
Application Number:
JP2019041997A
Publication Date:
December 13, 2022
Filing Date:
March 07, 2019
Assignee:
Fujitsu Limited
International Classes:
G06N20/00; G05B13/02; G05B13/04
Domestic Patent References:
JP201953593A
Foreign References:
WO2019005206A1
US20200210575
Other References:
SASAKI, Tomotake, et al., Policy gradient reinforcement learning method for discrete-time linear quadratic regulation problem using estimated state value function, 2017 56th Annual Conference of the Society of Instrument and Control Engineers of Japan (SICE) [online], [retrieved: 2022.10.18], pp. 653-657, USA, IEEE, November 13, 2017, Internet:
Attorney, Agent or Firm:
Akinori Sakai