Policy improvement method, policy improvement program, and policy improvement device

Title:

Policy improvement method, policy improvement program, and policy improvement device

Document Type and Number:

Japanese Patent JP7188194

Kind Code:

B2

Abstract:

A policy improvement method that improves a reinforcement-learning policy by using a state value function is executed by a computer. The method includes: adding a plurality of perturbations to a plurality of components of a first parameter of the policy; estimating a gradient function of the state value function with respect to the first parameter, based on the result of an input determination performed for a control target in the reinforcement learning, where the input determination uses the policy with a second parameter obtained by adding the plurality of perturbations to the plurality of components; and updating the first parameter by using the estimated gradient function.
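The three steps named in the abstract (perturb several parameter components at once, estimate a gradient from the resulting inputs, update the parameter) can be illustrated with a minimal sketch. It is not the claimed procedure, since only the abstract is available here. It assumes a linear feedback policy u = F x on a small linear system with quadratic cost, uses a finite-horizon discounted rollout cost as a stand-in for the state value, and swaps in a simultaneous-perturbation finite-difference estimator for the gradient. All names (`rollout_cost`, `estimate_gradient`, `improve_policy`), the system matrices, and the step sizes are hypothetical.

```python
import numpy as np


def rollout_cost(F, A, B, Q, R, x0, steps=50, gamma=0.9):
    """Discounted quadratic cost of applying u = F x from x0.

    Assumption: this finite-horizon sum stands in for the state value V(x0).
    """
    x = x0.copy()
    total = 0.0
    for t in range(steps):
        u = F @ x  # input determination with the given policy parameter
        total += (gamma ** t) * (x @ Q @ x + u @ R @ u)
        x = A @ x + B @ u
    return total


def estimate_gradient(F, cost_fn, rng, eps=1e-2, n_samples=8):
    """Simultaneous-perturbation estimate of d(cost)/dF (an assumed estimator)."""
    base = cost_fn(F)
    grad = np.zeros_like(F)
    for _ in range(n_samples):
        # Perturb every component of the first parameter F at once.
        delta = rng.choice([-1.0, 1.0], size=F.shape)
        perturbed = F + eps * delta  # the "second parameter"
        # Entries of delta are +/-1, so dividing by delta equals multiplying by it.
        grad += (cost_fn(perturbed) - base) / eps * delta
    return grad / n_samples


def improve_policy(F, cost_fn, rng, lr=0.002, iters=100):
    """Repeatedly update the first parameter along the estimated negative gradient."""
    for _ in range(iters):
        F = F - lr * estimate_gradient(F, cost_fn, rng)
    return F


if __name__ == "__main__":
    # Hypothetical two-state, one-input system chosen only for illustration.
    A = np.array([[0.9, 0.2], [0.0, 0.8]])
    B = np.array([[0.0], [1.0]])
    Q, R = np.eye(2), np.eye(1)
    x0 = np.array([1.0, 1.0])

    def cost(F):
        return rollout_cost(F, A, B, Q, R, x0)

    rng = np.random.default_rng(0)
    F0 = np.zeros((1, 2))
    F1 = improve_policy(F0, cost, rng)
    print("cost before:", cost(F0), "cost after:", cost(F1))
```

Perturbing all components together means each gradient sample needs one extra rollout regardless of how many components the parameter has, whereas perturbing one component at a time needs one rollout per component.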

Inventors:

Tomotake Sasaki

Application Number:

JP2019041997A

Publication Date:

December 13, 2022

Filing Date:

March 07, 2019

Assignee:

Fujitsu Limited (富士通株式会社)

International Classes:

G06N20/00; G05B13/02; G05B13/04

Domestic Patent References:

JP201953593A

Foreign References:

WO2019005206A1
US20200210575

Other References:

SASAKI, Tomotake, et al., Policy gradient reinforcement learning method for discrete-time linear quadratic regulation problem using estimated state value function, 2017 56th Annual Conference of the Society of Instrument and Control Engineers of Japan (SICE) [online], [retrieved: October 18, 2022], pp. 653-657, US, IEEE, November 13, 2017, Internet:,

Attorney, Agent or Firm:

Akinori Sakai
