Title:
Control of agents over long time scales using temporal value transport
Document Type and Number:
Japanese Patent JP7139524
Kind Code:
B2
Abstract:
Methods, systems, and apparatus, including computer programs encoded on computer storage media, for training a neural network system used to control an agent interacting with an environment to perform a specified task. One of the methods includes causing the agent to perform a task episode in which the agent attempts to perform the specified task; for each of one or more particular time steps in the sequence: generating a modified reward for the particular time step from (i) the actual reward at the time step and (ii) value predictions at one or more time steps that are more than a threshold number of time steps after the particular time step in the sequence; and training, through reinforcement learning, the neural network system using at least the modified rewards for the particular time steps.
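The reward modification described in the abstract can be illustrated with a minimal sketch. It is not the patented implementation. The abstract does not say how the later value predictions are combined with the actual reward, so the per-step weights `read_weights`, the scaling factor `alpha`, and the function name are hypothetical choices made here; only the threshold on the time gap (`min_gap`) and the use of later value predictions come from the abstract.

```python
def modified_rewards(rewards, values, read_weights, min_gap, alpha=0.9):
    """Sketch: add value predictions from distant later steps to earlier rewards.

    rewards[t]         : actual reward at time step t
    values[t]          : value prediction at time step t
    read_weights[s][t] : ASSUMED weight linking later step s back to step t
    min_gap            : only later steps s with s - t > min_gap contribute
    alpha              : ASSUMED scaling factor for the transported value
    """
    n = len(rewards)
    out = list(rewards)  # start from the actual rewards
    for t in range(n):
        # Only steps more than `min_gap` steps after t are considered.
        for s in range(t + min_gap + 1, n):
            out[t] += alpha * read_weights[s][t] * values[s]
    return out


# Toy episode of four time steps; step 3 is linked back to step 0.
rewards = [0.0, 0.0, 0.0, 1.0]
values = [0.1, 0.2, 0.5, 1.0]
read_weights = [[0.0] * 4 for _ in range(4)]
read_weights[3][0] = 1.0

result = modified_rewards(rewards, values, read_weights, min_gap=2, alpha=0.5)
print(result)  # [0.5, 0.0, 0.0, 1.0]
```

The modified rewards would then replace the actual rewards in whatever reinforcement learning update trains the neural network system. In this toy case step 0 receives credit from the value prediction three steps later, while steps within the threshold gap are left unchanged.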
Inventors:
Gregory Duncan Wayne
Timothy Paul Lillicrap
Chia-Chun Hung
Joshua Simon Abramson
Application Number:
JP2021519878A
Publication Date:
September 20, 2022
Filing Date:
October 14, 2019
Assignee:
DeepMind Technologies Limited
International Classes:
G06N3/08; G06N20/00; G06V10/764; G06V10/774
Domestic Patent References:
JP2004068399A1
JP2018083238A
JP2018525759A
Foreign References:
US20150100530
US20170032245
Attorney, Agent or Firm:
Murayama Yasuhiko
Shinya Mihiro
Tatsuhiko Abe