Controlling agents over long time scales using temporal value transport

Title:

Controlling agents over long time scales using temporal value transport

Document Type and Number:

Japanese Patent JP7139524

Kind Code:

B2

Abstract:

Methods, systems, and apparatus, including computer programs encoded on computer storage media, for training a neural network system used to control an agent interacting with an environment to perform a specified task. One of the methods includes causing the agent to perform a task episode, made up of a sequence of time steps, in which the agent attempts to perform the specified task; for each of one or more particular time steps in the sequence: generating a modified reward for the particular time step from (i) the actual reward at the time step and (ii) value predictions at one or more time steps that are more than a threshold number of time steps after the particular time step in the sequence; and training, through reinforcement learning, the neural network system using at least the modified rewards for the particular time steps.
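The reward modification in the abstract can be sketched roughly as follows. This is a minimal illustration, not the claimed implementation: the function names, the `links` mapping (which later time steps are credited back to a particular step; the abstract does not say how they are chosen), and the scaling factor `alpha` are all hypothetical. The `discounted_returns` helper is an ordinary discounted-return computation, shown only to indicate how modified rewards could feed a generic reinforcement-learning target.

```python
from typing import Mapping, Sequence


def modified_rewards(
    rewards: Sequence[float],
    values: Sequence[float],
    links: Mapping[int, Sequence[int]],
    threshold: int,
    alpha: float = 0.9,
) -> list[float]:
    """Hypothetical sketch of the reward modification described in the abstract.

    rewards[t]: actual reward at time step t.
    values[t]:  value prediction at time step t.
    links[t]:   later time steps whose value predictions are credited back to t
                (assumption: the selection rule is not given in the abstract).
    threshold:  only later steps MORE than `threshold` steps after t are used.
    alpha:      assumed scaling factor applied to the transported value.
    """
    if len(rewards) != len(values):
        raise ValueError("rewards and values must have the same length")
    out = [float(r) for r in rewards]
    for t, later_steps in links.items():
        # Sum value predictions only from steps far enough after t.
        transported = sum(values[s] for s in later_steps if s - t > threshold)
        out[t] = rewards[t] + alpha * transported
    return out


def discounted_returns(rewards: Sequence[float], gamma: float = 0.99) -> list[float]:
    """Generic discounted returns G_t = r_t + gamma * G_{t+1}, computed backwards."""
    returns = [0.0] * len(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


if __name__ == "__main__":
    rewards = [0.0, 0.0, 0.0, 0.0, 0.0, 1.0]
    values = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6]
    # Step 0 is (hypothetically) linked to steps 2 and 5; with threshold=2,
    # only step 5 is far enough ahead, so r_0 becomes 0 + 0.5 * 0.6.
    shaped = modified_rewards(rewards, values, {0: [2, 5]}, threshold=2, alpha=0.5)
    print(shaped)
    print(discounted_returns(shaped, gamma=0.99))
```

In this toy episode only time step 0 is modified: step 2 lies exactly `threshold` steps ahead and is therefore excluded, while step 5 contributes its value prediction. The strict `>` comparison mirrors the abstract's wording "more than a threshold number of time steps after".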

Inventors:

Gregory Duncan Wayne
Timothy Paul Lillicrap
Chia-Chun Hung
Joshua Simon Abramson

Application Number:

JP2021519878A

Publication Date:

September 20, 2022

Filing Date:

October 14, 2019

Assignee:

DeepMind Technologies Limited

International Classes:

G06N3/08; G06N20/00; G06V10/764; G06V10/774

Domestic Patent References:

JP2004068399A1
JP2018083238A
JP2018525759A

Foreign References:

US20150100530
US20170032245

Attorney, Agent or Firm:

Murayama Yasuhiko
Shinya Mihiro
Tatsuhiko Abe
