INFORMATION PROCESSING DEVICE AND INFORMATION PROCESSING METHOD

Document Type and Number:

WIPO Patent Application WO/2018/110305

Kind Code:

A1

Abstract:

The present technology relates to an information processing device and an information processing method that make it possible to realize a wide variety of event scenes in a simulator environment that simulates the real world. A reward provision unit provides rewards to a first agent and a second agent, each of which acts in the simulator environment and learns an action determination rule in accordance with the rewards given for its actions. The first agent is provided with a reward in accordance with a prescribed reward definition. The second agent is provided with a reward in accordance with an opposing reward definition, which opposes the prescribed reward definition: the second agent's reward becomes larger when it acts to create a situation in which the first agent's reward becomes smaller, and becomes smaller when it acts such that the first agent's reward becomes larger. The technology can be applied, for example, to reinforcement learning for an agent.
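
The relationship between the two reward definitions in the abstract can be illustrated with a minimal Python sketch. Everything named here is hypothetical and not taken from the application: the `Outcome` fields, the collision penalty of 100, and the choice of plain negation for the opposing definition. Negation is only one simple way to satisfy the stated property (the second agent gains when the first agent loses, and vice versa); the abstract does not say which form the opposing definition takes.

```python
from dataclasses import dataclass


@dataclass
class Outcome:
    """Hypothetical summary of one simulation step for the first agent."""
    collided: bool      # whether the first agent collided with something
    progress_m: float   # distance the first agent advanced, in meters


def prescribed_reward(outcome: Outcome) -> float:
    """Assumed prescribed reward definition for the first agent:
    reward progress, penalize collisions (values are illustrative)."""
    reward = outcome.progress_m
    if outcome.collided:
        reward -= 100.0
    return reward


def opposing_reward(outcome: Outcome) -> float:
    """Assumed opposing reward definition for the second agent.
    Negating the prescribed reward guarantees that the second agent's
    reward rises exactly when the first agent's reward falls."""
    return -prescribed_reward(outcome)


def provide_rewards(outcome: Outcome) -> tuple[float, float]:
    """Sketch of a reward provision step: returns the rewards for
    (first agent, second agent) for the same observed outcome."""
    return prescribed_reward(outcome), opposing_reward(outcome)


if __name__ == "__main__":
    safe = Outcome(collided=False, progress_m=10.0)
    crash = Outcome(collided=True, progress_m=10.0)
    print(provide_rewards(safe))
    print(provide_rewards(crash))
```

Under these assumptions, a second agent trained on `opposing_reward` is pushed toward behavior that makes the first agent's situation harder, which is how the abstract describes varied event scenes arising in the simulator.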

Inventors:

SUZUKI HIROTAKA (JP)
NARIHIRA TAKUYA (JP)
OSATO AKIHITO (JP)
NAKADA KENTO (JP)

Application Number:

PCT/JP2017/043163

Publication Date:

June 21, 2018

Filing Date:

November 30, 2017

Assignee:

SONY CORP (JP)

International Classes:

G06N99/00

Foreign References:

JP2011022902A, 2011-02-03

Other References:

ITO, AKIRA ET AL.: "The acquisition of the strategy to 'read others': A proposal of a standard problem", THE 17TH ANNUAL CONFERENCE OF THE JAPANESE SOCIETY FOR ARTIFICIAL INTELLIGENCE, 2003, pages 1 - 4, XP009515666, DOI: 10.11517/pjsai.JSAI03.0.208.0
EGUCHI, TORU ET AL.: "A Plant Control Technology Using Reinforcement Learning Method with Automatic Reward Adjustment", IEEJ TRANSACTIONS ON ELECTRONICS, INFORMATION AND SYSTEMS, vol. 129, no. 7, 1 July 2009 (2009-07-01), pages 1253 - 1263, XP009515210, DOI: 10.1541/ieejeiss.129.1253
MNIH, VOLODYMYR ET AL.: "Human-level control through deep reinforcement learning", NATURE, vol. 518, no. 7540, 2015, pages 529 - 533
See also references of EP 3557493A4

Attorney, Agent or Firm:

NISHIKAWA Takashi et al. (JP)
