Title:
INFORMATION PROCESSING DEVICE AND INFORMATION PROCESSING METHOD
Document Type and Number:
WIPO Patent Application WO/2018/110305
Kind Code:
A1
Abstract:
The present technology relates to an information processing device and an information processing method that make it possible to realize diverse variations of scenes of various events in a simulator environment that simulates the real world. A reward provision unit provides rewards to a first agent and a second agent, each of which acts in the simulator environment and learns an action determination rule in accordance with the rewards given for its actions. The first agent is provided with a reward in accordance with a prescribed reward definition. The second agent is provided with a reward in accordance with an opposing reward definition, which is opposed to the prescribed reward definition: the reward the second agent obtains becomes larger when it acts to create a situation in which the reward of the first agent becomes smaller, and becomes smaller when it acts such that the reward of the first agent becomes larger. The technology can be applied, for example, to reinforcement learning for agents.
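
The relationship between the prescribed and opposing reward definitions can be illustrated with a minimal Python sketch. Everything named here is hypothetical and not taken from the application: the `Situation` fields, the collision penalty of 10.0, and in particular the choice of a simple negation as the opposing definition. The abstract only requires that the second agent's reward grow as the first agent's reward shrinks, and the reverse; negation is merely one definition that satisfies this.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class Situation:
    """Hypothetical summary of one simulator step (fields are assumptions)."""
    progress: float  # how far the first agent advanced toward its goal
    collided: bool   # whether the first agent was involved in a collision


def prescribed_reward(s: Situation) -> float:
    """Illustrative prescribed reward definition for the first agent."""
    penalty = 10.0 if s.collided else 0.0  # assumed penalty value
    return s.progress - penalty


def opposing_reward(s: Situation) -> float:
    """Illustrative opposing definition: negation of the prescribed reward.

    Assumption: any function that decreases as the first agent's reward
    increases would also fit the description in the abstract.
    """
    return -prescribed_reward(s)


def provide_rewards(s: Situation) -> Tuple[float, float]:
    """Return (reward for the first agent, reward for the second agent)."""
    return prescribed_reward(s), opposing_reward(s)
```

Under these assumptions, a step in which the first agent collides lowers its own reward and raises the second agent's reward by the same amount, so a second agent trained on this signal is pushed toward creating difficult situations for the first agent.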

Inventors:
SUZUKI HIROTAKA (JP)
NARIHIRA TAKUYA (JP)
OSATO AKIHITO (JP)
NAKADA KENTO (JP)
Application Number:
PCT/JP2017/043163
Publication Date:
June 21, 2018
Filing Date:
November 30, 2017
Assignee:
SONY CORP (JP)
International Classes:
G06N99/00
Foreign References:
JP2011022902A   2011-02-03
Other References:
ITO, AKIRA ET AL.: "The acquisition of the strategy to 'read others': A proposal of a standard problem", THE 17TH ANNUAL CONFERENCE OF THE JAPANESE SOCIETY FOR ARTIFICIAL INTELLIGENCE, 2003, pages 1 - 4, XP009515666, DOI: 10.11517/pjsai.JSAI03.0.208.0
EGUCHI, TORU ET AL.: "A Plant Control Technology Using Reinforcement Learning Method with Automatic Reward Adjustment", IEEJ TRANSACTIONS ON ELECTRONICS, INFORMATION AND SYSTEMS, vol. 129, no. 7, 1 July 2009 (2009-07-01), pages 1253 - 1263, XP009515210, DOI: 10.1541/ieejeiss.129.1253
MNIH, VOLODYMYR ET AL.: "Human-level control through deep reinforcement learning", NATURE, vol. 518, no. 7540, 2015, pages 529 - 533
See also references of EP 3557493A4
Attorney, Agent or Firm:
NISHIKAWA Takashi et al. (JP)