LEARNING DEVICE AND LEARNING METHOD

Document Type and Number:

WIPO Patent Application WO/2018/123606

Kind Code:

A1

Abstract:

The present disclosure relates to a learning device and learning method with which it is possible to easily correct a reinforcement learning model on the basis of a user input. A display control unit causes a display unit to display reinforcement learning model information which relates to a reinforcement learning model. A correction unit corrects the reinforcement learning model on the basis of an input from a user regarding the reinforcement learning model information. The present disclosure may be applied to, for example, a personal computer (PC) which corrects a reinforcement learning model on the basis of an input from a user and which learns, by reinforcement learning, a movement policy of an agent using the corrected reinforcement learning model.

Inventors:

NAKADA KENTO (JP)
NARIHIRA TAKUYA (JP)
SUZUKI HIROTAKA (JP)
OSATO AKIHITO (JP)

Application Number:

PCT/JP2017/044839

Publication Date:

July 05, 2018

Filing Date:

December 14, 2017

Assignee:

SONY CORP (JP)

International Classes:

G06N99/00

Foreign References:

JP2013030278A (2013-02-07)

Other References:

NOGAWA, HIROSHI ET AL.: "F-045 Learning the meaning of instructions using autonomous action learning", INSTITUTE OF ELECTRONICS, INFORMATION AND COMMUNICATION ENGINEERS IEICE, INFORMATION PROCESSING SOCIETY OF JAPAN IPSJ. FIT 2004, vol. 3, no. 2, 20 August 2004 (2004-08-20), pages 319 - 321, XP009515429
TAMARU, JUNKI ET AL.: "Inverse reinforcement learning to estimate a time-dependent reward function from cyclic sequences of states", THE PAPERS OF TECHNICAL MEETING ON INTELLIGENT TRANSPORT SYSTEMS, vol. ST-13, 24 November 2013 (2013-11-24), pages 7 - 12, XP009515433
BRIAN D. ZIEBART, ANDREW MAAS, J. ANDREW BAGNELL, ANIND K. DEY: "Maximum Entropy Inverse Reinforcement Learning", 13 July 2008, ASSOCIATION FOR THE ADVANCEMENT OF ARTIFICIAL INTELLIGENCE (AAAI)
RIAD AKROUR, MARC SCHOENAUER, MICHÈLE SEBAG: "APRIL: Active Preference-learning based Reinforcement Learning", EUROPEAN CONFERENCE, ECML PKDD 2012, BRISTOL, UK, 24 September 2012 (2012-09-24)
See also references of EP 3561740A4

Attorney, Agent or Firm:

NISHIKAWA Takashi et al. (JP)
