Title:
DIRECT INVERSE REINFORCEMENT LEARNING WITH DENSITY RATIO ESTIMATION
Document Type and Number:
WIPO Patent Application WO/2017/159126
Kind Code:
A1
Abstract:
A method of inverse reinforcement learning for estimating reward and value functions of behaviors of a subject includes: acquiring data representing changes in state variables that define the behaviors of the subject; applying a modified Bellman equation given by Eq. (1) to the acquired data: r(x) + γV(y) − V(x) = ln[π(y | x)/b(y | x)] (1) = ln[π(x, y)/b(x, y)] − ln[π(x)/b(x)] (2), where r(x) and V(x) denote a reward function and a value function, respectively, at state x, γ represents a discount factor, and b(y | x) and π(y | x) denote state transition probabilities before and after learning, respectively; estimating a logarithm of the density ratio π(x)/b(x) in Eq. (2); estimating r(x) and V(x) in Eq. (2) from the result of estimating a logarithm of the density ratio π(x, y)/b(x, y); and outputting the estimated r(x) and V(x).
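
The final estimation step of the abstract can be illustrated with a minimal Python sketch. It assumes the modified Bellman equation takes the form r(x) + γV(y) − V(x) = ln[π(x, y)/b(x, y)] − ln[π(x)/b(x)], that the state space is small and discrete, and that both log density ratios are already available. Here they are computed exactly from made-up distributions; in practice they would be estimated from sampled transitions by a density ratio estimator. The function name `solve_reward_and_value`, the tabular representation, and the ordinary least-squares fit are illustrative assumptions, not details taken from the application.

```python
import numpy as np

def solve_reward_and_value(log_ratio_joint, log_ratio_state, gamma):
    """Least-squares fit of tabular r(x), V(x) (hypothetical helper) to
    r(x) + gamma*V(y) - V(x) = ln[pi(x,y)/b(x,y)] - ln[pi(x)/b(x)]."""
    n = log_ratio_state.shape[0]
    rows, targets = [], []
    for x in range(n):
        for y in range(n):
            row = np.zeros(2 * n)   # unknowns: [r(0..n-1), V(0..n-1)]
            row[x] += 1.0           # + r(x)
            row[n + y] += gamma     # + gamma * V(y)
            row[n + x] -= 1.0       # - V(x)
            rows.append(row)
            targets.append(log_ratio_joint[x, y] - log_ratio_state[x])
    theta, *_ = np.linalg.lstsq(np.array(rows), np.array(targets), rcond=None)
    return theta[:n], theta[n:]

# Toy setup: every number below is made up for illustration.
rng = np.random.default_rng(0)
n, gamma = 4, 0.9
b_cond = rng.dirichlet(np.ones(n), size=n)   # b(y | x), one row per state x
r_true = rng.normal(size=n)                  # ground-truth reward

# Ground-truth value function: fixed point of
# V(x) = r(x) + ln sum_y b(y | x) exp(gamma * V(y)).
V_true = np.zeros(n)
for _ in range(2000):
    V_true = r_true + np.log(b_cond @ np.exp(gamma * V_true))

# Transition probabilities after learning, implied by the equation above.
pi_cond = b_cond * np.exp(r_true[:, None] + gamma * V_true[None, :] - V_true[:, None])

# Arbitrary state distributions for the two data sets.
p_b = rng.dirichlet(np.ones(n))
p_pi = rng.dirichlet(np.ones(n))

# Exact log density ratios, standing in for estimated ones.
log_ratio_state = np.log(p_pi / p_b)
log_ratio_joint = np.log((p_pi[:, None] * pi_cond) / (p_b[:, None] * b_cond))

r_est, V_est = solve_reward_and_value(log_ratio_joint, log_ratio_state, gamma)
shift = V_est - V_true   # constant across states, see note below
```

In this tabular setting the equation determines V(x) only up to an additive constant k, with r(x) shifted by (1 − γ)k, so `V_est` and `r_est` should be compared with the ground truth up to that shift. `np.linalg.lstsq` returns the minimum-norm solution among the equivalent ones.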

Inventors:
UCHIBE EIJI (JP)
DOYA KENJI (JP)
Application Number:
PCT/JP2017/004463
Publication Date:
September 21, 2017
Filing Date:
February 07, 2017
Assignee:
OKINAWA INST SCIENCE & TECH SCHOOL CORP (JP)
International Classes:
G06N20/00
Domestic Patent References:
WO2016021210A1 2016-02-11
Other References:
See also references of EP 3430578A4
Attorney, Agent or Firm:
KATAYAMA, Shuhei (JP)