DIRECT INVERSE REINFORCEMENT LEARNING WITH DENSITY RATIO ESTIMATION

Document Type and Number:

WIPO Patent Application WO/2017/159126

Kind Code:

A1

Abstract:

A method of inverse reinforcement learning for estimating reward and value functions of behaviors of a subject includes: acquiring data representing changes in state variables that define the behaviors of the subject; applying a modified Bellman equation given by Eq. (1) to the acquired data:

$$r(x) + \gamma V(y) - V(x) = \ln \frac{\pi(y \mid x)}{b(y \mid x)} \tag{1}$$

$$= \ln \frac{\pi(x, y)}{b(x, y)} - \ln \frac{\pi(x)}{b(x)} \tag{2}$$

where r(x) and V(x) denote a reward function and a value function, respectively, at state x, γ represents a discount factor, and b(y | x) and π(y | x) denote state transition probabilities before and after learning, respectively; estimating the logarithm of the density ratio π(x)/b(x) in Eq. (2); estimating r(x) and V(x) in Eq. (2) from the result of estimating the logarithm of the density ratio π(x, y)/b(x, y); and outputting the estimated r(x) and V(x).
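The steps in the abstract can be illustrated with a minimal sketch on a small discrete state space. It is not the implementation described in the application. It assumes that Eq. (1) has the form r(x) + γV(y) − V(x) = ln π(y | x)/b(y | x), and it swaps in smoothed transition counts for the density ratio estimator and ordinary least squares for the fitting step. All function names (`sample_pairs`, `log_ratio_from_counts`, `solve_reward_value`) and the toy problem are hypothetical.

```python
import numpy as np

def sample_pairs(P, m, rng):
    """Draw m (x, y) transitions: x uniform over states, y ~ P[x]."""
    n = P.shape[0]
    x = rng.integers(0, n, size=m)
    u = rng.random(m)
    y = (u[:, None] > np.cumsum(P[x], axis=1)).sum(axis=1)
    return np.stack([x, np.minimum(y, n - 1)], axis=1)

def log_ratio_from_counts(pairs_pi, pairs_b, n_states, eps=1.0):
    """Smoothed count estimates of ln pi(x,y)/b(x,y) and ln pi(x)/b(x).

    Assumption: a count-based stand-in for a real density ratio estimator.
    """
    def joint(pairs):
        counts = np.full((n_states, n_states), eps)
        np.add.at(counts, (pairs[:, 0], pairs[:, 1]), 1.0)
        return counts / counts.sum()
    j_pi, j_b = joint(pairs_pi), joint(pairs_b)
    log_joint = np.log(j_pi / j_b)
    log_state = np.log(j_pi.sum(axis=1) / j_b.sum(axis=1))
    return log_joint, log_state

def solve_reward_value(log_joint, log_state, gamma):
    """Least-squares fit of r(x) + gamma*V(y) - V(x) to the Eq. (2) target."""
    n = log_state.shape[0]
    target = log_joint - log_state[:, None]  # estimate of ln pi(y|x)/b(y|x)
    A = np.zeros((n * n, 2 * n))
    t = np.zeros(n * n)
    for x in range(n):
        for y in range(n):
            k = x * n + y
            A[k, x] += 1.0         # coefficient of r(x)
            A[k, n + y] += gamma   # coefficient of V(y)
            A[k, n + x] -= 1.0     # coefficient of V(x)
            t[k] = target[x, y]
    theta, *_ = np.linalg.lstsq(A, t, rcond=None)
    return theta[:n], theta[n:]

# Toy problem (hypothetical): pick b and V, then choose r so that
# pi(y|x) = b(y|x) * exp(r(x) + gamma*V(y) - V(x)) is normalized.
rng = np.random.default_rng(0)
n, gamma = 4, 0.9
b = rng.dirichlet(np.ones(n), size=n)
V_true = rng.normal(size=n)
r_true = V_true - np.log(b @ np.exp(gamma * V_true))
pi = b * np.exp(r_true[:, None] + gamma * V_true[None, :] - V_true[:, None])

pairs_b = sample_pairs(b, 20000, rng)    # data before learning
pairs_pi = sample_pairs(pi, 20000, rng)  # data after learning
log_joint, log_state = log_ratio_from_counts(pairs_pi, pairs_b, n)
r_hat, V_hat = solve_reward_value(log_joint, log_state, gamma)
print("estimated r:", r_hat)
print("estimated V:", V_hat)
```

In this sketch the solution is not unique: adding a constant c to V and (1 − γ)c to r leaves the left-hand side of Eq. (1) unchanged, so `np.linalg.lstsq` returns the minimum-norm member of that family. Estimates should be compared with ground truth only up to this shift.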

Inventors:

UCHIBE EIJI (JP)
DOYA KENJI (JP)

Application Number:

PCT/JP2017/004463

Publication Date:

September 21, 2017

Filing Date:

February 07, 2017

Assignee:

OKINAWA INST SCIENCE & TECH SCHOOL CORP (JP)

International Classes:

G06N20/00

Domestic Patent References:

WO2016021210A1 (2016-02-11)

Other References: