


Title:
INVERSE REINFORCEMENT LEARNING BY DENSITY RATIO ESTIMATION
Document Type and Number:
WIPO Patent Application WO/2016/021210
Kind Code:
A1
Abstract:
A method of inverse reinforcement learning for estimating the cost and value functions underlying the behaviors of a subject includes: acquiring data representing changes in the state variables that define the behaviors of the subject; applying the modified Bellman equation given by Eq. (1) to the acquired data,

$$ q(x) + \gamma V(y) - V(x) = -\ln \frac{\pi(y \mid x)}{p(y \mid x)} \qquad (1) $$

where q(x) and V(x) denote the cost function and the value function, respectively, at state x, γ represents a discount factor, and p(y|x) and π(y|x) denote the state transition probabilities before and after learning, respectively; estimating the density ratio π(y|x)/p(y|x) in Eq. (1); estimating q(x) and V(x) in Eq. (1) by the least-squares method using the estimated density ratio π(y|x)/p(y|x); and outputting the estimated q(x) and V(x).
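The two estimation steps described in the abstract can be illustrated on a finite state space. The following Python sketch is a minimal illustration under stated assumptions, not the claimed method: the function names `estimate_log_density_ratio` and `fit_cost_and_value` are hypothetical, the density ratio is estimated by simple transition counting (a plain histogram estimator swapped in for illustration), and because the equation determines V only up to an additive constant c (with q shifting by (1 − γ)c), the sketch pins V(0) = 0 before solving the least-squares problem with NumPy.

```python
import numpy as np


def estimate_log_density_ratio(passive, controlled, n_states):
    """Histogram estimate of ln[pi(y|x) / p(y|x)] on states 0..n_states-1.

    passive: (x, y) transitions observed before learning (samples of p).
    controlled: (x, y) transitions observed after learning (samples of pi).
    Pairs not observed in both data sets are returned as NaN.
    """
    def conditional(pairs):
        counts = np.zeros((n_states, n_states))
        for x, y in pairs:
            counts[x, y] += 1
        # Row-normalize; unvisited states x give 0/0 = NaN rows.
        return counts / counts.sum(axis=1, keepdims=True)

    with np.errstate(divide="ignore", invalid="ignore"):
        p_hat = conditional(passive)
        pi_hat = conditional(controlled)
        log_ratio = np.log(pi_hat) - np.log(p_hat)
        observed = (p_hat > 0) & (pi_hat > 0)
    log_ratio[~observed] = np.nan
    return log_ratio


def fit_cost_and_value(log_ratio, gamma):
    """Least-squares fit of q(x) + gamma*V(y) - V(x) = -ln r(y|x).

    Assumes gamma > 0 (with gamma = 0, q and V enter only as q(x) - V(x)).
    V is only determined up to an additive constant, so V[0] is pinned
    to 0 by dropping its column from the design matrix.
    """
    n = log_ratio.shape[0]
    rows, targets = [], []
    for x, y in zip(*np.nonzero(~np.isnan(log_ratio))):
        a = np.zeros(2 * n)   # unknowns: q[0..n-1], then V[0..n-1]
        a[x] += 1.0           # + q(x)
        a[n + y] += gamma     # + gamma * V(y)
        a[n + x] -= 1.0       # - V(x)
        rows.append(a)
        targets.append(-log_ratio[x, y])
    keep = np.r_[0:n, n + 1:2 * n]  # every column except V[0]
    A = np.array(rows)[:, keep]
    theta, *_ = np.linalg.lstsq(A, np.array(targets), rcond=None)
    q = theta[:n]
    v = np.concatenate(([0.0], theta[n:]))
    return q, v
```

Given exact ratios generated from a known q and V (with γ > 0), `fit_cost_and_value` recovers V − V(0) and q − (1 − γ)V(0). With sampled transitions, only the pairs (x, y) observed both before and after learning contribute equations to the least-squares system.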

Inventors:
UCHIBE EIJI (JP)
DOYA KENJI (JP)
Application Number:
PCT/JP2015/004001
Publication Date:
February 11, 2016
Filing Date:
August 07, 2015
Assignee:
OKINAWA INST OF SCIENCE AND TECHNOLOGY SCHOOL CORP (JP)
International Classes:
G06N20/00
Other References:
DVIJOTHAM, KRISHNAMURTHY ET AL.: "Inverse Optimal Control with Linearly-Solvable MDPs", 2010, XP055392918, Retrieved from the Internet [retrieved on 20151022]
SUGIYAMA, MASASHI ET AL.: "A Density-ratio Framework for Statistical Data Processing", IPSJ TRANSACTIONS ON COMPUTER VISION AND APPLICATIONS, vol. 1, 2009, pages 183 - 208, XP055394349, ISSN: 1882-6695
See also references of EP 3178040A4
Attorney, Agent or Firm:
KATAYAMA, Shuhei (6-1 Kyobashi 1-chome, Chuo-ku, Tokyo 31, JP)