INVERSE REINFORCEMENT LEARNING BY DENSITY RATIO ESTIMATION - OKINAWA INST OF SCIENCE AND TECHNOLOGY SCHOOL CORP

Title:

INVERSE REINFORCEMENT LEARNING BY DENSITY RATIO ESTIMATION

Document Type and Number:

WIPO Patent Application WO/2016/021210

Kind Code:

A1

Abstract:

A method of inverse reinforcement learning for estimating cost and value functions of behaviors of a subject includes: acquiring data representing changes in state variables that define the behaviors of the subject; applying a modified Bellman equation given by Eq. (1) to the acquired data: q(x) + gV(y) - V(x) = -ln{pi(y|x) / p(y|x)} (1), where q(x) and V(x) denote a cost function and a value function, respectively, at state x, g represents a discount factor, and p(y|x) and pi(y|x) denote state transition probabilities before and after learning, respectively; estimating a density ratio pi(y|x) / p(y|x) in Eq. (1); estimating q(x) and V(x) in Eq. (1) using the least squares method in accordance with the estimated density ratio pi(y|x) / p(y|x); and outputting the estimated q(x) and V(x).
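
The least-squares step of the abstract can be illustrated with a minimal sketch. It is not the implementation claimed in the application. It assumes that q(x) and V(x) are linear in feature vectors, and that the log density ratio ln{pi(y|x) / p(y|x)} has already been estimated for each observed transition by some separate method. The function name `fit_cost_and_value`, the feature matrices, and the three-state toy data are all hypothetical and chosen only for illustration.

```python
import numpy as np


def fit_cost_and_value(phi_q_x, phi_v_x, phi_v_y, log_ratio, g):
    """Least-squares fit of q(x) = phi_q(x) @ w_q and V(x) = phi_v(x) @ w_v.

    Row i of each feature matrix belongs to one observed transition
    x_i -> y_i, and log_ratio[i] is an already estimated value of
    ln(pi(y_i|x_i) / p(y_i|x_i)). Linear models are an assumption here.
    """
    # Eq. (1) is linear in the weights:
    #   phi_q(x) @ w_q + (g * phi_v(y) - phi_v(x)) @ w_v = -log_ratio
    design = np.hstack([phi_q_x, g * phi_v_y - phi_v_x])
    target = -np.asarray(log_ratio, dtype=float)
    # lstsq returns the minimum-norm solution if the system is rank deficient.
    weights, *_ = np.linalg.lstsq(design, target, rcond=None)
    n_q = phi_q_x.shape[1]
    return weights[:n_q], weights[n_q:]


# Hypothetical toy problem: 3 discrete states with one-hot features.
n_states, g = 3, 0.9
q_true = np.array([0.5, 1.0, 0.2])
v_true = np.array([2.0, 1.0, 3.0])

# Enumerate every (x, y) pair as one "observed" transition.
xs, ys = np.meshgrid(np.arange(n_states), np.arange(n_states), indexing="ij")
xs, ys = xs.ravel(), ys.ravel()
eye = np.eye(n_states)

# Synthetic log ratios generated directly from Eq. (1), standing in for
# the output of a density ratio estimator.
log_ratio = -(q_true[xs] + g * v_true[ys] - v_true[xs])

w_q, w_v = fit_cost_and_value(eye[xs], eye[xs], eye[ys], log_ratio, g)

# Residual of Eq. (1) under the fitted q and V.
residual = w_q[xs] + g * w_v[ys] - w_v[xs] + log_ratio
```

In this tabular sketch, Eq. (1) alone fixes V(x) only up to an additive constant c, with q(x) shifting by (1 - g)c. The fitted weights therefore reproduce the equation exactly but may differ from `q_true` and `v_true` by such a shift.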

Inventors:

UCHIBE EIJI (JP)
DOYA KENJI (JP)

Application Number:

PCT/JP2015/004001

Publication Date:

February 11, 2016

Filing Date:

August 07, 2015

Assignee:

OKINAWA INST OF SCIENCE AND TECHNOLOGY SCHOOL CORP (JP)

International Classes:

G06N20/00

Other References:

DVIJOTHAM, KRISHNAMURTHY ET AL.: "Inverse Optimal Control with Linearly-Solvable MDPs", 2010, XP055392918, Retrieved from the Internet [retrieved on 20151022]
SUGIYAMA, MASASHI ET AL.: "A Density-ratio Framework for Statistical Data Processing", IPSJ TRANSACTIONS ON COMPUTER VISION AND APPLICATIONS, vol. 1, 2009, pages 183 - 208, XP055394349, ISSN: 1882-6695
See also references of EP 3178040A4

Attorney, Agent or Firm:

KATAYAMA, Shuhei (6-1 Kyobashi 1-chome, Chuo-k, Tokyo 31, JP)
