Title:

Learning method and learning device for performing customized path planning by supporting reinforcement learning using human driving data as training data

Document Type and Number:

Japanese Patent JP6931937

Kind Code:

B2

Abstract:

A learning method is provided for acquiring at least one personalized reward function, used for performing a reinforcement learning (RL) algorithm, corresponding to a personalized optimal policy for a subject driver. The method includes steps of: (a) a learning device performing a process of instructing an adjustment reward network to generate first adjustment rewards by referring to information on actual actions and actual circumstance vectors in driving trajectories, a process of instructing a common reward module to generate first common rewards by referring to the actual actions and the actual circumstance vectors, and a process of instructing an estimation network to generate actual prospective values by referring to the actual circumstance vectors; and (b) the learning device instructing a first loss layer to generate an adjustment reward loss and to perform backpropagation to learn parameters of the adjustment reward network.
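
The data flow in steps (a) and (b) can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration, not the patented implementation: the abstract does not state the loss formula, so the sketch assumes a hinge-style loss that penalizes time steps where the discounted personalized return (common reward plus adjustment reward) of the driver's actual trajectory falls below the estimated prospective value, plus a small L1 penalty on the adjustment rewards. The names `Step`, `common_reward`, `estimate_value`, `AdjustmentRewardNetwork`, `loss_and_gradient`, and `train` are hypothetical, the "network" is a single linear layer, and a hand-derived subgradient step stands in for backpropagation.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class Step:
    circumstance: Sequence[float]  # actual circumstance vector at time t
    action: float                  # actual action the driver took at time t


def common_reward(circumstance: Sequence[float], action: float) -> float:
    # Hypothetical common reward module: a fixed rule penalizing large actions.
    return -abs(action)


def estimate_value(circumstance: Sequence[float]) -> float:
    # Hypothetical estimation network: a fixed linear value estimate.
    return 0.5 * sum(circumstance)


class AdjustmentRewardNetwork:
    """Hypothetical one-layer linear stand-in for the adjustment reward network."""

    def __init__(self, dim: int) -> None:
        self.weights = [0.0] * (dim + 1)  # one weight per feature plus the action

    def features(self, circumstance: Sequence[float], action: float) -> List[float]:
        return list(circumstance) + [action]

    def reward(self, circumstance: Sequence[float], action: float) -> float:
        x = self.features(circumstance, action)
        return sum(w * xi for w, xi in zip(self.weights, x))


def loss_and_gradient(net: AdjustmentRewardNetwork, trajectory: List[Step],
                      gamma: float = 0.9, alpha: float = 0.01
                      ) -> Tuple[float, List[float]]:
    """Assumed loss: sum_t max(0, V(s_t) - G_t) + alpha * sum_t |r_adj_t|."""
    feats = [net.features(s.circumstance, s.action) for s in trajectory]
    adj = [net.reward(s.circumstance, s.action) for s in trajectory]
    com = [common_reward(s.circumstance, s.action) for s in trajectory]
    val = [estimate_value(s.circumstance) for s in trajectory]

    n = len(net.weights)
    loss, grad = 0.0, [0.0] * n
    ret, ret_feats = 0.0, [0.0] * n  # discounted return G_t and dG_t/dweights
    for t in reversed(range(len(trajectory))):
        ret = com[t] + adj[t] + gamma * ret
        ret_feats = [x + gamma * g for x, g in zip(feats[t], ret_feats)]
        gap = val[t] - ret
        if gap > 0:  # hinge term is active at this time step
            loss += gap
            grad = [g - rf for g, rf in zip(grad, ret_feats)]
        loss += alpha * abs(adj[t])
        sign = (adj[t] > 0) - (adj[t] < 0)
        grad = [g + alpha * sign * x for g, x in zip(grad, feats[t])]
    return loss, grad


def train(net: AdjustmentRewardNetwork, trajectories: List[List[Step]],
          lr: float = 0.01, epochs: int = 200) -> List[float]:
    """Subgradient descent on the assumed loss; returns the loss per epoch."""
    history = []
    for _ in range(epochs):
        total = 0.0
        for trajectory in trajectories:
            loss, grad = loss_and_gradient(net, trajectory)
            net.weights = [w - lr * g for w, g in zip(net.weights, grad)]
            total += loss
        history.append(total)
    return history
```

Under these assumptions, only the adjustment reward network's weights change during training; the common reward rule and the value estimate are held fixed, which mirrors the abstract's statement that backpropagation learns the parameters of the adjustment reward network.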

Inventors:

Kye-Hyeon Kim
Yongjoong Kim
Hak-Kyoung Kim
Woonhyun Nam
SukHoon Boo
Myungchul Sung
Dongsoo Shin
Donghun Yeo
Wooju Ryu
Myeong-Chun Lee
Hyungsoo Lee
Taewoong Jang
Kyungjoong Jeong
Hongmo Je
Hojin Cho

Application Number:

JP2020011163A

Publication Date:

September 08, 2021

Filing Date:

January 27, 2020

Assignee:

Stradvision, Inc.

International Classes:

G08G1/16; G01C21/34; G06N20/00; G08G1/00

Domestic Patent References:

JP2018135068A
JP2000122992A
JP2019073271A
JP2020506838A
JP2019534517A

Foreign References:

US20200027560
WO2018211140A1

Attorney, Agent or Firm:

Tadashige Ito
Tadahiko Ito
Shinsuke Onuki
