Title:
Learning method and learning device for performing customized path planning by supporting reinforcement learning using human driving data as training data
Document Type and Number:
Japanese Patent JP6931937
Kind Code:
B2
Abstract:
A learning method is provided for acquiring at least one personalized reward function, used for performing a Reinforcement Learning (RL) algorithm, corresponding to a personalized optimal policy for a subject driver. The method includes steps of: (a) a learning device performing a process of instructing an adjustment reward network to generate first adjustment rewards by referring to information on actual actions and actual circumstance vectors in driving trajectories, a process of instructing a common reward module to generate first common rewards by referring to the actual actions and the actual circumstance vectors, and a process of instructing an estimation network to generate actual prospective values by referring to the actual circumstance vectors; and (b) the learning device instructing a first loss layer to generate an adjustment reward loss and to perform backpropagation to learn parameters of the adjustment reward network.
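The two steps in the abstract can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration, not the patented implementation: the linear stand-ins for the adjustment reward network and the estimation network, the rule inside `common_reward`, the hinge-plus-L1 form of `adjustment_reward_loss`, the `gamma` and `alpha` values, and the use of a numerical gradient in place of backpropagation. The abstract itself does not specify any of these.

```python
from typing import Callable, List, Sequence, Tuple

Vector = Sequence[float]
# One trajectory step: (actual circumstance vector, actual action).
# Encoding the action as a single float is a hypothetical simplification.
Step = Tuple[Vector, float]


def dot(w: Vector, x: Vector) -> float:
    return sum(wi * xi for wi, xi in zip(w, x))


def adjustment_reward(w_adj: Vector, state: Vector, action: float) -> float:
    """Hypothetical linear stand-in for the adjustment reward network."""
    return dot(w_adj, list(state) + [action])


def common_reward(state: Vector, action: float) -> float:
    """Hypothetical fixed rule for the common reward module: penalize large actions."""
    return -abs(action)


def prospective_value(w_val: Vector, state: Vector) -> float:
    """Hypothetical linear stand-in for the estimation network."""
    return dot(w_val, state)


def adjustment_reward_loss(w_adj: Vector, w_val: Vector, trajectory: List[Step],
                           gamma: float = 0.9, alpha: float = 0.01) -> float:
    """Assumed loss: hinge on (prospective value - discounted personalized return)
    plus an L1 penalty that keeps the adjustment rewards small."""
    adj = [adjustment_reward(w_adj, s, a) for s, a in trajectory]
    personalized = [common_reward(s, a) + r for (s, a), r in zip(trajectory, adj)]
    loss = 0.0
    for t, (state, _) in enumerate(trajectory):
        ret = sum(gamma ** (k - t) * personalized[k]
                  for k in range(t, len(trajectory)))
        loss += max(0.0, prospective_value(w_val, state) - ret)
    return loss + alpha * sum(abs(r) for r in adj)


def numeric_grad(f: Callable[[List[float]], float], w: Vector,
                 eps: float = 1e-6) -> List[float]:
    """Central-difference gradient, used here instead of backpropagation."""
    grad = []
    for i in range(len(w)):
        up, down = list(w), list(w)
        up[i] += eps
        down[i] -= eps
        grad.append((f(up) - f(down)) / (2 * eps))
    return grad


def train_step(w_adj: Vector, w_val: Vector, trajectory: List[Step],
               lr: float = 0.05) -> List[float]:
    """One gradient-descent update of the adjustment reward parameters only."""
    grad = numeric_grad(lambda w: adjustment_reward_loss(w, w_val, trajectory), w_adj)
    return [wi - lr * gi for wi, gi in zip(w_adj, grad)]


# Toy trajectory with two steps; all numbers are made up for illustration.
trajectory = [([1.0, 0.0], 0.5), ([0.5, 0.5], -0.2)]
w_val = [1.0, 1.0]
w_adj = [0.0, 0.0, 0.0]
loss_before = adjustment_reward_loss(w_adj, w_val, trajectory)
w_adj_new = train_step(w_adj, w_val, trajectory)
loss_after = adjustment_reward_loss(w_adj_new, w_val, trajectory)
```

In this sketch only `w_adj` is updated, mirroring the abstract's statement that the loss is used to learn the parameters of the adjustment reward network; the estimation parameters `w_val` are held fixed.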
Inventors:
Ken Katsura
Kim
Gold crane
Southern cloud
Husband Shou
Akira Satoshi
Shinto Shou
Lu Dongyuan
Willow universe
Lee Mingchun
Lee
Yasuo Zhang
Chung
Various models
Zhao Kotatsu
Application Number:
JP2020011163A
Publication Date:
September 08, 2021
Filing Date:
January 27, 2020
Assignee:
Stradvision, Inc.
International Classes:
G08G1/16; G01C21/34; G06N20/00; G08G1/00
Domestic Patent References:
JP2018135068A
JP2000122992A
JP2019073271A
JP2020506838A
JP2019534517A
Foreign References:
US20200027560
WO2018211140A1
Attorney, Agent or Firm:
Tadashige Ito
Tadahiko Ito
Shinsuke Onuki