


Title:
REINFORCEMENT LEARNING METHOD FOR INCENTIVE POLICY BASED ON HISTORIC DATA TRAJECTORY CONSTRUCTION
Document Type and Number:
WIPO Patent Application WO/2020/248220
Kind Code:
A1
Abstract:
A system and method to optimize the distribution of incentives for a transportation hailing service is disclosed. A database stores state data and action data received from client devices and transportation devices. The state data is associated with the utilization of the transportation hailing service, and the action data is associated with different incentives offered to passengers to engage the transportation hailing service. A Q-value determination engine is trained to determine rewards associated with incentive actions from a set of virtual trajectories of states, incentive actions, and rewards, constructed from a history of the action data and associated state data in the database. Passengers are ordered according to V-values. An incentive policy is created that includes incentive actions selected based on the determined rewards and the ordered passengers. An incentive server communicates the selected incentives to at least some of the client devices according to the determined incentive policy.
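The abstract does not specify the learning algorithm, so the following is only a minimal sketch under stated assumptions: discrete state labels, a small fixed set of incentive actions, and offline tabular Q-learning over (state, action, reward, next state) transitions linked from each passenger's time-ordered history. That linking step is a simplified stand-in for the virtual-trajectory construction, which the abstract does not detail. The record format and the functions build_trajectories, fit_q and incentive_policy are hypothetical. Passengers are ranked by V(s) = max_a Q(s, a), and the top k each receive their highest-Q incentive action.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

# Hypothetical record format (not from the patent):
# (passenger_id, timestep, state, action, reward)
Record = Tuple[str, int, str, str, float]
# (state, action, reward, next_state); next_state is None at trajectory end
Transition = Tuple[str, str, float, Optional[str]]


def build_trajectories(records: List[Record]) -> List[Transition]:
    """Group historic records per passenger, sort them by time, and link
    each step to the following state to form (s, a, r, s') transitions.
    A simplified stand-in for the patent's virtual-trajectory construction."""
    by_passenger: Dict[str, List[Record]] = defaultdict(list)
    for rec in records:
        by_passenger[rec[0]].append(rec)
    transitions: List[Transition] = []
    for recs in by_passenger.values():
        recs.sort(key=lambda r: r[1])
        for i, (_, _, s, a, r) in enumerate(recs):
            s_next = recs[i + 1][2] if i + 1 < len(recs) else None
            transitions.append((s, a, r, s_next))
    return transitions


def fit_q(transitions: List[Transition], actions: List[str],
          gamma: float = 0.9, alpha: float = 0.5, epochs: int = 200):
    """Offline tabular Q-learning: sweep the fixed transition set repeatedly.
    Unseen (state, action) pairs default to a Q-value of 0."""
    q: Dict[Tuple[str, str], float] = defaultdict(float)
    for _ in range(epochs):
        for s, a, r, s_next in transitions:
            future = 0.0 if s_next is None else max(q[(s_next, b)] for b in actions)
            q[(s, a)] += alpha * (r + gamma * future - q[(s, a)])
    return q


def incentive_policy(q, actions: List[str], current_states: Dict[str, str], k: int):
    """Order passengers by V(s) = max_a Q(s, a) (descending) and assign
    the arg-max incentive action to the top k passengers."""
    def v(s: str) -> float:
        return max(q[(s, a)] for a in actions)

    ranked = sorted(current_states, key=lambda pid: v(current_states[pid]), reverse=True)
    policy: Dict[str, str] = {}
    for pid in ranked[:k]:
        s = current_states[pid]
        policy[pid] = max(actions, key=lambda a: q[(s, a)])
    return policy


# Toy usage with made-up data
records = [
    ("p1", 0, "idle", "coupon", 1.0),
    ("p1", 1, "active", "none", 2.0),
    ("p2", 0, "idle", "none", 0.0),
]
actions = ["none", "coupon"]
q = fit_q(build_trajectories(records), actions)
policy = incentive_policy(q, actions, {"p1": "active", "p2": "idle"}, k=1)
print(policy)  # {'p2': 'coupon'}
```

Here Q(idle, coupon) converges to 1.0 + 0.9 * 2.0 = 2.8, so the idle passenger p2 has the higher V-value and is offered the coupon. Capping the policy at k passengers is one simple way to respect an incentive budget; the patented method may use a different ordering criterion or selection rule.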

Inventors:
QU WEIYANG (US)
LI QINGYANG (US)
QIN ZHIWEI (US)
MENG YIPING (US)
YU YANG (US)
YE JIEPING (US)
Application Number:
PCT/CN2019/091247
Publication Date:
December 17, 2020
Filing Date:
June 14, 2019
Assignee:
BEIJING DIDI INFINITY TECHNOLOGY & DEV CO LTD (CN)
QU WEIYANG (US)
International Classes:
G06Q10/04
Domestic Patent References:
WO2017120001A1 (2017-07-13)
Foreign References:
CN106709596A (2017-05-24)
CN107657499A (2018-02-02)
CN106920018A (2017-07-04)
US20150261754A1 (2015-09-17)
Attorney, Agent or Firm:
NTD PATENT AND TRADEMARK AGENCY LTD. (CN)