
The patent badge is an abbreviated version of the USPTO patent document. It covers the following: Patent number, Date patent was issued, Date patent was filed, Title of the patent, Applicant, Inventor, Assignee, Attorney firm, Primary examiner, Assistant examiner, CPCs, and Abstract. The patent badge contains a link to the full patent document in PDF (Adobe Acrobat) format.

Patent No.:

US 11493926 B1

Date of Patent:

Nov. 08, 2022

Filed:

May 15, 2019

Offline agent using reinforcement learning to speedup trajectory planning for autonomous vehicles

Applicant:

Baidu USA LLC, Sunnyvale, CA (US);

Inventors:

Runxin He, Sunnyvale, CA (US);

Jinyun Zhou, Sunnyvale, CA (US);

Qi Luo, Sunnyvale, CA (US);

Shiyu Song, Sunnyvale, CA (US);

Jinghao Miao, Sunnyvale, CA (US);

Jiangtao Hu, Sunnyvale, CA (US);

Yu Wang, Sunnyvale, CA (US);

Jiaxuan Xu, Sunnyvale, CA (US);

Shu Jiang, Sunnyvale, CA (US);

Assignee:

BAIDU USA LLC, Sunnyvale, CA (US);

Attorney:

Womble Bond Dickinson (US) LLP

Primary Examiner:

Babar Sarwar

Int. Cl.


G05D 1/02 (2020.01); G06N 3/08 (2006.01); G06N 3/04 (2006.01);

U.S. Cl.

CPC ...

G05D 1/0221 (2013.01); G05D 1/0217 (2013.01); G06N 3/0454 (2013.01); G06N 3/088 (2013.01); G05D 2201/0213 (2013.01);

Abstract

In one embodiment, a system generates a plurality of driving scenarios to train a reinforcement learning (RL) agent and replays each driving scenario to train the RL agent by: applying an RL algorithm to the initial state of a driving scenario to determine a number of control actions for the autonomous driving vehicle (ADV), chosen from a set of discretized control/action options, that advance the ADV through a number of trajectory states based on a set of discretized trajectory-state options; determining, by the RL algorithm, a reward prediction for each of the control actions; determining a judgment score for the trajectory states; and updating the RL agent based on the judgment score.
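The abstract does not specify the trajectory-state representation, the action set, the judgment function, or the update rule, so the following is only a minimal sketch under stated assumptions. It models a one-dimensional speed-planning problem, uses tabular Monte Carlo control as a stand-in for the unspecified RL algorithm (the G06N 3/... codes above suggest the actual agent is a neural network, not a table), and uses a made-up per-state judgment score. All names (`Scenario`, `generate_scenarios`, `judgment_score`, `TabularAgent`, `replay`, `train`) and all numeric values are hypothetical.

```python
import random
from dataclasses import dataclass

# Hypothetical discretization (not taken from the patent): speeds on a 1 m/s grid
# and a small set of acceleration commands, one per control step.
SPEED_OPTIONS = list(range(0, 11))   # discretized trajectory-state options (m/s)
ACTION_OPTIONS = [-2, -1, 0, 1, 2]   # discretized control/action options (m/s per step)
HORIZON = 8                          # control steps per replayed scenario


@dataclass(frozen=True)
class Scenario:
    """Hypothetical driving scenario: an initial speed and a target cruise speed."""
    initial_speed: int
    target_speed: int


def generate_scenarios(n, rng):
    """Stand-in scenario generator: random (initial, target) speed pairs."""
    return [Scenario(rng.choice(SPEED_OPTIONS), rng.choice(SPEED_OPTIONS)) for _ in range(n)]


def step(speed, accel):
    """Apply one control action and clamp the result onto the speed grid."""
    return min(max(speed + accel, SPEED_OPTIONS[0]), SPEED_OPTIONS[-1])


def judgment_score(next_speed, accel, target):
    """Hypothetical per-state judgment: penalize tracking error and control effort."""
    return -(abs(next_speed - target) + 0.2 * abs(accel))


class TabularAgent:
    """Q-table mapping a trajectory state to a predicted return for each action option."""

    def __init__(self, epsilon=0.2, alpha=0.1, seed=0):
        self.q = {}
        self.epsilon = epsilon
        self.alpha = alpha
        self.rng = random.Random(seed)

    def predict(self, state):
        """Reward prediction for every discretized action (zero-initialized, i.e. optimistic)."""
        return self.q.setdefault(state, [0.0] * len(ACTION_OPTIONS))

    def act(self, state, explore=True):
        """Epsilon-greedy choice of an action index."""
        if explore and self.rng.random() < self.epsilon:
            return self.rng.randrange(len(ACTION_OPTIONS))
        values = self.predict(state)
        return max(range(len(values)), key=values.__getitem__)

    def update(self, episode):
        """Monte Carlo update: move each prediction toward its judged return-to-go."""
        ret = 0.0
        for state, a_idx, score in reversed(episode):
            ret += score
            values = self.predict(state)
            values[a_idx] += self.alpha * (ret - values[a_idx])


def replay(agent, scenario, explore=True):
    """Roll out one scenario and record (state, action index, judgment score) per step."""
    speed, episode = scenario.initial_speed, []
    for _ in range(HORIZON):
        state = (speed, scenario.target_speed)
        a_idx = agent.act(state, explore)
        accel = ACTION_OPTIONS[a_idx]
        speed = step(speed, accel)
        episode.append((state, a_idx, judgment_score(speed, accel, scenario.target_speed)))
    return episode


def train(scenarios, epochs=30, seed=0):
    """Replay every scenario once per epoch, updating the agent after each replay."""
    agent = TabularAgent(seed=seed)
    for _ in range(epochs):
        for scenario in scenarios:
            agent.update(replay(agent, scenario))
    return agent
```

For example, `agent = train([Scenario(0, 5)], epochs=2000)` followed by `replay(agent, Scenario(0, 5), explore=False)` rolls out the greedy plan from standstill, whose first action is an acceleration; `generate_scenarios(n, random.Random(0))` would supply a broader training set. Zero-initialized predictions are optimistic here because every judgment score is non-positive, which makes the greedy agent try each untested action at least once.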
