The patent badge is an abbreviated version of the USPTO patent document. It lists the patent number, issue date, filing date, title, applicant, inventor, assignee, attorney firm, primary examiner, assistant examiner, CPC classifications, and abstract. The badge also links to the full patent document (in Adobe Acrobat PDF format).

Date of Patent:

Nov. 30, 2021

Filed:

Sep. 15, 2017

Applicant:

X Development LLC, Mountain View, CA (US);

Inventors:

Mrinal Kalakrishnan, Palo Alto, CA (US);

Ali Hamid Yahya Valdovinos, Palo Alto, CA (US);

Adrian Ling Hin Li, San Francisco, CA (US);

Yevgen Chebotar, Los Angeles, CA (US);

Sergey Vladimir Levine, Berkeley, CA (US);

Assignee:

X Development LLC, Mountain View, CA (US);

Attorney:
Primary Examiner:
Assistant Examiner:
Int. Cl.
CPC ...
G06N 3/08 (2006.01); G06N 20/00 (2019.01); B25J 9/16 (2006.01);
U.S. Cl.
CPC ...
G06N 3/08 (2013.01); G06N 20/00 (2019.01); B25J 9/161 (2013.01); B25J 9/163 (2013.01);
Abstract

Methods, systems, and apparatus, including computer programs encoded on computer storage media, of training a global policy neural network. One of the methods includes initializing an instance of the robotic task for multiple local workers, generating a trajectory of state-action pairs by selecting actions to be performed by the robotic agent while performing the instance of the robotic task, optimizing a local policy controller on the trajectory, generating an optimized trajectory using the optimized local controller, and storing the optimized trajectory in a replay memory associated with the local worker. The method includes sampling, for multiple global workers, an optimized trajectory from one of one or more replay memories associated with the global worker, and training the replica of the global policy neural network maintained by the global worker on the sampled optimized trajectory to determine delta values for the parameters of the global policy neural network.
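
The data flow described in the abstract can be illustrated with a small sketch. This is not the patented implementation: the 1-D task, the linear gain controller, the one-parameter "global policy" (a single weight w), and every function name below are hypothetical stand-ins chosen only to show local workers filling replay memories and global workers turning sampled trajectories into parameter deltas. The workers also run sequentially here, whereas a real system would presumably run them in parallel.

```python
import random

def rollout(gain, start, steps=10):
    """Generate a trajectory of (state, action) pairs on a toy 1-D task
    whose goal is to drive the state to zero (hypothetical task)."""
    state, trajectory = start, []
    for _ in range(steps):
        action = -gain * state
        trajectory.append((state, action))
        state += action
    return trajectory

def cost(trajectory):
    """Sum of squared states: lower means the state reached zero faster."""
    return sum(s * s for s, _ in trajectory)

def optimize_local_controller(gain, start):
    """Toy stand-in for local policy optimization: keep the best nearby gain."""
    candidates = (gain - 0.1, gain, gain + 0.1)
    return min(candidates, key=lambda g: cost(rollout(g, start)))

def local_worker(replay_memory, rng, iterations=5):
    """Initialize a task instance, optimize a local controller on it, and
    store the optimized trajectory in this worker's replay memory."""
    start = rng.uniform(-1.0, 1.0)
    gain = 0.2
    for _ in range(iterations):
        gain = optimize_local_controller(gain, start)
    replay_memory.append(rollout(gain, start))

def global_worker(global_params, memories, rng, lr=0.1):
    """Sample an optimized trajectory from an associated replay memory and
    compute delta values on a replica of the global policy parameters."""
    replica = dict(global_params)
    trajectory = rng.choice(rng.choice(memories))
    # Gradient of the squared error between the policy action w*s and the
    # action stored in the optimized trajectory.
    grad = sum(2.0 * (replica["w"] * s - a) * s for s, a in trajectory)
    return {"w": -lr * grad}

def train(num_local=4, num_global=2, rounds=50, seed=0):
    rng = random.Random(seed)
    memories = [[] for _ in range(num_local)]  # one replay memory per local worker
    for memory in memories:
        local_worker(memory, rng)
    global_params = {"w": 0.0}
    for _ in range(rounds):
        for k in range(num_global):
            # Assumption: global worker k is associated with every
            # num_global-th replay memory, starting at index k.
            delta = global_worker(global_params, memories[k::num_global], rng)
            global_params["w"] += delta["w"]
    return global_params, memories
```

Calling `train()` returns the global parameters and the replay memories. In this toy setup the optimized local controllers all settle on a gain near 0.7, so the global weight w moves from 0 toward roughly -0.7 as deltas accumulate.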
