The patent badge is an abbreviated version of the USPTO patent document. It covers the following: patent number, date the patent was issued, date the patent was filed, title of the patent, applicant, inventor, assignee, attorney firm, primary examiner, assistant examiner, CPCs, and abstract. The patent badge also contains a link to the full patent document (in Adobe Acrobat format, i.e., PDF).

Date of Patent:

Mar. 04, 2025

Filed:

Dec. 01, 2023

Applicant:

Google LLC, Mountain View, CA (US);

Inventors:

Sergey Levine, Berkeley, CA (US);

Ethan Holly, San Francisco, CA (US);

Shixiang Gu, Mountain View, CA (US);

Timothy Lillicrap, London, GB;

Assignee:

GOOGLE LLC, Mountain View, CA (US);

Attorney:
Primary Examiner:
Int. Cl.
CPC ...
G06F 17/00 (2019.01); B25J 9/16 (2006.01); G05B 13/02 (2006.01); G05B 19/042 (2006.01); G06N 3/008 (2023.01); G06N 3/045 (2023.01); G06N 3/08 (2023.01);
U.S. Cl.
CPC ...
B25J 9/161 (2013.01); B25J 9/163 (2013.01); B25J 9/1664 (2013.01); G05B 13/027 (2013.01); G05B 19/042 (2013.01); G06N 3/008 (2013.01); G06N 3/045 (2023.01); G06N 3/08 (2013.01); G05B 2219/32335 (2013.01); G05B 2219/33033 (2013.01); G05B 2219/33034 (2013.01); G05B 2219/39001 (2013.01); G05B 2219/39298 (2013.01); G05B 2219/40499 (2013.01);
Abstract

Implementations utilize deep reinforcement learning to train a policy neural network that parameterizes a policy for determining a robotic action based on a current state. Some of those implementations collect experience data from multiple robots that operate simultaneously. Each robot generates instances of experience data during iterative performance of episodes that are each explorations of performing a task, and that are each guided based on the policy network and the current policy parameters for the policy network during the episode. The collected experience data is generated during the episodes and is used to train the policy network by iteratively updating policy parameters of the policy network based on a batch of collected experience data. Further, prior to performance of each of a plurality of episodes performed by the robots, the current updated policy parameters can be provided (or retrieved) for utilization in performance of the episode.
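The data flow described in the abstract can be illustrated with a minimal sketch. It is not the patented implementation: the robots are simulated threads, the policy is a single scalar weight instead of a neural network, the reward and the update rule are toy stand-ins, and every name (`PolicyStore`, `ExperienceBuffer`, `run_robot`, `train`, `run_demo`) is hypothetical. It only shows the three ideas the abstract names: several workers collecting experience at the same time, a trainer updating parameters from batches of that experience, and each worker fetching the current parameters before every episode.

```python
import random
import threading
import time

BATCH_SIZE = 16  # assumed batch size, chosen only for this illustration


class PolicyStore:
    """Holds the current policy parameter (a scalar here) and a version counter."""

    def __init__(self, w):
        self._lock = threading.Lock()
        self._w = w
        self._version = 0

    def get(self):
        with self._lock:
            return self._w, self._version

    def update(self, w):
        with self._lock:
            self._w = w
            self._version += 1


class ExperienceBuffer:
    """Thread-safe store of (state, action, reward, next_state) tuples."""

    def __init__(self):
        self._lock = threading.Lock()
        self._data = []

    def add(self, item):
        with self._lock:
            self._data.append(item)

    def sample(self, n, rng):
        with self._lock:
            return rng.sample(self._data, n) if len(self._data) >= n else []

    def __len__(self):
        with self._lock:
            return len(self._data)


def run_robot(robot_id, store, buffer, episodes, steps):
    rng = random.Random(robot_id)
    for _ in range(episodes):
        w, _version = store.get()  # fetch the latest parameters before the episode
        state = rng.uniform(-1.0, 1.0)
        for _ in range(steps):
            action = w * state + rng.gauss(0.0, 0.1)  # policy output plus exploration noise
            next_state = state + action
            reward = -next_state ** 2  # toy task: drive the state to zero
            buffer.add((state, action, reward, next_state))
            state = next_state


def train(store, buffer, updates, lr=0.1):
    rng = random.Random(0)
    done = 0
    while done < updates:
        batch = buffer.sample(BATCH_SIZE, rng)
        if not batch:
            time.sleep(0.001)  # wait until the robots have produced enough data
            continue
        w, _version = store.get()
        # Placeholder update: gradient ascent on the known toy reward
        # -(s + w*s)^2, using only the sampled states. A real system would
        # apply a deep RL update to a policy network here.
        grad = sum(-2.0 * (1.0 + w) * s * s for s, _a, _r, _s2 in batch) / len(batch)
        store.update(w + lr * grad)
        done += 1


def run_demo(num_robots=3, episodes=5, steps=10, updates=50):
    store = PolicyStore(0.0)
    buffer = ExperienceBuffer()
    threads = [
        threading.Thread(target=run_robot, args=(i, store, buffer, episodes, steps))
        for i in range(num_robots)
    ]
    threads.append(threading.Thread(target=train, args=(store, buffer, updates)))
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return store.get(), len(buffer)
```

Calling `run_demo()` returns the final `(parameter, version)` pair and the number of collected experience tuples. Because the robots and the trainer run concurrently, the exact parameter value varies between runs; only the version count and buffer size are fixed by the arguments.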

