The patent badge is an abbreviated version of the USPTO patent document. It covers the following: patent number, date the patent was issued, date the patent was filed, title of the patent, applicant, inventor, assignee, attorney firm, primary examiner, assistant examiner, CPCs, and abstract. The patent badge also contains a link to the full patent document (in Adobe Acrobat format, i.e. PDF).

Patent No.:

US 12430551 B1

Date of Patent:

Sep. 30, 2025

Filed:

Jun. 03, 2021

Reinforcement learning algorithm search

Applicant:

Google LLC, Mountain View, CA (US);

Inventors:

John Dalton Co-Reyes, San Francisco, CA (US);

Yingjie Miao, Fremont, CA (US);

Daiyi Peng, Cupertino, CA (US);

Sergey Vladimir Levine, Berkeley, CA (US);

Quoc V. Le, Sunnyvale, CA (US);

Honglak Lee, Mountain View, CA (US);

Aleksandra Faust, Palo Alto, CA (US);

Assignee:

Google LLC, Mountain View, CA (US);

Attorney:

Fish & Richardson P.C.

Primary Examiner:

Brian M Smith

Int. Cl.

CPC ...

G06N 3/08 (2023.01); G06F 11/34 (2006.01); G06F 16/901 (2019.01);

U.S. Cl.

CPC ...

G06N 3/08 (2013.01); G06F 11/3428 (2013.01); G06F 16/9024 (2019.01);

Abstract

Methods, computer systems, and apparatus, including computer programs encoded on computer storage media, for generating and searching reinforcement learning algorithms. In some implementations, a computer-implemented system generates a sequence of candidate reinforcement learning algorithms. Each candidate reinforcement learning algorithm in the sequence is configured to receive an input environment state characterizing a state of an environment and to generate an output that specifies an action to be performed by an agent interacting with the environment. For each candidate reinforcement learning algorithm in the sequence, the system performs a performance evaluation for a set of a plurality of training environments. For each training environment, the system adjusts a set of environment-specific parameters of the candidate reinforcement learning algorithm by performing training of the candidate reinforcement learning algorithm to control a corresponding agent in the training environment. The system generates an environment-specific performance metric for the candidate reinforcement learning algorithm that measures a performance of the candidate reinforcement learning algorithm in controlling the corresponding agent in the training environment as a result of the training. After performing training in the set of training environments, the system generates a summary performance metric for the candidate reinforcement learning algorithm by combining the environment-specific performance metrics generated for the set of training environments. After evaluating each of the candidate reinforcement learning algorithms in the sequence, the system selects one or more output reinforcement learning algorithms from the sequence based on the summary performance metrics of the candidate reinforcement learning algorithms.
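The evaluation loop described in the abstract can be illustrated with a small toy sketch. This is not the patented method or any Google implementation. It assumes, purely for illustration, that each "training environment" is a multi-armed bandit given by a list of mean rewards, that each "candidate algorithm" is a value-update rule, that the environment-specific parameters are the per-arm value estimates, and that the summary metric is a plain mean. The names `train_and_score`, `search`, `candidates`, and `environments` are hypothetical.

```python
import random
from statistics import mean


def train_and_score(update_rule, arm_means, steps=500, epsilon=0.1, seed=0):
    """Train one candidate in one toy environment and return its metric.

    Assumption: the environment is a bandit whose arms pay their mean
    reward plus Gaussian noise; the agent acts epsilon-greedily.
    """
    rng = random.Random(seed)
    estimates = [0.0] * len(arm_means)  # environment-specific parameters
    counts = [0] * len(arm_means)
    rewards = []
    for _ in range(steps):
        if rng.random() < epsilon:
            arm = rng.randrange(len(arm_means))  # explore
        else:
            arm = max(range(len(arm_means)), key=lambda a: estimates[a])
        reward = arm_means[arm] + rng.gauss(0.0, 0.1)
        counts[arm] += 1
        # The candidate decides how the parameters are adjusted.
        estimates[arm] = update_rule(estimates[arm], reward, counts[arm])
        rewards.append(reward)
    # Environment-specific performance metric: mean reward late in training.
    return mean(rewards[-100:])


def search(candidates, environments, top_k=1):
    """Evaluate every candidate on every environment and pick the best."""
    summaries = {}
    for name, rule in candidates.items():
        per_env = [
            train_and_score(rule, env, seed=i)
            for i, env in enumerate(environments)
        ]
        # Summary metric: combine the per-environment metrics (here, a mean).
        summaries[name] = mean(per_env)
    ranked = sorted(summaries, key=summaries.get, reverse=True)
    return ranked[:top_k], summaries


# Hypothetical candidate update rules: (old_estimate, reward, visit_count).
candidates = {
    "sample_average": lambda est, r, n: est + (r - est) / n,
    "constant_step": lambda est, r, n: est + 0.1 * (r - est),
    "no_learning": lambda est, r, n: est,
}
# Hypothetical training environments (mean reward per arm).
environments = [[0.1, 0.9, 0.3], [0.8, 0.2, 0.5], [0.4, 0.4, 1.0]]

selected, summaries = search(candidates, environments)
print("selected:", selected)
```

Parameters are re-initialized for every environment, so each candidate is scored on how well it learns rather than on what it has already learned; the mean is only one possible way to combine the per-environment metrics.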
