The patent badge is an abbreviated version of the USPTO patent document and contains a link to the full patent document.

G06N 20/00 (2019.01); G06F 18/214 (2023.01); G06F 30/20 (2020.01); G06N 3/006 (2023.01); G06N 3/044 (2023.01); G06N 3/08 (2023.01);

U.S. Cl.

CPC ...

G06N 3/006 (2013.01); G06F 18/2148 (2023.01); G06F 30/20 (2020.01); G06N 3/044 (2023.01); G06N 3/08 (2013.01); G06N 20/00 (2019.01);

Abstract

Methods, systems, and apparatus for providing a sequence of actions to perform a task. In one aspect, a method comprises: using a policy neural network to, at each of a sequence of time steps, select one or more actions to be performed according to an action selection policy learned by the policy neural network; providing the selected one or more actions to a simulator; implementing the selected one or more actions for the time steps using the simulator to generate a simulator output; discriminating between the simulator output and training data using a discriminator neural network to produce a discriminator output; updating parameters of the policy recurrent neural network using a reinforcement learning procedure according to a reward signal determined from the discriminator output; and updating parameters of the discriminator neural network according to a difference between the simulator output and the training data.
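
The loop described in the abstract can be illustrated with a minimal Python sketch. This is not the patented implementation: the policy here is a set of independent per-step Bernoulli logits rather than a recurrent neural network, the discriminator is a single linear unit, the simulator is a toy function that copies actions onto a canvas, and the reinforcement learning procedure is plain REINFORCE with a running-mean baseline. All names (simulate, discriminate, train) and all hyperparameters are hypothetical and chosen only for illustration.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def simulate(actions):
    """Toy simulator (assumption): each action sets one cell of a canvas."""
    return [float(a) for a in actions]


def discriminate(weights, bias, output):
    """Linear discriminator: estimated probability that `output` is training data."""
    return sigmoid(sum(w * x for w, x in zip(weights, output)) + bias)


def train(training_data, steps=2000, lr_policy=0.1, lr_disc=0.05, seed=0):
    rng = random.Random(seed)
    horizon = len(training_data[0])
    policy_logits = [0.0] * horizon   # stand-in for the policy network
    disc_w, disc_b = [0.0] * horizon, 0.0
    baseline = 0.0                    # running mean reward, reduces variance

    for _ in range(steps):
        # 1. The policy selects one action at each time step.
        probs = [sigmoid(logit) for logit in policy_logits]
        actions = [1 if rng.random() < p else 0 for p in probs]

        # 2. The simulator implements the actions and produces an output.
        fake = simulate(actions)
        real = rng.choice(training_data)

        # 3. The discriminator scores both; the policy's reward is its
        #    score on the simulator output.
        d_fake = discriminate(disc_w, disc_b, fake)
        d_real = discriminate(disc_w, disc_b, real)
        reward = d_fake

        # 4. REINFORCE update: the gradient of the log-probability of a
        #    Bernoulli action with respect to its logit is (action - prob).
        advantage = reward - baseline
        for t in range(horizon):
            policy_logits[t] += lr_policy * advantage * (actions[t] - probs[t])
        baseline += 0.05 * (reward - baseline)

        # 5. Discriminator update: one gradient-ascent step on
        #    log D(real) + log(1 - D(fake)).
        for i in range(horizon):
            disc_w[i] += lr_disc * ((1 - d_real) * real[i] - d_fake * fake[i])
        disc_b += lr_disc * ((1 - d_real) - d_fake)

    # Learned per-step probability of choosing action 1.
    return [sigmoid(logit) for logit in policy_logits]
```

Calling train([[1.0, 0.0, 1.0, 0.0]]) returns one action probability per time step. Adversarial training of this kind can oscillate, so the sketch makes no claim about convergence.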
