The patent badge is an abbreviated version of the USPTO patent document. It covers the following: patent number, date the patent was issued, date the patent was filed, title of the patent, applicant, inventor, assignee, attorney firm, primary examiner, assistant examiner, CPCs, and abstract. The patent badge also contains a link to the full patent document (in Adobe Acrobat format, i.e. PDF).
Patent No.:
Date of Patent:
Sep. 15, 2020
Filed:
Nov. 19, 2019
Applicant:
DeepMind Technologies Limited, London, GB;
Inventors:
Daniel Pieter Wierstra, London, GB;
Yujia Li, London, GB;
Razvan Pascanu, Letchworth Garden, GB;
Peter William Battaglia, London, GB;
Theophane Guillaume Weber, London, GB;
Lars Buesing, London, GB;
David Paul Reichert, London, GB;
Arthur Clement Guez, London, GB;
Danilo Jimenez Rezende, London, GB;
Adrià Puigdomènech Badia, London, GB;
Oriol Vinyals, London, GB;
Nicolas Manfred Otto Heess, London, GB;
Sebastien Henri Andre Racaniere, London, GB;
Assignee:
DeepMind Technologies Limited, London, GB;
Abstract
A neural network system is proposed. The neural network can be trained by model-based reinforcement learning to select actions to be performed by an agent interacting with an environment, to perform a task in an attempt to achieve a specified result. The system may comprise at least one imagination core which receives a current observation characterizing a current state of the environment, and optionally historical observations, and which includes a model of the environment. The imagination core may be configured to output trajectory data in response to the current observation and/or historical observations. The trajectory data comprises a sequence of future features of the environment imagined by the imagination core. The system may also include a rollout encoder to encode the features into a rollout embedding, and an output stage to receive data derived from the rollout embedding and to output action policy data for identifying an action based on the current observation.
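The data flow described in the abstract (imagination core, then rollout encoder, then output stage) can be illustrated with a minimal, non-authoritative sketch. Everything below is an assumption made for illustration and is not taken from the patent: the toy linear environment model, the mean-pooling encoder, the softmax output stage, the choice of one imagined rollout per candidate action, and all function names. In the system the abstract describes, these components would be trained neural networks.

```python
import math
from typing import Callable, List, Tuple

Observation = List[float]


def toy_env_model(obs: Observation, action: int) -> Observation:
    """Hypothetical stand-in for the learned environment model:
    predicts the next observation from the current one and an action."""
    return [0.9 * x + 0.1 * action for x in obs]


def imagination_core(
    obs: Observation,
    rollout_policy: Callable[[Observation], int],
    model: Callable[[Observation, int], Observation],
    steps: int,
) -> List[Observation]:
    """Roll the model forward from the current observation and return the
    imagined trajectory, i.e. a sequence of future features."""
    trajectory = []
    current = list(obs)
    for _ in range(steps):
        action = rollout_policy(current)
        current = model(current, action)
        trajectory.append(current)
    return trajectory


def rollout_encoder(trajectory: List[Observation]) -> List[float]:
    """Encode a trajectory into a fixed-size rollout embedding.
    Assumption: an element-wise mean replaces a learned encoder."""
    n = len(trajectory)
    return [sum(column) / n for column in zip(*trajectory)]


def output_stage(
    embedding: List[float], obs: Observation, num_actions: int
) -> List[float]:
    """Map the embedding and current observation to action probabilities.
    Assumption: hand-made logits and a softmax replace a learned policy head."""
    signal = sum(embedding) + sum(obs)
    logits = [signal * (a - (num_actions - 1) / 2) for a in range(num_actions)]
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(logit - peak) for logit in logits]
    total = sum(exps)
    return [e / total for e in exps]


def select_action(
    obs: Observation, num_actions: int = 3, steps: int = 4
) -> Tuple[int, List[float]]:
    """Run one imagined rollout per candidate action (an assumption),
    concatenate the rollout embeddings, and pick the most probable action."""
    aggregated: List[float] = []
    for a in range(num_actions):
        trajectory = imagination_core(obs, lambda _o, a=a: a, toy_env_model, steps)
        aggregated.extend(rollout_encoder(trajectory))
    probs = output_stage(aggregated, obs, num_actions)
    best = max(range(num_actions), key=probs.__getitem__)
    return best, probs
```

Calling `select_action([1.0, 0.0])` returns an action index together with the action policy data (here a probability per action). The sketch only shows how the three stages connect; it does not reproduce training by reinforcement learning.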