The patent badge is an abbreviated version of the USPTO patent document.

Int. Cl.

G06N 20/00 (2019.01); G06F 18/20 (2023.01); G06F 18/21 (2023.01); G06N 3/044 (2023.01); G06N 3/08 (2023.01); H04W 28/02 (2009.01); H04W 28/08 (2023.01);

U.S. Cl.

CPC ...

G06N 20/00 (2019.01); G06F 18/217 (2023.01); G06F 18/295 (2023.01); H04W 28/0284 (2013.01); H04W 28/08 (2013.01); H04W 28/0925 (2020.05);

Abstract

Rapid and data-efficient training of an artificial intelligence (AI) algorithm is disclosed. Ground truth data are not available, so a policy must be learned from limited interactions with a system. A policy bank is used to explore different policies on a target system through shallow probing. A target policy is chosen by comparing a good policy identified by the shallow probing with a base target policy that has evolved over other learning experiences. The target policy then interacts with the target system, and a replay buffer is built up from these interactions. The base target policy is then updated using gradients computed with respect to the transition experience stored in the replay buffer. As a result, the base target policy is learned quickly and is robust when applied to new, unseen systems.
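The abstract does not specify the policy class, the probing budget, the comparison rule, or the form of the gradient, so the following is only a minimal, hypothetical Python sketch of the loop it describes. It assumes a toy one-step target system (`ToyTargetSystem`, so each stored transition is state, action, reward and behaviour probability, with no next state), linear softmax policies (`SoftmaxPolicy`), shallow probing as the mean reward over a few steps (`shallow_probe`), a target-policy choice that keeps the base policy unless the best probed bank policy scores higher (`choose_target_policy`), and an importance-weighted off-policy policy-gradient update of the base policy from the replay buffer (`update_base_policy`). All of these names, parameters and design choices are illustrative and are not taken from the patent.

```python
import math
import random


class ToyTargetSystem:
    """Hypothetical one-step target system: reward 1 if action == (x > 0)."""

    def __init__(self, rng):
        self.rng = rng
        self.state = None

    def reset(self):
        self.state = [self.rng.uniform(-1.0, 1.0), 1.0]  # feature x plus bias term
        return self.state

    def step(self, action):
        correct = 1 if self.state[0] > 0 else 0
        return 1.0 if action == correct else 0.0


class SoftmaxPolicy:
    """Linear softmax policy over discrete actions (illustrative choice)."""

    def __init__(self, weights):
        self.weights = [list(row) for row in weights]  # weights[action][feature]

    def probs(self, state):
        logits = [sum(w * s for w, s in zip(row, state)) for row in self.weights]
        top = max(logits)  # subtract the max logit for numerical stability
        exps = [math.exp(v - top) for v in logits]
        total = sum(exps)
        return [e / total for e in exps]

    def act(self, state, rng):
        p = self.probs(state)
        action = rng.choices(range(len(p)), weights=p)[0]
        return action, p[action]


def shallow_probe(policy, system, rng, n_steps=5):
    """Run a policy for only a few steps and return its mean reward."""
    total = 0.0
    for _ in range(n_steps):
        action, _ = policy.act(system.reset(), rng)
        total += system.step(action)
    return total / n_steps


def choose_target_policy(bank, base, system, rng):
    """Keep the base policy unless the best probed bank policy beats it."""
    scores = [shallow_probe(p, system, rng) for p in bank]
    best = max(range(len(bank)), key=scores.__getitem__)
    return bank[best] if scores[best] > shallow_probe(base, system, rng) else base


def build_replay_buffer(policy, system, rng, n_steps=200):
    """Store (state, action, reward, behaviour-probability) transitions."""
    buffer = []
    for _ in range(n_steps):
        state = list(system.reset())
        action, prob = policy.act(state, rng)
        buffer.append((state, action, system.step(action), prob))
    return buffer


def update_base_policy(base, buffer, lr=0.5, epochs=20, max_ratio=10.0):
    """Importance-weighted (off-policy) policy-gradient ascent on the buffer."""
    for _ in range(epochs):
        grad = [[0.0] * len(row) for row in base.weights]
        for state, action, reward, behaviour_prob in buffer:
            p = base.probs(state)
            # Clipped ratio corrects for data collected by a different policy.
            ratio = min(p[action] / behaviour_prob, max_ratio)
            for b in range(len(p)):
                indicator = 1.0 if b == action else 0.0
                for f, s in enumerate(state):
                    # Gradient of log softmax: (1[b == a] - p_b) * s_f
                    grad[b][f] += ratio * reward * (indicator - p[b]) * s
        for b, row in enumerate(base.weights):
            for f in range(len(row)):
                row[f] += lr * grad[b][f] / len(buffer)
    return base


def expected_reward(policy, n_points=20):
    """Deterministic evaluation on a symmetric grid of x values."""
    xs = [-1.0 + (i + 0.5) * 2.0 / n_points for i in range(n_points)]
    return sum(policy.probs([x, 1.0])[1 if x > 0 else 0] for x in xs) / n_points


rng = random.Random(0)
system = ToyTargetSystem(rng)
bank = [SoftmaxPolicy([[0.0, 0.0], [w, 0.0]]) for w in (-4.0, 0.0, 4.0)]
base = SoftmaxPolicy([[0.0, 0.0], [0.0, 0.0]])  # stands in for prior learning
before = expected_reward(base)
target = choose_target_policy(bank, base, system, rng)
buffer = build_replay_buffer(target, system, rng)
update_base_policy(base, buffer)
print(f"expected reward before: {before:.3f}, after: {expected_reward(base):.3f}")
```

In this toy setting every rewarded transition nudges the base policy's weights toward the correct action for its state, so the base policy's expected reward on the evaluation grid rises above the 0.5 of its all-zero starting weights regardless of which policy was selected as the target. The importance ratio is what lets the base policy learn from transitions gathered by a different (target) policy; clipping it is a common variance-control choice, not something the abstract prescribes.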
