The patent badge is an abbreviated version of the USPTO patent document. It covers the following: patent number, date the patent was issued, date the patent was filed, title of the patent, applicant, inventor, assignee, attorney firm, primary examiner, assistant examiner, CPCs, and abstract. The patent badge also contains a link to the full patent document (in Adobe Acrobat format, i.e. PDF).

Date of Patent:
Dec. 10, 2024

Filed:
Dec. 31, 2020

Applicant:

Samsung Electronics Co., Ltd., Suwon-si, KR;

Inventors:

Di Wu, Montreal, CA;

Jikun Kang, Montreal, CA;

Hang Li, Montreal, CA;

Xi Chen, Montreal, CA;

Yi Tian Xu, Montreal, CA;

Dmitriy Rivkin, Montreal, CA;

Taeseop Lee, Seoul, KR;

Intaik Park, Seoul, KR;

Michael Jenkin, Toronto, CA;

Xue Liu, Montreal, CA;

Gregory Lewis Dudek, Westmount, CA;

Assignee:
Attorney:
Primary Examiner:
Int. Cl.
CPC ...
G06N 20/00 (2019.01); G06F 18/20 (2023.01); G06F 18/21 (2023.01); G06N 3/044 (2023.01); G06N 3/08 (2023.01); H04W 28/02 (2009.01); H04W 28/08 (2023.01);
U.S. Cl.
CPC ...
G06N 20/00 (2019.01); G06F 18/217 (2023.01); G06F 18/295 (2023.01); H04W 28/0284 (2013.01); H04W 28/08 (2013.01); H04W 28/0925 (2020.05);
Abstract

Rapid and data-efficient training of an artificial intelligence (AI) algorithm is disclosed. Ground truth data are not available and a policy must be learned based on limited interactions with a system. A policy bank is used to explore different policies on a target system with shallow probing. A target policy is chosen by comparing a good policy from the shallow probing with a base target policy which has evolved over other learning experiences. The target policy then interacts with the target system and a replay buffer is built up. The base target policy is then updated using gradients found with respect to the transition experience stored in the replay buffer. The base target policy is quickly learned and is robust for application to new, unseen systems.
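
The loop described in the abstract (probe a policy bank, choose a target policy, fill a replay buffer, update the base policy) can be illustrated with a toy sketch. This is not the patented method: the abstract does not give the update rule, so everything below is an assumption made for illustration. `ToySystem`, the one-parameter Gaussian policy, the bank values, the score-function gradient estimate, and the Reptile-style interpolation used to update the base policy are all hypothetical stand-ins.

```python
import random
from statistics import mean


class ToySystem:
    """Hypothetical target system: reward is highest when action == optimum."""

    def __init__(self, optimum):
        self.optimum = optimum

    def step(self, action):
        return -(action - self.optimum) ** 2


def act(theta, rng, sigma):
    """Gaussian policy: sample an action around the policy parameter theta."""
    return rng.gauss(theta, sigma)


def probe(system, theta, rng, steps=3, sigma=0.1):
    """Shallow probing: average reward over only a few interactions."""
    return mean(system.step(act(theta, rng, sigma)) for _ in range(steps))


def choose_target(system, policy_bank, base_theta, rng):
    """Pick the best-scoring policy among the bank and the base policy."""
    candidates = list(policy_bank) + [base_theta]
    scores = [probe(system, theta, rng) for theta in candidates]
    return candidates[scores.index(max(scores))]


def collect(system, theta, rng, steps=200, sigma=0.5):
    """Let the target policy interact with the system; fill a replay buffer."""
    buffer = []
    for _ in range(steps):
        action = act(theta, rng, sigma)
        buffer.append((action, system.step(action)))
    return buffer


def policy_gradient(buffer, theta, sigma=0.5):
    """Score-function gradient estimate with a mean-reward baseline."""
    baseline = mean(reward for _, reward in buffer)
    return mean((reward - baseline) * (action - theta) / sigma**2
                for action, reward in buffer)


def train(optima, seed=0, policy_bank=(0.0, 2.0, 4.0), lr=0.1, meta_lr=0.5):
    """Evolve one base policy parameter across several toy systems."""
    rng = random.Random(seed)
    base_theta = 0.0
    for optimum in optima:
        system = ToySystem(optimum)
        target_theta = choose_target(system, policy_bank, base_theta, rng)
        buffer = collect(system, target_theta, rng)
        adapted = target_theta + lr * policy_gradient(buffer, target_theta)
        # Assumed Reptile-style step: move the base toward the adapted policy.
        base_theta += meta_lr * (adapted - base_theta)
    return base_theta


if __name__ == "__main__":
    print(train([2.4, 2.2, 2.6, 2.3, 2.5]))
```

In this sketch the base parameter starts far from every system's optimum and is pulled toward the region where the toy systems reward actions, which loosely mirrors the abstract's idea of a base policy that evolves over several learning experiences. A real implementation would use neural-network policies and full state transitions rather than a single scalar.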

