The patent badge is an abbreviated version of the USPTO patent document. It lists the following: patent number, date the patent was issued, date the patent was filed, title of the patent, applicant, inventor, assignee, attorney firm, primary examiner, assistant examiner, CPCs, and abstract. The patent badge also contains a link to the full patent document (in Adobe Acrobat format, i.e. PDF).

Patent No.:

US 11645498 B1

Date of Patent:

May 09, 2023

Filed:

Sep. 25, 2019

Title:

Semi-supervised reinforcement learning

Applicant:

International Business Machines Corporation, Armonk, NY (US);

Inventors:

Aaron K. Baughman, Cary, NC (US);

Stephen C. Hammer, Marietta, GA (US);

Gray Cannon, Miami, FL (US);

Shikhar Kwatra, Durham, NC (US);

Assignee:

International Business Machines Corporation, Armonk, NY (US);

Attorney:

Eric W. Chesley

Primary Examiner:

Michael S Osinski

Int. Cl.

CPC ...

G06N 3/047 (2023.01); G06F 17/18 (2006.01); G06N 20/00 (2019.01); G10L 15/16 (2006.01); G06N 3/048 (2023.01);

U.S. Cl.

CPC ...

G06N 3/047 (2023.01); G06F 17/18 (2013.01); G06N 3/048 (2023.01); G06N 20/00 (2019.01); G10L 15/16 (2013.01);

Abstract

Provided is a method, a system, and a program product for determining a policy using semi-supervised reinforcement learning. The method includes observing a state of an environment by a learning agent. The method also includes taking an action by the learning agent. The method further includes observing a new state of the environment and calculating a reward for the action taken by the learning agent. The method also includes determining whether a policy related to the learning agent should be changed. The determination is conducted by a teaching agent that inputs the state of the environment and the reward as features. The method can also include changing the policy related to the learning agent upon a determination that a label outputted by the teaching agent exceeds a reward threshold.
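The loop described in the abstract can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not the patented implementation. The toy line-world environment, the `step` function, the hand-written scoring rule inside `teaching_agent`, the value of `REWARD_THRESHOLD`, and the "flip the action" policy update are all hypothetical stand-ins. In the patent the teaching agent is a separate model, and the abstract does not say how it is built or trained.

```python
# Illustrative sketch only: every name and constant below is an assumption.
N_STATES = 5                # toy line-world with states 0..4
GOAL = N_STATES - 1         # rightmost state is the goal
REWARD_THRESHOLD = 0.5      # assumed threshold for the teacher's label


def step(state, action):
    """Toy environment: action 1 moves right, action 0 moves left."""
    move = 1 if action == 1 else -1
    new_state = min(max(state + move, 0), GOAL)
    # Reward the agent only when it actually gets closer to the goal.
    reward = 1.0 if new_state > state else -1.0
    return new_state, reward


def teaching_agent(state, reward):
    """Hypothetical teacher: maps (state, reward) features to a label in [0, 1].

    A fixed scoring rule stands in for whatever model the teacher really is.
    """
    distance = (GOAL - state) / GOAL
    penalty = 1.0 if reward < 0 else 0.0
    return 0.5 * distance + 0.5 * penalty


def train(n_steps=50):
    """Run the observe / act / reward / teacher-check loop."""
    policy = [0] * N_STATES   # deterministic policy: one action per state
    changes = 0
    state = 0
    for _ in range(n_steps):
        action = policy[state]                   # learning agent acts
        new_state, reward = step(state, action)  # observe new state and reward
        label = teaching_agent(state, reward)    # teacher scores the step
        if label > REWARD_THRESHOLD:             # label exceeds the threshold
            policy[state] = 1 - policy[state]    # change the policy
            changes += 1
        # Start a new episode once the goal is reached.
        state = 0 if new_state == GOAL else new_state
    return policy, changes


if __name__ == "__main__":
    print(train())
```

In this toy setup the teacher flags each state where the agent moves away from the goal, so the policy for the non-goal states settles on "move right" after a handful of corrections and is then left unchanged.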
