The patent badge is an abbreviated version of the USPTO patent document. It covers the following: patent number, date the patent was issued, date the patent was filed, title of the patent, applicant, inventor, assignee, attorney firm, primary examiner, assistant examiner, CPCs, and abstract. The patent badge also links to the full patent document (in Adobe Acrobat format, i.e., PDF).

Patent No.:

US 11880765 B1

Date of Patent:

Jan. 23, 2024

Filed:

Oct. 19, 2020

State-augmented reinforcement learning

Applicants:

International Business Machines Corporation, Armonk, NY (US);

University of Illinois at Urbana-Champaign, Urbana, IL (US);

Inventors:

Pin-Yu Chen, White Plains, NY (US);

Yada Zhu, Westchester, NY (US);

Jinjun Xiong, Goldens Bridge, NY (US);

Kumar Bhaskaran, Englewood Cliffs, NJ (US);

Yunan Ye, Hangzhou, CN;

Bo Li, Champaign, IL (US);

Assignees:

International Business Machines Corporation, Armonk, NY (US);

University of Illinois at Urbana-Champaign, Urbana, IL (US);

Attorneys:

Scully, Scott, Murphy & Presser, P.C.

Daniel P. Morris

Primary Examiner:

Alvin L Brown

Int. Cl.

CPC ...

G06Q 30/00 (2023.01); G06N 3/08 (2023.01); G06F 40/279 (2020.01); G06Q 40/06 (2012.01);

U.S. Cl.

CPC ...

G06N 3/08 (2013.01); G06F 40/279 (2020.01); G06Q 40/06 (2013.01);

Abstract

A processor training a reinforcement learning model can include receiving a first dataset representing an observable state in reinforcement learning to train a machine to perform an action. The processor receives a second dataset. Using the second dataset, the processor trains a machine learning classifier to make a prediction about an entity related to the action. The processor extracts an embedding from the trained machine learning classifier, and augments the observable state with the embedding to create an augmented state. Based on the augmented state, the processor trains a reinforcement learning model to learn a policy for performing the action, the policy including a mapping from state space to action space.
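
The pipeline in the abstract (train a classifier on the second dataset, extract an embedding, augment the observable state, train a policy on the augmented state) can be sketched roughly as follows. This is only an illustrative sketch, not the patented implementation: the abstract does not name a classifier architecture, an embedding layer, or a reinforcement learning algorithm, so the one-hidden-layer classifier, the use of hidden activations as the embedding, the one-step REINFORCE update, the synthetic data, and all function names here are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def train_classifier(X, y, hidden=4, epochs=200, lr=0.5):
    """Assumed classifier: one tanh hidden layer, logistic output, gradient descent."""
    W1 = rng.normal(0, 0.5, (X.shape[1], hidden))
    w2 = rng.normal(0, 0.5, hidden)
    for _ in range(epochs):
        H = np.tanh(X @ W1)                     # hidden activations
        p = 1.0 / (1.0 + np.exp(-(H @ w2)))     # predicted probability
        g = (p - y) / len(y)                    # d(mean log loss)/d(logit)
        grad_w2 = H.T @ g
        grad_W1 = X.T @ (np.outer(g, w2) * (1.0 - H ** 2))
        w2 -= lr * grad_w2
        W1 -= lr * grad_W1
    return W1, w2

def extract_embedding(W1, X):
    """Assumption: the hidden-layer activations serve as the embedding."""
    return np.tanh(X @ W1)

def augment_state(obs, emb):
    """Augmented state = observable state concatenated with the embedding."""
    return np.concatenate([obs, emb], axis=-1)

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def train_policy(states, reward_fn, n_actions=2, epochs=300, lr=0.1):
    """Assumed RL step: REINFORCE on one-step episodes with a linear softmax policy."""
    theta = np.zeros((states.shape[1], n_actions))
    for _ in range(epochs):
        probs = softmax(states @ theta)
        actions = np.array([rng.choice(n_actions, p=p) for p in probs])
        rewards = reward_fn(actions)
        adv = rewards - rewards.mean()          # mean baseline to reduce variance
        onehot = np.eye(n_actions)[actions]
        grad = states.T @ ((onehot - probs) * adv[:, None]) / len(states)
        theta += lr * grad                      # gradient ascent on expected reward
    return theta

# Synthetic stand-ins for the two datasets (purely hypothetical data).
n = 200
obs = rng.normal(size=(n, 3))                   # first dataset: observable state
side = rng.normal(size=(n, 5))                  # second dataset: entity features
labels = (side[:, 0] + side[:, 1] > 0).astype(float)

W1, w2 = train_classifier(side, labels)
emb = extract_embedding(W1, side)
aug = augment_state(obs, emb)

# Toy reward: the action is rewarded when it matches the entity label.
reward_fn = lambda a: (a == labels).astype(float)
theta = train_policy(aug, reward_fn)
policy = softmax(aug @ theta)                   # mapping from state space to action probabilities
```

In this toy setup the reward depends only on information in the second dataset, so the policy can only learn something useful through the embedding part of the augmented state, which is the point of the augmentation step.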
