The patent badge is an abbreviated version of the USPTO patent document. It covers the following fields: patent number, date the patent was issued, date the patent was filed, title, applicant, inventors, assignee, attorney firm, primary examiner, assistant examiner, CPC classifications, and abstract. The patent badge also links to the full patent document in PDF format.

Patent No.:

US 10839302 B1

Date of Patent:

Nov. 17, 2020

Filed:

Nov. 22, 2016

Approximate value iteration with complex returns by bounding

Applicant:

The Research Foundation for the State University of New York, Binghamton, NY (US);

Inventors:

Robert Wright, Sherrill, NY (US);

Lei Yu, Vestal, NY (US);

Steven Loscalzo, Vienna, VA (US);

Assignee:

The Research Foundation for the State University of New York, Binghamton, NY (US);

Attorneys:

Hoffberg & Associates

Steven M. Hoffberg

Primary Examiner:

Kamran Afshar

Assistant Examiner:

Ying Yu Chen

Int. Cl.

CPC ...

G06N 7/00 (2006.01); G05B 15/02 (2006.01); G06N 20/00 (2019.01); G05B 13/02 (2006.01);

U.S. Cl.

CPC ...

G06N 7/005 (2013.01); G05B 13/0265 (2013.01); G05B 15/02 (2013.01); G06N 20/00 (2019.01); Y02B 10/30 (2013.01);

Abstract

A control system and method for controlling a system, which employs a data set representing a plurality of states and associated trajectories of an environment of the system; and which iteratively determines an estimate of an optimal control policy for the system. The iterative process performs the substeps, until convergence, of estimating a long term value for operation at a respective state of the environment over a series of predicted future environmental states; using a complex return of the data set to determine a bound to improve the estimated long term value; and producing an updated estimate of an optimal control policy dependent on the improved estimate of the long term value. The control system may produce an output signal to control the system directly, or output the optimized control policy. The system preferably is a reinforcement learning system which continually improves.
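The abstract describes the method only at a high level. What follows is a minimal Python sketch of one plausible reading, not the patented implementation. It runs tabular value iteration over a fixed set of recorded trajectories and tightens each one-step Bellman backup with a lower bound: the discounted return actually observed from that step to the end of its episode. That is the simplest multi-step return; a complex return in the Sutton and Barto sense, such as a λ-return, would instead average several n-step returns. The sketch assumes a deterministic, episodic environment, so that every observed return is achievable and therefore cannot exceed the optimal action value. The function name `bounded_value_iteration` and the `(state, action, reward, next_state)` tuple layout are hypothetical.

```python
def bounded_value_iteration(trajectories, gamma=0.9, tol=1e-9, max_iters=10_000):
    """Sketch of value iteration with a return-based lower bound (hypothetical API).

    trajectories: list of episodes; each episode is a list of
    (state, action, reward, next_state) tuples whose final next_state is terminal.
    ASSUMPTION: deterministic, episodic environment.
    """
    model = {}              # (s, a) -> (reward, next_state), same for every occurrence
    lower = {}              # (s, a) -> largest discounted return observed after (s, a)
    actions_by_state = {}   # s -> set of actions recorded in the data

    for episode in trajectories:
        g = 0.0
        # Walk backwards so g is the discounted return from this step to episode end.
        for s, a, r, s_next in reversed(episode):
            g = r + gamma * g
            model[(s, a)] = (r, s_next)
            lower[(s, a)] = max(lower.get((s, a), float("-inf")), g)
            actions_by_state.setdefault(s, set()).add(a)

    q = {sa: 0.0 for sa in model}

    def value(state):
        # Greedy value over recorded actions; states with none (terminal) are worth 0.
        acts = actions_by_state.get(state)
        return max(q[(state, b)] for b in acts) if acts else 0.0

    for _ in range(max_iters):
        delta = 0.0
        for sa, (r, s_next) in model.items():
            target = r + gamma * value(s_next)   # one-step Bellman backup
            target = max(target, lower[sa])      # improve it with the observed-return bound
            delta = max(delta, abs(target - q[sa]))
            q[sa] = target
        if delta < tol:                          # stop once estimates have converged
            break

    # Extract a greedy policy from the converged action values.
    policy = {}
    for (s, a), v in q.items():
        if s not in policy or v > q[(s, policy[s])]:
            policy[s] = a
    return q, policy


# Usage: a three-step chain reaching a reward of 1, versus an immediate reward of 0.5.
data = [
    [(0, "r", 0.0, 1), (1, "r", 0.0, 2), (2, "r", 1.0, "T")],
    [(0, "l", 0.5, "T")],
]
q, policy = bounded_value_iteration(data, gamma=0.9)
```

Taking the maximum of a contraction and a fixed bound is still a γ-contraction, so the loop converges. Under the deterministic assumption the bound never exceeds the true value, so it leaves the fixed point unchanged and can only pull lagging estimates up toward it.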

Find Patent Forward Citations