The patent badge is an abbreviated version of the USPTO patent document. It covers the following: patent number, date the patent was issued, date the patent was filed, title of the patent, applicant, inventor, assignee, attorney firm, primary examiner, assistant examiner, CPCs, and abstract. The patent badge also contains a link to the full patent document (in Adobe Acrobat format, i.e. PDF).

Date of Patent:
Feb. 07, 2023

Filed:
Sep. 13, 2018

Applicants:

Fujitsu Limited, Kawasaki, JP;

Okinawa Institute of Science and Technology School Corporation, Okinawa, JP;

Inventors:

Tomotake Sasaki, Kawasaki, JP;

Eiji Uchibe, Kunigami, JP;

Kenji Doya, Kunigami, JP;

Hirokazu Anai, Hachioji, JP;

Hitoshi Yanami, Kawasaki, JP;

Hidenao Iwane, Kawasaki, JP;

Attorney:
Primary Examiner:
Int. Cl.
CPC ...
G05B 13/02 (2006.01); G06F 17/16 (2006.01); G06N 20/00 (2019.01); G06N 3/00 (2006.01); G05B 13/04 (2006.01);
U.S. Cl.
CPC ...
G05B 13/025 (2013.01); G05B 13/022 (2013.01); G05B 13/0265 (2013.01); G05B 13/042 (2013.01); G05B 13/048 (2013.01); G06F 17/16 (2013.01); G06N 3/006 (2013.01); G06N 20/00 (2019.01);
Abstract

A non-transitory, computer-readable recording medium stores a program of reinforcement learning by a state-value function. The program causes a computer to execute a process including calculating a temporal difference (TD) error based on an estimated state-value function, the TD error being calculated by giving a perturbation to each component of a feedback coefficient matrix that provides a policy; calculating based on the TD error and the perturbation, an estimated gradient function matrix acquired by estimating a gradient function matrix of the state-value function with respect to the feedback coefficient matrix for a state of a controlled object, when state variation of the controlled object in the reinforcement learning is described by a linear difference equation and an immediate cost or an immediate reward of the controlled object is described in a quadratic form of the state and an input; and updating the feedback coefficient matrix using the estimated gradient function matrix.
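The steps named in the abstract (perturb each component of the feedback coefficient matrix, compute a TD error, estimate a gradient, update the matrix) can be illustrated with a small sketch. This is not the patented procedure, only a simplified illustration under loud assumptions: the matrices `A`, `B`, `Q`, `R`, the discount rate `GAMMA`, the step size, and every function name are hypothetical, and `evaluate` computes the state-value function from the known model as a stand-in for whatever estimation the patent actually uses. The policy is `u = F x` for a system with two states and one input, so `F` has two components.

```python
GAMMA = 0.9                      # discount rate (assumed)
A = [[1.0, 0.1], [0.0, 1.0]]     # assumed state matrix
B = [0.0, 0.1]                   # assumed input column (single input)
Q = [[1.0, 0.0], [0.0, 1.0]]     # assumed state cost weights
R = 0.1                          # assumed input cost weight


def quad(M, x):
    """Return the quadratic form x^T M x for a 2x2 matrix M."""
    return sum(x[i] * M[i][j] * x[j] for i in range(2) for j in range(2))


def step(x, u):
    """Linear difference equation x' = A x + B u."""
    return [A[i][0] * x[0] + A[i][1] * x[1] + B[i] * u for i in range(2)]


def cost(x, u):
    """Immediate cost in quadratic form: x^T Q x + R u^2."""
    return quad(Q, x) + R * u * u


def evaluate(F, iters=500):
    """Stand-in for value estimation: P with V(x) = x^T P x under u = F x."""
    Ac = [[A[i][j] + B[i] * F[j] for j in range(2)] for i in range(2)]
    P = [[0.0, 0.0], [0.0, 0.0]]
    for _ in range(iters):
        # Fixed-point iteration of P = Q + R F^T F + GAMMA * Ac^T P Ac.
        P = [[Q[i][j] + R * F[i] * F[j]
              + GAMMA * sum(Ac[k][i] * P[k][l] * Ac[l][j]
                            for k in range(2) for l in range(2))
              for j in range(2)] for i in range(2)]
    return P


def td_error(F_applied, P, x):
    """TD error c + GAMMA * V(x') - V(x) when u = F_applied x is applied."""
    u = F_applied[0] * x[0] + F_applied[1] * x[1]
    return cost(x, u) + GAMMA * quad(P, step(x, u)) - quad(P, x)


def estimate_gradient(F, P, x, eps=1e-4):
    """Perturb each component of F; divide the TD error by the perturbation."""
    grad = []
    for j in range(2):
        F_pert = list(F)
        F_pert[j] += eps
        grad.append(td_error(F_pert, P, x) / eps)
    return grad


def train(F, states, alpha=1.0, iterations=50):
    """Repeat: evaluate, estimate the gradient at each state, update F."""
    F = list(F)
    for _ in range(iterations):
        P = evaluate(F)
        grads = [estimate_gradient(F, P, x) for x in states]
        for j in range(2):
            F[j] -= alpha * sum(g[j] for g in grads) / len(states)
    return F


F0 = [-1.0, -1.0]                # assumed initial stabilizing gains
F1 = train(F0, states=[[1.0, 0.0], [0.0, 1.0]])
x0 = [1.0, 1.0]
print("cost-to-go before:", quad(evaluate(F0), x0))
print("cost-to-go after: ", quad(evaluate(F1), x0))
```

In this sketch the TD error is zero when the unperturbed gains are applied, because the value function is exact for the current policy, so the TD error divided by the perturbation acts as a one-sided finite-difference estimate for each component of `F`. Repeating the update should lower the discounted cost-to-go from `x0` relative to the initial gains.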

