The patent badge is an abbreviated version of the USPTO patent document. It covers the following: patent number, date the patent was issued, date the patent was filed, title of the patent, applicant, inventor, assignee, attorney firm, primary examiner, assistant examiner, CPCs, and abstract. The patent badge also contains a link to the full patent document (in Adobe Acrobat format, i.e. PDF).

Date of Patent:
Jul. 12, 2022

Filed:

Mar. 05, 2020
Applicant:

Fujitsu Limited, Kawasaki, JP;

Inventor:

Tomotake Sasaki, Kawasaki, JP;

Assignee:

FUJITSU LIMITED, Kawasaki, JP;

Attorney:
Primary Examiner:
Int. Cl.
CPC ...
G05B 13/02 (2006.01); G06N 20/00 (2019.01); G06N 5/02 (2006.01); G06F 17/12 (2006.01); F24F 11/63 (2018.01); B25J 9/16 (2006.01); H02J 3/38 (2006.01);
U.S. Cl.
CPC ...
G05B 13/0265 (2013.01); B25J 9/1658 (2013.01); F24F 11/63 (2018.01); G06F 17/12 (2013.01); G06N 5/022 (2013.01); G06N 20/00 (2019.01); H02J 3/381 (2013.01); H02J 2300/28 (2020.01);
Abstract

A policy improvement method of improving a policy of reinforcement learning by a state value function, is executed by a computer and includes adding a plurality of perturbations to a plurality of components of a first parameter of the policy; estimating a gradient function of the state value function with respect to the first parameter, based on a result of an input determination performed for a control target in the reinforcement learning, the input determination being performed by using the policy that uses a second parameter obtained by adding the plurality of perturbations to the plurality of components; and updating the first parameter based on the estimated gradient function.
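
The three steps named in the abstract (perturb the parameter components, estimate a gradient from the behavior of the perturbed policy, update the original parameter) can be illustrated with a small sketch. Everything specific here is an assumption made for illustration and is not taken from the patent: the toy double-integrator control target, the linear policy, the use of a discounted cost from a fixed initial state as a stand-in for the state value function, the generic simultaneous-perturbation finite-difference estimator, and all names (`rollout_cost`, `estimate_gradient`, `improve_policy`).

```python
import random


def rollout_cost(theta, x0=(1.0, 0.0), steps=20, gamma=0.9, dt=0.1):
    """Discounted cost of a linear policy on a toy double integrator.

    Assumption: this cost from a fixed initial state stands in for the
    state value function; the patent's own definition may differ.
    """
    pos, vel = x0
    total, discount = 0.0, 1.0
    for _ in range(steps):
        # Input determination: the policy maps the state to an input.
        u = -(theta[0] * pos + theta[1] * vel)
        total += discount * (pos * pos + vel * vel + u * u)
        # Hypothetical control target dynamics (Euler-discretized).
        pos, vel = pos + dt * vel, vel + dt * u
        discount *= gamma
    return total


def estimate_gradient(theta, rng, eps=1e-3, trials=4):
    """Estimate d(cost)/d(theta) from perturbed rollouts.

    Each trial perturbs every component of theta at once by +/- eps and
    uses a one-sided finite difference (a generic simultaneous-perturbation
    estimator, swapped in here for illustration).
    """
    base = rollout_cost(theta)
    grad = [0.0] * len(theta)
    for _ in range(trials):
        delta = [eps * rng.choice((-1.0, 1.0)) for _ in theta]
        perturbed = [t + d for t, d in zip(theta, delta)]  # "second parameter"
        diff = rollout_cost(perturbed) - base
        for i, d in enumerate(delta):
            grad[i] += diff / d / trials
    return grad


def improve_policy(theta, iterations=300, lr=0.002, seed=0):
    """Repeatedly update the unperturbed ("first") parameter by gradient descent."""
    rng = random.Random(seed)
    theta = list(theta)
    for _ in range(iterations):
        grad = estimate_gradient(theta, rng)
        theta = [t - lr * g for t, g in zip(theta, grad)]
    return theta


if __name__ == "__main__":
    start = [0.0, 0.0]
    improved = improve_policy(start)
    print("cost before:", rollout_cost(start))
    print("cost after: ", rollout_cost(improved))
```

Because the sketch minimizes a cost, the update subtracts the estimated gradient; a reward-based formulation would add it instead. The small learning rate is chosen conservatively so that the noisy estimate still lowers the cost on this toy problem.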

