The patent badge is an abbreviated version of the USPTO patent document. It covers the following fields: Patent number, Date patent was issued, Date patent was filed, Title of the patent, Applicant, Inventor, Assignee, Attorney firm, Primary examiner, Assistant examiner, CPCs, and Abstract. The badge also links to the full patent document in PDF (Adobe Acrobat) format.

Patent No.:

US 8521678 B1

Date of Patent:

Aug. 27, 2013

Filed:

Jun. 03, 2010

Learning control system and learning control method

Applicants:

Chyon Hae Kim, Wako, JP;

Hiroshi Tsujino, Wako, JP;

Hiroyuki Nakahara, Wako, JP;

Inventors:

Chyon Hae Kim, Wako, JP;

Hiroshi Tsujino, Wako, JP;

Hiroyuki Nakahara, Wako, JP;

Assignee:

Honda Motor Co., Ltd., Tokyo, JP;

Attorney:

Squire Sanders (US) LLP

Primary Examiner:

Kakali Clark

Assistant Examiner:

Ababacar Seck

Int. Cl.

CPC ...

G06F 17/00 (2006.01); G06N 5/02 (2006.01);

U.S. Cl.

CPC ...

Abstract

A learning control system according to the present invention learns the action values of actions in an apparatus that identifies its state as one of a set of predetermined states and selects an action based on the learned action values and the identified state. Assuming n is a positive integer, the system includes n action value learning devices (the first to the n-th learning devices), which learn n action values Q₁ to Qₙ, and an action value determining device, which determines the total action value Q of each action in each state from the outputs of the n action value learning devices. The first target value, used by the first action value learning device, is determined based on the reward r obtained once an action has been carried out and the next state reached, and on a total action value Q′ that was prepared for action selection in the next state; the first learning device updates the first action value Q₁ using the first target value. When n is 2 or more, the n-th target value, used by the n-th action value learning device, is set to the difference between the (n−1)-th target value of the (n−1)-th learning device and the action value Qₙ₋₁, and the n-th learning device updates the n-th action value Qₙ using the n-th target value.
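The cascade of learners described in the abstract can be sketched in Python. This is a minimal, hypothetical illustration, not the patented implementation: it assumes tabular states and actions, a learning rate alpha and a discount factor gamma (neither appears in the abstract), a total action value computed as the sum Q = Q₁ + … + Qₙ, and Q′ taken as the largest total action value available in the next state. The class and method names are invented for illustration. The first learner's target is r + γQ′, and each later learner's target is the previous target minus the previous learner's action value, so in this reading learner k fits whatever residual learners 1 to k−1 have not yet explained.

```python
from collections import defaultdict


class CascadedActionValueLearner:
    """Hypothetical sketch of n chained tabular action-value learners.

    Assumptions (not stated in the abstract): tabular states/actions, a
    learning rate alpha, a discount factor gamma, the total action value
    taken as the sum Q = Q_1 + ... + Q_n, and Q' taken as the maximum total
    action value over actions in the next state.
    """

    def __init__(self, n, actions, alpha=0.5, gamma=0.9):
        if n < 1:
            raise ValueError("n must be a positive integer")
        self.n = n
        self.actions = list(actions)
        self.alpha = alpha
        self.gamma = gamma
        # One table per learner: tables[i][(state, action)] holds Q_{i+1}.
        self.tables = [defaultdict(float) for _ in range(n)]

    def total_value(self, state, action):
        # Assumed combination rule: the sum of the n partial action values.
        return sum(table[(state, action)] for table in self.tables)

    def select_action(self, state):
        # Greedy selection on the total action value (ties go to the first action).
        return max(self.actions, key=lambda a: self.total_value(state, a))

    def update(self, state, action, reward, next_state):
        key = (state, action)
        # Q': total action value prepared for action selection in the next state.
        q_next = max(self.total_value(next_state, a) for a in self.actions)
        # Snapshot every partial value before any learner is updated.
        old = [table[key] for table in self.tables]
        target = reward + self.gamma * q_next  # first target value
        for i, table in enumerate(self.tables):
            if i > 0:
                # k-th target = (k-1)-th target minus the pre-update Q_{k-1}.
                target = target - old[i - 1]
            # Move Q_k a fraction alpha of the way toward its own target.
            table[key] = old[i] + self.alpha * (target - old[i])
```

The sketch computes every target from the pre-update values of Q₁ to Qₙ₋₁; the abstract does not say whether the (n−1)-th value is read before or after its own update, so that ordering is also an assumption. With n = 1 the class reduces to ordinary tabular Q-learning.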
