The patent badge is an abbreviated version of the USPTO patent document. It covers the following: patent number, date the patent was issued, date the patent was filed, title of the patent, applicant, inventor, assignee, attorney firm, primary examiner, assistant examiner, CPCs, and abstract. The patent badge also contains a link to the full patent document (in Adobe Acrobat format, i.e. PDF).

Date of Patent:
Jan. 26, 2021

Filed:
Apr. 11, 2017

Applicant:

International Business Machines Corporation, Armonk, NY (US);

Inventors:

Akira Koseki, Tokyo, JP;

Tetsuro Morimura, Tokyo, JP;

Toshiro Takase, Tokyo, JP;

Hiroki Yanagisawa, Tokyo, JP;

Attorneys:
Primary Examiner:
Int. Cl.
CPC ...
G06N 5/04 (2006.01); G06N 7/00 (2006.01); G06N 20/00 (2019.01); B60W 30/18 (2012.01); B60W 10/18 (2012.01); B60W 10/20 (2006.01); G06F 17/16 (2006.01); G06N 5/02 (2006.01); G05D 1/00 (2006.01);
U.S. Cl.
CPC ...
G06N 20/00 (2019.01); B60W 10/18 (2013.01); B60W 10/20 (2013.01); B60W 30/18 (2013.01); G06F 17/16 (2013.01); G06N 5/025 (2013.01); G06N 5/04 (2013.01); G06N 5/043 (2013.01); G06N 7/005 (2013.01); B60W 2710/18 (2013.01); B60W 2710/20 (2013.01); G05D 1/0088 (2013.01);
Abstract

A method is provided for rule creation that includes receiving (i) a MDP model with a set of states, a set of actions, and a set of transition probabilities, (ii) a policy that corresponds to rules for a rule engine, and (iii) a set of candidate states that can be added to the set of states. The method includes transforming the MDP model to include a reward function using an inverse reinforcement learning process on the MDP model and on the policy. The method includes finding a state from the candidate states, and generating a refined MDP model with the reward function by updating the transition probabilities related to the state. The method includes obtaining an optimal policy for the refined MDP model with the reward function, based on the reward policy, the state, and the updated probabilities. The method includes updating the rule engine based on the optimal policy.
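The refinement and re-planning steps in the abstract can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the toy states (`road`, `goal`, `hazard`), the actions, the probabilities, the reward values, and the function names are all hypothetical, and the reward function is simply given rather than recovered by inverse reinforcement learning. Value iteration is used here as a standard way to obtain an optimal policy; the abstract does not say which solver the method uses.

```python
import copy

def q_value(P, R, V, s, a, gamma):
    """Expected return of taking action a in state s, then following V."""
    return sum(p * (R[s2] + gamma * V[s2]) for s2, p in P[s][a].items())

def optimal_policy(P, R, gamma=0.9, tol=1e-8):
    """Value iteration. P[s][a] maps next state -> probability; R[s] is a state reward."""
    V = {s: 0.0 for s in P}
    while True:
        delta = 0.0
        for s in P:
            best = max(q_value(P, R, V, s, a, gamma) for a in P[s])
            delta = max(delta, abs(best - V[s]))
            V[s] = best
        if delta < tol:
            break
    # Greedy policy with respect to the converged value function.
    return {s: max(P[s], key=lambda a: q_value(P, R, V, s, a, gamma)) for s in P}

def refine_model(P, R, state, reward, outgoing, updates):
    """Add one candidate state and overwrite the transition rows that involve it.

    `reward` for the new state is an assumed input here, not an IRL output.
    """
    P2, R2 = copy.deepcopy(P), dict(R)
    P2[state] = copy.deepcopy(outgoing)
    R2[state] = reward
    for (s, a), dist in updates.items():
        P2[s][a] = dict(dist)
    return P2, R2

# Hypothetical initial model: "go" always reaches the goal.
P = {
    "road": {"go": {"goal": 1.0}, "wait": {"road": 1.0}},
    "goal": {"stay": {"goal": 1.0}},
}
R = {"road": 0.0, "goal": 1.0}
before = optimal_policy(P, R)

# Hypothetical refinement: a candidate "hazard" state makes "go" risky.
P2, R2 = refine_model(
    P, R,
    state="hazard",
    reward=-10.0,
    outgoing={"stay": {"hazard": 1.0}},
    updates={("road", "go"): {"goal": 0.5, "hazard": 0.5}},
)
after = optimal_policy(P2, R2)

print(before["road"], after["road"])
```

In this toy setup the recommended action for `road` changes once the candidate state is added, which is the kind of policy change that would then be pushed back into the rule engine.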

