
The patent badge is an abbreviated version of the USPTO patent document. It covers the following fields: Patent number, Date patent was issued, Date patent was filed, Title of the patent, Applicant, Inventor, Assignee, Attorney firm, Primary examiner, Assistant examiner, CPCs, and Abstract. The patent badge also contains a link to the full patent document in Adobe Acrobat (PDF) format.

Date of Patent:
May 30, 2023

Filed:
Apr. 27, 2020
Applicant:

Microsoft Technology Licensing, LLC, Redmond, WA (US);

Inventors:

Eric Philip Traut, Snoqualmie, WA (US);

Marcos de Moura Campos, Encinitas, CA (US);

Xuan Zhao, Fremont, CA (US);

Ross Ian Story, Oakland, CA (US);

Victor Shnayder, Berkeley, CA (US);

Assignee:
Attorney:
Primary Examiner:
Int. Cl.:
G06F 9/44 (2018.01); G06F 9/455 (2018.01); G06F 9/445 (2018.01); G06N 20/00 (2019.01); G06F 11/34 (2006.01); G06F 11/36 (2006.01); G06N 10/70 (2022.01); G06N 3/08 (2023.01);
U.S. Cl.:
CPC: G06N 20/00 (2019.01); G06F 11/3466 (2013.01); G06F 11/3664 (2013.01); G06N 3/08 (2013.01); G06N 10/70 (2022.01);
Abstract

A method of training a reinforcement machine learning computer system. The method comprises providing a machine-learning computer programming language including a pre-defined plurality of reinforcement machine learning criterion statements, and receiving a training specification authored in the machine-learning computer programming language. The training specification defines a plurality of training sub-goals with a corresponding plurality of the reinforcement machine learning criterion statements supported by the machine-learning computer programming language. The method further comprises computer translating the plurality of training sub-goals from the training specification into a shaped reward function configured to score a reinforcement machine learning model configuration with regard to the plurality of training sub-goals. The method further comprises running a training experiment with the reinforcement machine learning model configuration, scoring the reinforcement machine learning model in the training experiment with the shaped reward function, and adjusting the reinforcement machine learning model configuration based on the shaped reward function.
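The abstract does not say what the machine-learning programming language looks like, which criterion statements it provides, or how the model configuration is adjusted. As a rough illustration only, the Python sketch below assumes three hypothetical criterion kinds (`reach`, `avoid`, `minimize`), translates a list of sub-goals into one shaped reward function by summing weighted terms, scores a toy one-parameter controller with it during a short rollout (standing in for the "training experiment"), and adjusts that parameter by simple hill climbing. The names `SubGoal`, `translate`, `run_experiment`, and `train`, the toy environment, and the hill-climbing step (a stand-in for a real reinforcement learning algorithm) are all illustrative assumptions; none of them come from the patent.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

State = Dict[str, float]


@dataclass(frozen=True)
class SubGoal:
    """One training sub-goal, as a hypothetical criterion statement might declare it."""
    kind: str        # hypothetical criterion name: "reach", "avoid", or "minimize"
    variable: str    # state variable the criterion refers to
    target: float    # target for "reach", upper threshold for "avoid", unused for "minimize"
    weight: float = 1.0


def translate(sub_goals: List[SubGoal]) -> Callable[[State], float]:
    """Compile the sub-goals into a single shaped reward function (sum of weighted terms)."""
    terms = []
    for g in sub_goals:
        if g.kind == "reach":
            # Dense shaping term: the penalty shrinks as the variable nears its target.
            terms.append(lambda s, g=g: -g.weight * abs(s[g.variable] - g.target))
        elif g.kind == "avoid":
            # Penalize only the amount by which the variable exceeds the threshold.
            terms.append(lambda s, g=g: -g.weight * max(0.0, s[g.variable] - g.target))
        elif g.kind == "minimize":
            terms.append(lambda s, g=g: -g.weight * s[g.variable])
        else:
            raise ValueError(f"unsupported criterion: {g.kind}")
    return lambda s: sum(term(s) for term in terms)


def run_experiment(gain: float, reward_fn: Callable[[State], float], steps: int = 20) -> float:
    """Roll out a toy 1-D proportional controller and return its total shaped reward.

    The hypothetical configuration is a single parameter, `gain`; the controller
    drives `position` toward a hard-coded setpoint of 10.0.
    """
    position, total = 0.0, 0.0
    for _ in range(steps):
        speed = gain * (10.0 - position)
        position += speed
        total += reward_fn({"position": position, "speed": abs(speed)})
    return total


def train(reward_fn: Callable[[State], float], gain: float = 0.05,
          step: float = 0.05, iterations: int = 50) -> float:
    """Adjust the configuration by hill climbing on the shaped reward (not a real RL algorithm)."""
    best = run_experiment(gain, reward_fn)
    for _ in range(iterations):
        improved = False
        for candidate in (gain - step, gain + step):
            score = run_experiment(candidate, reward_fn)
            if score > best:
                gain, best, improved = candidate, score, True
        if not improved:
            step /= 2  # neither neighbor scored better, so search more finely
    return gain


goals = [SubGoal("reach", "position", 10.0), SubGoal("avoid", "speed", 3.0)]
reward = translate(goals)
tuned_gain = train(reward)
```

In this sketch the sub-goal weights are where the shaped reward encodes trade-offs: raising the `avoid` weight tends to make the hill climber settle on a gentler gain, while raising the `reach` weight favors faster convergence to the setpoint.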

