For the Inventor, By the Inventor

The patent badge is an abbreviated version of the USPTO patent document. It covers the following: patent number, date the patent was issued, date the patent was filed, title of the patent, applicant, inventor, assignee, attorney firm, primary examiner, assistant examiner, CPCs, and abstract. The patent badge also contains a link to the full patent document (in Adobe Acrobat format, i.e., PDF).

Patent No.:

US 12399957 B1

Date of Patent:

Aug. 26, 2025

Filed:

Dec. 06, 2021

Reinforcement learning simulation of supply chain graph

Applicant:

Microsoft Technology Licensing, LLC, Redmond, WA (US);

Inventors:

Peeyush Kumar, Seattle, WA (US);

Hui Qing Li, Seattle, WA (US);

Vaishnavi Nattar Ranganathan, Woodinville, WA (US);

Lillian Jane Ratliff, Seattle, WA (US);

Ranveer Chandra, Kirkland, WA (US);

Vishal Jain, Bengaluru, IN;

Michael Mcnab Bassani, Seattle, WA (US);

Jeremy Randall Reynolds, Boulder, CO (US);

Assignee:

Microsoft Technology Licensing, LLC, Redmond, WA (US);

Attorney:

Alleman Hall & Tuttle LLP

Primary Examiner:

A Hunter Wilder

Int. Cl.

CPC ...

G06F 18/20 (2023.01); G06F 16/90 (2019.01); G06F 18/21 (2023.01); G06F 18/214 (2023.01); G06Q 10/08 (2024.01);

U.S. Cl.

CPC ...

G06F 18/295 (2023.01); G06F 18/2148 (2023.01); G06F 18/2185 (2023.01); G06Q 10/08 (2013.01); G06F 16/90 (2019.01);

Abstract

A computing system including a processor configured to receive training data including, for each of a plurality of training timesteps, training forecast states associated with respective training-phase agents included in a training supply chain graph. The processor may train a reinforcement learning simulation of the training supply chain graph using the training data via policy gradient reinforcement learning. At each training timestep, the training forecast states may be shared between the training-phase agents during training. The processor may receive runtime forecast states associated with respective runtime agents included in a runtime supply chain graph. For a runtime agent, at the trained reinforcement learning simulation, the processor may generate a respective runtime action output associated with a corresponding runtime forecast state of the runtime agent based at least in part on the runtime forecast states. The processor may output the runtime action output.
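
The flow the abstract describes (train with policy gradient reinforcement learning while forecast states are shared between agents, then produce one action per runtime agent) can be illustrated with a minimal sketch. This is not the patented implementation. The three-agent graph, the scalar forecast states, the linear softmax policy, the reward function, and every name below (`train`, `act`, `reward`, `N_AGENTS`, `ACTIONS`) are assumptions made purely for illustration, and the update rule is plain REINFORCE, chosen here as one simple policy gradient method.

```python
import math
import random

# Hypothetical toy setup: none of these names or numbers come from the patent.
N_AGENTS = 3          # nodes in a toy supply chain graph
ACTIONS = [0, 1, 2]   # assumed order quantities each agent can choose


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def policy_probs(agent_weights, shared_state):
    """Linear softmax policy: one weight row per action over the shared state."""
    logits = [sum(w * s for w, s in zip(row, shared_state)) for row in agent_weights]
    return softmax(logits)


def reward(agent, action, shared_state):
    """Assumed reward: penalize the gap between the order and the agent's own forecast."""
    return -abs(action - shared_state[agent])


def train(training_data, lr=0.05, epochs=200, seed=0):
    """REINFORCE over training timesteps; every agent sees all agents' forecast states."""
    rng = random.Random(seed)
    # weights[agent][action][feature], initialized to zero (uniform policy)
    weights = [[[0.0] * N_AGENTS for _ in ACTIONS] for _ in range(N_AGENTS)]
    for _ in range(epochs):
        for shared_state in training_data:  # one shared state vector per timestep
            for agent in range(N_AGENTS):
                probs = policy_probs(weights[agent], shared_state)
                a = rng.choices(range(len(ACTIONS)), weights=probs)[0]
                r = reward(agent, ACTIONS[a], shared_state)
                # Gradient of log pi(a|s) for a linear softmax policy:
                # (1[k == a] - p_k) * s_j for action row k and feature j
                for k in range(len(ACTIONS)):
                    coeff = (1.0 if k == a else 0.0) - probs[k]
                    for j, s in enumerate(shared_state):
                        weights[agent][k][j] += lr * r * coeff * s
    return weights


def act(weights, runtime_states):
    """Pick the most probable action for each runtime agent from the shared runtime states."""
    actions = []
    for agent in range(N_AGENTS):
        probs = policy_probs(weights[agent], runtime_states)
        actions.append(ACTIONS[probs.index(max(probs))])
    return actions
```

In this sketch, `train` would be called with a list of per-timestep forecast state vectors (one entry per agent), and `act` with a single runtime vector of the same length. Sharing is modeled simply by passing the full vector to every agent's policy; a real system would likely use richer state, a graph-aware model, and a variance-reducing baseline.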
