The patent badge is an abbreviated version of the USPTO patent document. It covers the following: patent number, date the patent was issued, date the patent was filed, title of the patent, applicant, inventor, assignee, attorney firm, primary examiner, assistant examiner, CPCs, and abstract. The patent badge also contains a link to the full patent document (in Adobe Acrobat format, i.e. PDF).
Patent No.:
Date of Patent:
Sep. 17, 2024
Filed:
Mar. 26, 2021
Applicant:
Tata Consultancy Services Limited, Mumbai, IN;
Inventors:
Avinash Achar, Chennai, IN;
Easwara Subramanian, Hyderabad, IN;
Sanjay Purushottam Bhat, Hyderabad, IN;
Vignesh Lakshmanan Kangadharan Palaniradja, Chennai, IN;
Assignee:
Tata Consultancy Services Limited, Mumbai, IN;
Abstract
This disclosure relates to a method and system for optimal policy learning and recommendation for a distribution task using a deep reinforcement learning (RL) model, in applications where the action space has a probability simplex structure. The method includes training an RL agent by defining a policy network for learning the optimal policy using a policy gradient (PG) method, where the policy network comprises an artificial neural network (ANN) with a set of outputs. A continuous action space having a continuous probability simplex structure is defined. The learning of the optimal policy is updated based on one of stochastic PG and deterministic PG. For stochastic PG, a Dirichlet distribution based stochastic policy is selected, parameterized by the output of the ANN with an activation function at the output layer of the ANN. For deterministic PG, a soft-max function is selected as the activation function at the output layer of the ANN to maintain the probability simplex structure.
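
The two output-layer choices described in the abstract can be illustrated with a minimal sketch. It is not the patented implementation: the abstract does not name the activation used for the Dirichlet parameters, so softplus plus a small offset `eps` is assumed here only to keep the concentrations positive, and the names `softmax`, `softplus`, `dirichlet_action`, and the example `logits` are hypothetical. The Dirichlet sample is drawn by normalizing independent Gamma draws, which is a standard construction.

```python
import math
import random


def softmax(logits):
    """Deterministic case: map raw ANN outputs onto the probability simplex."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def softplus(z):
    """Positive activation log(1 + exp(z)), written in a numerically stable form."""
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


def dirichlet_action(logits, rng, eps=1e-3):
    """Stochastic case: sample an action from a Dirichlet policy.

    The concentration parameters are the ANN outputs passed through an
    assumed softplus activation (the source does not specify which one).
    """
    alphas = [softplus(z) + eps for z in logits]
    # Normalized independent Gamma(alpha_i, 1) draws follow Dirichlet(alphas).
    gammas = [rng.gammavariate(a, 1.0) for a in alphas]
    total = sum(gammas)
    return [g / total for g in gammas]


# Hypothetical raw outputs of a policy network with three output units.
logits = [0.5, -1.0, 2.0]
rng = random.Random(0)

deterministic_action = softmax(logits)
stochastic_action = dirichlet_action(logits, rng)
```

Both `deterministic_action` and `stochastic_action` are non-negative vectors summing to one, so either can be read directly as an allocation across the three options of a distribution task.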