The patent badge is an abbreviated version of the USPTO patent document. The patent badge does contain a link to the full patent document.

The patent badge is an abbreviated version of the USPTO patent document. The patent badge covers the following: Patent number, Date patent was issued, Date patent was filed, Title of the patent, Applicant, Inventor, Assignee, Attorney firm, Primary examiner, Assistant examiner, CPCs, and Abstract. The patent badge does contain a link to the full patent document (in Adobe Acrobat format, aka pdf). To download or print any patent click here.

Date of Patent:
Oct. 25, 2022

Filed:

Nov. 08, 2019
Applicant:

Oracle International Corporation, Redwood Shores, CA (US);

Inventor:

Sudhakar Kalluri, Cupertino, CA (US);

Assignee:

Oracle International Corporation, Redwood Shores, CA (US);

Attorney:
Primary Examiner:
Int. Cl.
CPC ...
G06F 40/284 (2020.01); G06N 20/00 (2019.01);
U.S. Cl.
CPC ...
G06F 40/284 (2020.01); G06N 20/00 (2019.01);
Abstract

Techniques are described herein for training and evaluating machine learning (ML) models for document processing computing applications using generalized vocabulary tokens. In some embodiments, an ML system determines a set of tokens for non-textual content in a plurality of documents. The ML system generates a fixed-length vocabulary that includes the set of tokens for the non-textual content. The ML system further generates for each respective document in a training dataset of documents, a respective feature vector based at least in part on which tokens in the fixed-length vocabulary occur in the respective document. The ML system trains a ML model based at least in part on the respective feature vector for each respective document in the training dataset.


Find Patent Forward Citations

Loading…