The patent badge is an abbreviated version of the USPTO patent document. It covers the following: patent number, date the patent was issued, date the patent was filed, title of the patent, applicant, inventor, assignee, attorney firm, primary examiner, assistant examiner, CPCs, and abstract. The patent badge also contains a link to the full patent document (in Adobe Acrobat format, i.e., PDF).
Patent No.:
Date of Patent:
Sep. 26, 2023
Filed:
Aug. 18, 2021
Applicant:
Ushur, Inc., Santa Clara, CA (US);
Inventors:
Yashu Seth, Patna, IN;
Ravil Kashyap, Bangalore, IN;
Shaik Kamran Moinuddin, Bangalore, IN;
Vijayendra Mysore Shamanna, Bangalore, IN;
Henry Thomas Peter, Mountain House, CA (US);
Simha Sadasiva, San Jose, CA (US);
Assignee:
Ushur, Inc., Santa Clara, CA (US);
Abstract
The present disclosure relates to a system and method to extract information from unstructured image documents. The extraction technique is content-driven and not dependent on the layout of a particular image document type. The disclosed method breaks down an image document into smaller images using the text cluster detection algorithm. The smaller images are converted into text samples using optical character recognition (OCR). Each of the text samples is fed to a trained machine learning model. The model classifies each text sample into one of a plurality of pre-determined field types. The desired value extraction problem may be converted into a question-answering problem using a pre-trained model. A fixed question is formed on the basis of the classified field type. The output of the question-answering model may be passed through a rule-based post-processing step to obtain the final answer.
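The flow described in the abstract (classify each text sample, form a fixed question from the field type, answer it, then post-process) can be illustrated with a minimal Python sketch. This is not the patented implementation. It starts from text samples that are assumed to have already come out of the text cluster detection and OCR steps. The field types, questions, and rules are hypothetical, and simple keyword matching and string splitting stand in for the trained classifier and the pre-trained question-answering model.

```python
import re
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical field types and their fixed questions (not taken from the patent).
QUESTIONS: Dict[str, str] = {
    "invoice_number": "What is the invoice number?",
    "date": "What is the date?",
    "total": "What is the total amount?",
}

# Stand-in for the trained classifier: keyword lookup per field type.
KEYWORDS: Dict[str, Tuple[str, ...]] = {
    "invoice_number": ("invoice no", "invoice number"),
    "date": ("date",),
    "total": ("total", "amount due"),
}


def classify(sample: str) -> Optional[str]:
    """Assign a text sample to one of the pre-determined field types, or None."""
    lowered = sample.lower()
    for field, words in KEYWORDS.items():
        if any(word in lowered for word in words):
            return field
    return None


def answer(question: str, context: str) -> str:
    """Stand-in for a pre-trained QA model: return the text after the first colon.

    The question is ignored by this toy version; a real model would use it.
    """
    _, _, tail = context.partition(":")
    return tail.strip() or context.strip()


# Rule-based post-processing, one hypothetical rule per field type.
POST_RULES: Dict[str, Callable[[str], str]] = {
    "invoice_number": lambda s: s.upper().replace(" ", ""),
    "date": lambda s: s.strip(" ."),
    "total": lambda s: re.sub(r"[^\d.]", "", s),  # keep digits and decimal point
}


def extract_fields(samples: List[str]) -> Dict[str, str]:
    """Run classify -> fixed question -> answer -> post-process on each sample."""
    fields: Dict[str, str] = {}
    for sample in samples:
        field = classify(sample)
        if field is None:
            continue  # sample does not match any field of interest
        raw = answer(QUESTIONS[field], sample)
        fields[field] = POST_RULES[field](raw)
    return fields


if __name__ == "__main__":
    ocr_samples = [
        "Invoice No: INV-1042",
        "Date: 2023-01-15",
        "Total due: $1,250.00",
        "Thank you for your business",
    ]
    print(extract_fields(ocr_samples))
```

Because each sample is classified by its content rather than its position on the page, the same code handles the samples in any order, which mirrors the layout-independence the abstract claims.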