
The patent badge is an abbreviated version of the USPTO patent document. It covers the following: Patent number, Date patent was issued, Date patent was filed, Title of the patent, Applicant, Inventor, Assignee, Attorney firm, Primary examiner, Assistant examiner, CPCs, and Abstract. The patent badge also contains a link to the full patent document (in Adobe Acrobat format, aka PDF).

Date of Patent:
Feb. 04, 2025

Filed:
Jun. 27, 2022

Applicant:

FMR LLC, Boston, MA (US);

Inventors:

Keerthan Ramnath, Chennai, IN;

Punitha Chandrasekar, Bangalore, IN;

Hui Su, West Roxbury, MA (US);

Shyam Subramanian, Norwood, MA (US);

Rachna Saxena, Bangalore, IN;

Mohamed Mahdi Alouane, Toronto, CA;

Vinay Iyengar, Westwood, MA (US);

Assignee:

FMR LLC, Boston, MA (US);

Attorney:
Int. Cl.
CPC ...
G06V 30/414 (2022.01); G06F 40/232 (2020.01); G06F 40/263 (2020.01); G06F 40/284 (2020.01);
U.S. Cl.
CPC ...
G06V 30/414 (2022.01); G06F 40/232 (2020.01); G06F 40/263 (2020.01); G06F 40/284 (2020.01);
Abstract

Systems and methods for extracting data from electronic documents using optical character recognition (OCR) and non-OCR-based text extraction. A server computing device initiates non-OCR-based text extraction for each page of an electronic document. The server calculates a document text coverage percentage corresponding to the non-OCR-based text extraction for the whole document and, in response to determining that the document text coverage percentage is below a first threshold, initiates OCR for the document. The server calculates a page text coverage percentage corresponding to the non-OCR-based text extraction for one or more pages of the electronic document and, in response to determining that the page text coverage percentage is below a second threshold, initiates OCR for those pages. The server combines first text extracted from the electronic document using non-OCR-based text extraction and second text extracted from the electronic document using OCR.
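The abstract does not define how "text coverage percentage" is measured, what the two thresholds are, or how the two text sources are merged. Under those gaps, the routing it describes can be sketched as follows. This is a minimal illustration, not the patent's implementation. Several pieces are assumptions: coverage is taken as the share of page area covered by bounding boxes of natively extracted text, and the threshold values are made up. The names `Page`, `page_coverage`, `document_coverage`, `extract`, `DOC_THRESHOLD`, `PAGE_THRESHOLD`, and the caller-supplied `ocr` function are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical values; the abstract gives no numbers for either threshold.
DOC_THRESHOLD = 20.0   # "first threshold", percent of total document area
PAGE_THRESHOLD = 10.0  # "second threshold", percent of a single page's area


@dataclass
class Page:
    width: float
    height: float
    text: str                                        # text from non-OCR extraction
    boxes: list[tuple[float, float, float, float]]   # (x0, y0, x1, y1) of extracted text

    @property
    def area(self) -> float:
        return self.width * self.height

    @property
    def text_area(self) -> float:
        # Assumes the boxes do not overlap, so their areas can simply be summed.
        return sum((x1 - x0) * (y1 - y0) for x0, y0, x1, y1 in self.boxes)


def page_coverage(page: Page) -> float:
    """Hypothetical metric: percent of the page area covered by extracted text."""
    return 100.0 * page.text_area / page.area if page.area else 0.0


def document_coverage(pages: list[Page]) -> float:
    """Hypothetical metric: extracted-text area over total area of all pages."""
    total_area = sum(p.area for p in pages)
    return 100.0 * sum(p.text_area for p in pages) / total_area if total_area else 0.0


def extract(pages: list[Page], ocr: Callable[[int], str]) -> list[str]:
    """Return one string per page, mixing non-OCR and OCR text.

    `ocr(i)` stands in for running an OCR engine on page i (hypothetical hook).
    """
    if document_coverage(pages) < DOC_THRESHOLD:
        # Too little native text overall: OCR the whole document.
        ocr_pages = set(range(len(pages)))
    else:
        # Otherwise OCR only the pages whose own coverage is too low.
        ocr_pages = {i for i, p in enumerate(pages) if page_coverage(p) < PAGE_THRESHOLD}
    # One simple way to "combine": keep native text where coverage was adequate,
    # and use OCR text for the pages that were sent to OCR.
    return [ocr(i) if i in ocr_pages else p.text for i, p in enumerate(pages)]
```

Consider a two-page document. On page 0, native text covers half the page. Page 1 is a scanned image with no native text. Document coverage is 25%, which is above the assumed first threshold, so the whole document is not sent to OCR. Page 1's coverage is 0%, which is below the assumed second threshold, so only page 1 goes to OCR. A real system might merge the two sources differently, for example by keeping native text and appending only OCR text found in image regions. The per-page selection above is just the simplest choice.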

