The patent badge is an abbreviated version of the USPTO patent document. It covers the following: Patent number, Date the patent was issued, Date the patent was filed, Title of the patent, Applicant, Inventor, Assignee, Attorney firm, Primary examiner, Assistant examiner, CPCs, and Abstract. The patent badge also contains a link to the full patent document in Adobe Acrobat format (PDF).

Date of Patent:
Dec. 11, 2012

Filed:
Jan. 21, 2009

Applicants:

Ahmad Abdulkader, San Jose, CA (US);

Matthew R. Casey, San Francisco, CA (US);

Inventors:

Ahmad Abdulkader, San Jose, CA (US);

Matthew R. Casey, San Francisco, CA (US);

Assignee:

Google Inc., Mountain View, CA (US);

Attorney:
Primary Examiner:
Assistant Examiner:
Int. Cl.
CPC ...
G06K 9/00 (2006.01); G06K 9/62 (2006.01); G06K 9/34 (2006.01); G06K 9/03 (2006.01); G06K 7/10 (2006.01); G06K 1/00 (2006.01); G06K 15/02 (2006.01); G06F 17/30 (2006.01); G06F 7/00 (2006.01); G06F 17/28 (2006.01); G06F 15/00 (2006.01); H04N 1/40 (2006.01); G06F 17/00 (2006.01); G06F 17/20 (2006.01); G06F 17/21 (2006.01); G06F 17/22 (2006.01); G06F 17/24 (2006.01); G06F 17/25 (2006.01); G06F 17/26 (2006.01); G06F 17/27 (2006.01);
U.S. Cl.
CPC ...
Abstract

OCR errors are identified and corrected through learning. An error probability estimator is trained using ground truths to learn error probability estimation. Multiple OCR engines process a text image, and convert it into texts. The error probability estimator compares the outcomes of the multiple OCR engines for mismatches, and determines an error probability for each of the mismatches. If the error probability of a mismatch exceeds an error probability threshold, a suspect is generated and grouped together with similar suspects in a cluster. A question for the cluster is generated and rendered to a human operator for answering. The answer from the human operator is then applied to all suspects in the cluster to correct OCR errors in the resulting text. The answer is also used to further train the error probability estimator.
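The abstract outlines a pipeline: compare the outputs of several OCR engines, score each mismatch with a learned error probability, turn high-probability mismatches into suspects, cluster similar suspects, ask one question per cluster, and feed the answer back as training data. The Python sketch below is a minimal illustration under stated assumptions, not the patented implementation. It assumes two engines whose output is already split into words, uses the standard library's `difflib.SequenceMatcher` to align them, and treats only word-for-word substitutions as mismatches. The error probability estimator is modeled as a hypothetical smoothed count keyed on the (primary word, alternative word) pair, and suspects are assumed to be "similar" when they share the same pair. All class and function names are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass
from difflib import SequenceMatcher


@dataclass
class Suspect:
    position: int     # index of the token in the primary engine's output
    primary: str      # token produced by the primary OCR engine
    alternative: str  # conflicting token produced by the second OCR engine


class ErrorProbabilityEstimator:
    """Hypothetical count-based estimator of P(primary token is wrong | mismatch pair)."""

    def __init__(self, prior_errors=1.0, prior_total=2.0):
        # Smoothed counts: an unseen mismatch pair starts at prior_errors / prior_total.
        self.errors = defaultdict(lambda: prior_errors)
        self.totals = defaultdict(lambda: prior_total)

    def probability(self, primary, alternative):
        key = (primary, alternative)
        return self.errors[key] / self.totals[key]

    def update(self, primary, alternative, primary_was_wrong):
        # One labelled observation, from ground truth or from a human operator's answer.
        key = (primary, alternative)
        self.totals[key] += 1
        if primary_was_wrong:
            self.errors[key] += 1


def find_mismatches(primary_tokens, secondary_tokens):
    """Yield (position, primary, alternative) for word-for-word substitutions only."""
    matcher = SequenceMatcher(a=primary_tokens, b=secondary_tokens, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "replace" and i2 - i1 == j2 - j1:
            for k in range(i2 - i1):
                yield i1 + k, primary_tokens[i1 + k], secondary_tokens[j1 + k]


def find_suspects(primary_tokens, secondary_tokens, estimator, threshold):
    """Keep only mismatches whose estimated error probability exceeds the threshold."""
    return [
        Suspect(pos, p, alt)
        for pos, p, alt in find_mismatches(primary_tokens, secondary_tokens)
        if estimator.probability(p, alt) > threshold
    ]


def cluster_suspects(suspects):
    """Group suspects; here 'similar' is assumed to mean 'same mismatch pair'."""
    clusters = defaultdict(list)
    for s in suspects:
        clusters[(s.primary, s.alternative)].append(s)
    return dict(clusters)


def make_question(key):
    primary, alternative = key
    return f"The OCR engines disagree: is the word {primary!r} or {alternative!r}?"


def apply_answer(tokens, cluster, answer, estimator):
    """Apply one human answer to every suspect in the cluster and retrain the estimator."""
    corrected = list(tokens)
    for s in cluster:
        corrected[s.position] = answer
        estimator.update(s.primary, s.alternative, primary_was_wrong=(answer != s.primary))
    return corrected


if __name__ == "__main__":
    engine_a = "the qnick brown fox saw the qnick dog".split()
    engine_b = "the quick brown fox saw the quick dog".split()
    estimator = ErrorProbabilityEstimator()
    # Training on one ground-truth example where engine A misread "quick" as "qnick".
    estimator.update("qnick", "quick", primary_was_wrong=True)
    suspects = find_suspects(engine_a, engine_b, estimator, threshold=0.5)
    text = engine_a
    for key, cluster in cluster_suspects(suspects).items():
        print(make_question(key))
        text = apply_answer(text, cluster, "quick", estimator)  # simulated operator answer
    print(" ".join(text))
```

Because every suspect in a cluster shares one question, a single operator answer in this sketch corrects both occurrences of "qnick" and adds two labelled observations to the estimator, mirroring the abstract's point that the answer is applied to the whole cluster and reused for training. A mismatch pair the estimator has never seen scores exactly 0.5 here, so it does not exceed a 0.5 threshold and is not raised as a suspect.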
