The patent badge is an abbreviated version of the USPTO patent document. It covers the following: patent number, date the patent was issued, date the patent was filed, title of the patent, applicant, inventor, assignee, attorney firm, primary examiner, assistant examiner, CPCs, and abstract. The patent badge also contains a link to the full patent document (in Adobe Acrobat format, i.e., PDF).

Patent No.:

US 10489682 B1

Date of Patent:

Nov. 26, 2019

Filed:

Dec. 21, 2017

Optical character recognition employing deep learning with machine generated training data

Applicant:

Automation Anywhere Inc., San Jose, CA (US);

Inventors:

Nishit Kumar, San Jose, CA (US);

Thomas Corcoran, San Jose, CA (US);

Bruno Selva, Mountain View, CA (US);

Derek S Chan, San Jose, CA (US);

Abhijit Kakhandiki, San Jose, CA (US);

Assignee:

Automation Anywhere, Inc., San Jose, CA (US);

Attorney:

Prasad IP, PC

Primary Examiner:

Ali Bayat

Int. Cl.

CPC ...

G06K 9/62 (2006.01); G06N 3/08 (2006.01); G06N 3/04 (2006.01); G06K 9/00 (2006.01); G06K 9/18 (2006.01);

U.S. Cl.

CPC ...

G06K 9/6256 (2013.01); G06K 9/00463 (2013.01); G06K 9/00469 (2013.01); G06K 9/18 (2013.01); G06K 9/6254 (2013.01); G06K 9/6298 (2013.01); G06N 3/04 (2013.01); G06N 3/08 (2013.01);

Abstract

An optical character recognition system employs a deep learning system that is trained to process a plurality of images within a particular domain to identify images representing text within each image and to convert the images representing text to textually encoded data. The deep learning system is trained with training data generated from a corpus of real-life text segments that are generated by a plurality of OCR modules. Each of the OCR modules produces a real-life image/text tuple, and at least some of the OCR modules produce a confidence value corresponding to each real-life image/text tuple. Each OCR module is characterized by a conversion accuracy substantially below a desired accuracy for an identified domain. Synthetically generated text segments are produced by programmatically converting text strings to a corresponding image where each text string and corresponding image form a synthetic image/text tuple.
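The synthetic image/text tuples mentioned in the abstract can be illustrated with a minimal Python sketch. This is not the patented method: the abstract does not say how the images are rendered, so the tiny 3x5 bitmap font (`GLYPHS`), the `render` function, and the `make_synthetic_tuples` helper are all hypothetical names introduced here for illustration only. A real system would presumably render with actual fonts and add noise or distortion.

```python
from typing import List, Tuple

# Hypothetical 3x5 bitmap font covering only a few glyphs (an assumption
# for this sketch; the patent abstract does not specify any rendering method).
GLYPHS = {
    "0": ["###", "# #", "# #", "# #", "###"],
    "1": [" # ", "## ", " # ", " # ", "###"],
    "2": ["###", "  #", "###", "#  ", "###"],
    " ": ["   ", "   ", "   ", "   ", "   "],
}

Image = List[List[int]]  # rows of 0/1 pixels


def render(text: str) -> Image:
    """Rasterize text into a binary image, with one blank column between glyphs."""
    rows: Image = [[] for _ in range(5)]
    for i, ch in enumerate(text):
        glyph = GLYPHS[ch]  # raises KeyError for characters outside the toy font
        for r in range(5):
            if i > 0:
                rows[r].append(0)  # separator column
            rows[r].extend(1 if px == "#" else 0 for px in glyph[r])
    return rows


def make_synthetic_tuples(strings: List[str]) -> List[Tuple[Image, str]]:
    """Pair each text string with its rendered image to form an image/text tuple."""
    return [(render(s), s) for s in strings]


if __name__ == "__main__":
    for image, text in make_synthetic_tuples(["10", "2"]):
        print(repr(text))
        for row in image:
            print("".join("#" if px else "." for px in row))
```

Because the label of each tuple is the string that produced the image, the ground truth is exact by construction, which is what distinguishes these tuples from the real-life tuples produced by the imperfect OCR modules.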