The patent badge is an abbreviated version of the USPTO patent document. The patent badge does contain a link to the full patent document.

The patent badge covers the following: patent number, date the patent was issued, date the patent was filed, title of the patent, applicant, inventor, assignee, attorney firm, primary examiner, assistant examiner, CPCs, and abstract. The full patent document is provided in Adobe Acrobat (PDF) format.

Date of Patent:

May 10, 2022

Filed:

Jul. 08, 2019

Applicant:

UiPath Inc., New York, NY (US);

Inventors:

Horia Cristescu, Bucharest, RO;

Stefan A. Adam, Bucharest, RO;

Mircea Neagovici, Bellevue, WA (US);

Assignee:

UiPath Inc., New York, NY (US);

Attorney:
Primary Examiner:
Int. Cl.
CPC ...
G06K 9/00 (2022.01); G06F 16/56 (2019.01); G06F 16/583 (2019.01); G06F 17/27 (2006.01); G06K 9/46 (2006.01); G06V 30/414 (2022.01); G06F 40/284 (2020.01); G06V 10/40 (2022.01); G06V 30/413 (2022.01); G06V 30/416 (2022.01);
U.S. Cl.
CPC ...
G06V 30/414 (2022.01); G06F 16/56 (2019.01); G06F 16/5846 (2019.01); G06F 40/284 (2020.01); G06V 10/40 (2022.01); G06V 30/413 (2022.01); G06V 30/416 (2022.01);
Abstract

Described systems and methods allow the automatic extraction of structured information from images of structured text documents such as invoices and receipts. Some embodiments employ optical character recognition (OCR) technology to extract individual text tokens (e.g., words) and token bounding boxes from a document image. A feature vector of each text token comprises a first part determined according to a character content of the text token, and a second part determined according to an image content of the token's bounding box. A neural network classifier produces a label indicative of a type of information (e.g., 'billing address', 'due date') carried by each text token. In some embodiments, documents are linearized by ordering text tokens in a sequence according to a reading order of a natural language (e.g., English, Arabic) in which the respective document is formulated. Token feature vectors are fed to the classifier in the order indicated by the token sequence.
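
The pipeline in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the `Token` type, the line-grouping heuristic in `linearize`, and the toy features in `text_features` and `image_features` are all hypothetical stand-ins chosen for illustration. The sketch assumes OCR has already produced tokens with pixel bounding boxes and that the page is a grayscale grid with values between 0 and 1. The neural network classifier is omitted; the vectors in `vectors` are what would be fed to it, in sequence order.

```python
from dataclasses import dataclass
from typing import List, Sequence

Image = List[List[float]]  # grayscale rows, 0.0 = black, 1.0 = white (assumed)


@dataclass
class Token:
    """One OCR token with its bounding box in pixels (hypothetical type)."""
    text: str
    x: int       # left edge
    y: int       # top edge
    width: int
    height: int


def linearize(tokens: Sequence[Token], rtl: bool = False,
              line_tol: float = 0.5) -> List[Token]:
    """Order tokens top to bottom, then along each line in reading direction."""
    lines: List[List[Token]] = []
    for tok in sorted(tokens, key=lambda t: t.y):
        # Assumed heuristic: a token joins the current line when its top edge
        # lies within line_tol * height of that line's first token.
        if lines and abs(tok.y - lines[-1][0].y) <= line_tol * lines[-1][0].height:
            lines[-1].append(tok)
        else:
            lines.append([tok])
    ordered: List[Token] = []
    for line in lines:
        # rtl=True reverses the in-line order for right-to-left scripts.
        ordered.extend(sorted(line, key=lambda t: t.x, reverse=rtl))
    return ordered


def text_features(token: Token) -> List[float]:
    """First part: toy features computed from the token's characters."""
    n = max(len(token.text), 1)
    digits = sum(c.isdigit() for c in token.text) / n
    letters = sum(c.isalpha() for c in token.text) / n
    return [float(len(token.text)), digits, letters]


def image_features(token: Token, image: Image) -> List[float]:
    """Second part: toy features computed from pixels inside the bounding box."""
    rows = image[token.y:token.y + token.height]
    pixels = [p for row in rows for p in row[token.x:token.x + token.width]]
    if not pixels:
        return [0.0, 0.0]
    mean = sum(pixels) / len(pixels)
    dark = sum(p < 0.5 for p in pixels) / len(pixels)
    return [mean, dark]


def feature_vector(token: Token, image: Image) -> List[float]:
    """Concatenate the character-based part and the image-based part."""
    return text_features(token) + image_features(token, image)


# Tiny made-up page: 6 rows by 9 columns, all white except one dark pixel.
page: Image = [[1.0] * 9 for _ in range(6)]
page[4][0] = 0.0

tokens = [
    Token("2022-05-10", x=6, y=4, width=3, height=2),
    Token("Due", x=0, y=4, width=3, height=2),
    Token("Invoice", x=0, y=0, width=4, height=2),
]

ordered = linearize(tokens)                         # left-to-right reading order
vectors = [feature_vector(t, page) for t in ordered]
```

Passing `rtl=True` to `linearize` keeps the top-to-bottom line order but reverses the order within each line, which is one simple way to model a right-to-left reading order such as Arabic.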

