The patent badge is an abbreviated version of the USPTO patent document. It lists the patent number, issue date, filing date, title, applicant, inventor, assignee, attorney firm, primary examiner, assistant examiner, CPC classifications, and abstract. The badge also links to the full patent document in PDF (Adobe Acrobat) format.

Date of Patent:
Feb. 14, 2023

Filed:
May 18, 2020

Applicant:
Thomson Reuters Enterprise Centre GmbH, Zug, CH;

Inventors:

Khaled Ammar, Waterloo, CA;

Brian Zubert, Waterloo, CA;

Sakif Hossain Khan, Montreal, CA;

Attorney:
Primary Examiner:
Int. Cl.
CPC ...
G06V 30/414 (2022.01); G06K 9/62 (2022.01); G06V 10/75 (2022.01); G06V 30/416 (2022.01);
U.S. Cl.
CPC ...
G06V 30/414 (2022.01); G06K 9/6256 (2013.01); G06V 10/751 (2022.01); G06V 30/416 (2022.01);
Abstract

In some aspects, a method includes performing optical character recognition (OCR) based on data corresponding to a document to generate text data, detecting one or more bounded regions from the data based on a predetermined boundary rule set, and matching one or more portions of the text data to the one or more bounded regions to generate matched text data. Each bounded region of the one or more bounded regions encloses a corresponding block of text. The method also includes extracting features from the matched text data to generate a plurality of feature vectors and providing the plurality of feature vectors to a trained machine-learning classifier to generate one or more labels associated with the one or more bounded regions. The method further includes outputting metadata indicating a hierarchical layout associated with the document based on the one or more labels and the matched text data.
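The pipeline described in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: it assumes OCR has already produced words with bounding boxes, takes the bounded regions as given rather than detecting them with a boundary rule set, and substitutes a hand-written rule for the trained machine-learning classifier. All names (`Box`, `Word`, `match_words_to_regions`, `extract_features`, `classify`, `build_layout`) and the chosen features are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Axis-aligned rectangle in page coordinates (hypothetical type)."""
    left: float
    top: float
    right: float
    bottom: float

    def contains(self, x: float, y: float) -> bool:
        return self.left <= x <= self.right and self.top <= y <= self.bottom


@dataclass(frozen=True)
class Word:
    """One OCR token with its bounding box (assumed OCR output format)."""
    text: str
    box: Box


def match_words_to_regions(words, regions):
    """Assign each word to the first region that contains its center point."""
    matched = {i: [] for i in range(len(regions))}
    for word in words:
        cx = (word.box.left + word.box.right) / 2
        cy = (word.box.top + word.box.bottom) / 2
        for i, region in enumerate(regions):
            if region.contains(cx, cy):
                matched[i].append(word.text)
                break
    return matched


def extract_features(region, texts):
    """Build a small feature vector: word count, uppercase ratio, region height."""
    letters = [c for c in " ".join(texts) if c.isalpha()]
    upper_ratio = sum(c.isupper() for c in letters) / len(letters) if letters else 0.0
    return [float(len(texts)), upper_ratio, region.bottom - region.top]


def classify(features):
    """Rule-based stand-in for the trained classifier (an assumption)."""
    word_count, upper_ratio, _height = features
    return "heading" if word_count <= 6 and upper_ratio > 0.5 else "paragraph"


def build_layout(words, regions):
    """Label each region and nest paragraphs under the preceding heading."""
    matched = match_words_to_regions(words, regions)
    sections = []
    # Visit regions from the top of the page to the bottom.
    for i in sorted(range(len(regions)), key=lambda i: regions[i].top):
        text = " ".join(matched[i])
        label = classify(extract_features(regions[i], matched[i]))
        if label == "heading" or not sections:
            heading = text if label == "heading" else None
            sections.append({"heading": heading, "paragraphs": []})
        if label == "paragraph":
            sections[-1]["paragraphs"].append(text)
    return sections


words = [
    Word("TERMS", Box(10, 10, 60, 20)),
    Word("The", Box(10, 40, 30, 50)),
    Word("tenant", Box(35, 40, 80, 50)),
    Word("pays", Box(85, 40, 110, 50)),
]
regions = [Box(5, 5, 200, 25), Box(5, 35, 200, 60)]
layout = build_layout(words, regions)
```

In this sketch, `layout` plays the role of the metadata indicating a hierarchical layout: a list of sections, each with a heading and the paragraphs that follow it. A real system would replace `classify` with a model trained on labeled feature vectors.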

