The patent badge is an abbreviated version of the USPTO patent document. It covers the following: patent number, date the patent was issued, date the patent was filed, title of the patent, applicant, inventor, assignee, attorney firm, primary examiner, assistant examiner, CPCs, and abstract. The patent badge also contains a link to the full patent document (in Adobe Acrobat format, i.e., PDF).

Date of Patent:
Nov. 03, 2020

Filed:
Dec. 27, 2018

Applicant:
Microsoft Technology Licensing, LLC, Redmond, WA (US);

Inventors:
Yan Wang, Mercer Island, WA (US);
Arun Sacheti, Sammamish, WA (US);
Vishal Chhabilbhai Thakkar, Kirkland, WA (US);
Surendra Srinivas Ulabala, Bothell, WA (US);
Shloak Jain, Redmond, WA (US);

Assignee:
Attorney:
Primary Examiner:
Int. Cl.
CPC ...
G06K 9/46 (2006.01); G06T 11/20 (2006.01);
U.S. Cl.
CPC ...
G06K 9/46 (2013.01); G06T 11/20 (2013.01); G06K 2209/01 (2013.01); G06T 2210/12 (2013.01);
Abstract

Representative embodiments disclose mechanisms to create a text stream from raw OCR output. The raw OCR output comprises a plurality of bounding boxes, each bounding box defining a region containing text that has been recognized by the OCR system. A weight matrix is calculated that comprises a weight for each pair of bounding boxes, each weight representing the probability that the pair of bounding boxes belongs to the same cluster. The bounding boxes are then clustered according to the weights. The resulting clusters are first ordered using a first ordering criterion. The bounding boxes within each cluster are then ordered according to a second ordering criterion. The ordered clusters and bounding boxes are then arranged into a text stream.
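
The pipeline outlined in the abstract (pairwise weights, clustering, two-level ordering, text stream) can be illustrated with a minimal Python sketch. It is not the patented method: the abstract does not say how the weights are computed, how clustering is performed, or which ordering criteria are used. The sketch assumes, purely for illustration, a weight of exp(-gap / mean box height), clustering by connected components over pairs whose weight meets a threshold, clusters ordered left to right and then top to bottom, and boxes within a cluster ordered top to bottom and then left to right. All names (Box, gap, weight_matrix, cluster, to_text_stream) and the threshold value are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """One OCR bounding box: top-left corner, size, and recognized text."""
    x: float
    y: float
    w: float
    h: float
    text: str


def gap(a: Box, b: Box) -> float:
    """Distance between the nearest edges of two boxes (0 if they overlap)."""
    dx = max(a.x - (b.x + b.w), b.x - (a.x + a.w), 0.0)
    dy = max(a.y - (b.y + b.h), b.y - (a.y + a.h), 0.0)
    return math.hypot(dx, dy)


def weight_matrix(boxes):
    """Assumed weight: exp(-gap / mean height), a stand-in for the probability
    that two boxes belong to the same cluster."""
    n = len(boxes)
    weights = [[1.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            scale = (boxes[i].h + boxes[j].h) / 2
            value = math.exp(-gap(boxes[i], boxes[j]) / scale)
            weights[i][j] = weights[j][i] = value
    return weights


def cluster(boxes, weights, threshold=0.5):
    """Assumed clustering: connected components of the graph whose edges are
    the pairs with weight >= threshold, found with union-find."""
    parent = list(range(len(boxes)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(boxes)):
        for j in range(i + 1, len(boxes)):
            if weights[i][j] >= threshold:
                parent[find(i)] = find(j)

    groups = {}
    for i, box in enumerate(boxes):
        groups.setdefault(find(i), []).append(box)
    return list(groups.values())


def to_text_stream(boxes, threshold=0.5):
    """Cluster the boxes, order clusters and boxes, and join the text."""
    clusters = cluster(boxes, weight_matrix(boxes), threshold)
    # First criterion (assumed): clusters by leftmost edge, then topmost edge.
    clusters.sort(key=lambda c: (min(b.x for b in c), min(b.y for b in c)))
    lines = []
    for members in clusters:
        # Second criterion (assumed): boxes top to bottom, then left to right.
        ordered = sorted(members, key=lambda b: (b.y, b.x))
        lines.append(" ".join(b.text for b in ordered))
    return "\n".join(lines)


if __name__ == "__main__":
    # Two columns of text, supplied in shuffled order.
    demo = [
        Box(300, 12, 40, 10, "side"),
        Box(55, 0, 50, 10, "world"),
        Box(0, 12, 50, 10, "again"),
        Box(300, 0, 50, 10, "Right"),
        Box(0, 0, 50, 10, "Hello"),
    ]
    print(to_text_stream(demo))
```

With the demo input, the left-hand boxes form one cluster and the right-hand boxes another, so the script prints "Hello world again" followed by "Right side" on the next line. Sorting on exact y coordinates is fragile for real OCR output, where boxes on the same line rarely share an identical top edge; a practical version would bucket boxes into lines first.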

