
The patent badge is an abbreviated version of the USPTO patent document. It covers the following fields: patent number, date the patent was issued, date the patent was filed, title, applicant, inventors, assignee, attorney firm, primary examiner, assistant examiner, CPC classifications, and abstract. The badge also links to the full patent document in Adobe Acrobat (PDF) format.

Date of Patent: Feb. 14, 1995

Filed: Nov. 19, 1991
Applicant:
Inventors:
M Margaret Withgott, Los Altos, CA (US);
Steven C Bagley, Palo Alto, CA (US);
Dan S Bloomberg, Palo Alto, CA (US);
Daniel P Huttenlocher, Ithaca, NY (US);
Ronald M Kaplan, Palo Alto, CA (US);
Todd A Cass, Cambridge, MA (US);
Per-Kristian Halvorsen, Los Altos, CA (US);
Ramana B Rao, San Francisco, CA (US);
Douglass R Cutting, Menlo Park, CA (US);

Assignee: Xerox Corporation, Rochester, NY (US)

Attorney:
Primary Examiner:
Int. Cl.: G06K /
U.S. Cl.: 382/9; 382/18; 382/40
Abstract

A method and apparatus for processing a document image, using a programmed general or special purpose computer, includes forming the image into image units, and at least one image unit classifier of at least one of the image units is determined, without decoding the content of the at least one of the image units. The classifier of the at least one of the image units is then compared with a classifier of another image unit. The classifier may be image unit length, width, location in the document, font, typeface, cross-section, the number of ascenders, the number of descenders, the average pixel density, the length of the top line contour, the length of the base contour, the location of image units with respect to neighboring image units, vertical position, horizontal inter-image unit spacing, and so forth. The classifier comparison can be a comparison with classifiers of image units of words in a reference table, or with classifiers of other image units in the document. Equivalence classes of image units can be generated, from which word frequency and significance can be determined. The image units can be determined by creating bounding boxes about identifiable segments or extractable units of the image, and can contain a word, a phrase, a letter, a number, a character, a glyph or the like.
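The abstract describes a pipeline: find image units via bounding boxes, compute classifiers for each unit without decoding its content, compare classifiers, and group matching units into equivalence classes whose sizes indicate word frequency. The following is a minimal Python sketch of that idea, not the patented implementation. It rests on simplifying assumptions that do not come from the patent. The page is a tiny binary image of `#` (ink) and `.` (background). Each 8-connected component counts as one image unit. The classifier is only the bounding-box width, height, and ink density, rounded to two decimals, and classifiers are compared by exact equality. All names (`find_image_units`, `classifier`, `equivalence_classes`, `IMAGE`) are hypothetical. A practical system would first merge character components into word-level units (for example by morphological dilation). It would also compare richer classifiers, such as ascender and descender counts, within a tolerance.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# (top, left, bottom, right), all inclusive pixel coordinates
Box = Tuple[int, int, int, int]
# Hypothetical classifier: (width, height, ink density)
Classifier = Tuple[int, int, float]

# Hypothetical toy page: '#' is ink, '.' is background.
IMAGE = [
    "##...##...#..",
    "##...##...##.",
]


def find_image_units(img: List[str]) -> List[Box]:
    """Return bounding boxes of 8-connected ink components, in scan order."""
    rows, cols = len(img), len(img[0])
    seen = [[False] * cols for _ in range(rows)]
    boxes: List[Box] = []
    for r in range(rows):
        for c in range(cols):
            if img[r][c] != "#" or seen[r][c]:
                continue
            # Flood-fill this component, growing its bounding box as we go.
            seen[r][c] = True
            stack = [(r, c)]
            top, left, bottom, right = r, c, r, c
            while stack:
                y, x = stack.pop()
                top, bottom = min(top, y), max(bottom, y)
                left, right = min(left, x), max(right, x)
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < rows and 0 <= nx < cols
                                and not seen[ny][nx] and img[ny][nx] == "#"):
                            seen[ny][nx] = True
                            stack.append((ny, nx))
            boxes.append((top, left, bottom, right))
    return boxes


def classifier(img: List[str], box: Box) -> Classifier:
    """Width, height, and fraction of ink pixels inside the bounding box.

    Computed from pixels alone; no character recognition is performed.
    """
    top, left, bottom, right = box
    width, height = right - left + 1, bottom - top + 1
    ink = sum(img[y][x] == "#"
              for y in range(top, bottom + 1)
              for x in range(left, right + 1))
    return (width, height, round(ink / (width * height), 2))


def equivalence_classes(img: List[str], boxes: List[Box]) -> Dict[Classifier, List[Box]]:
    """Group image units whose classifiers compare equal."""
    classes: Dict[Classifier, List[Box]] = defaultdict(list)
    for box in boxes:
        classes[classifier(img, box)].append(box)
    return dict(classes)


if __name__ == "__main__":
    units = find_image_units(IMAGE)
    for key, members in equivalence_classes(IMAGE, units).items():
        # The class size plays the role of a frequency count.
        print(key, len(members))
```

In this toy page, the two solid 2×2 units share a classifier and fall into one class. The L-shaped unit has the same bounding box but a lower ink density (0.75), so it lands in a separate class. Class sizes therefore act as frequency counts, even though no unit was ever decoded.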

