The patent badge is an abbreviated version of the USPTO patent document. The patent badge does contain a link to the full patent document.

The patent badge is an abbreviated version of the USPTO patent document. The patent badge covers the following: Patent number, Date patent was issued, Date patent was filed, Title of the patent, Applicant, Inventor, Assignee, Attorney firm, Primary examiner, Assistant examiner, CPCs, and Abstract. The patent badge does contain a link to the full patent document (in Adobe Acrobat format, aka pdf). To download or print any patent click here.

Date of Patent:
Oct. 03, 1995

Filed:

May. 16, 1994
Applicant:
Inventors:

Dan S Bloomberg, Palo Alto, CA (US);

John W Tukey, Princeton, NJ (US);

M Margaret Withgott, Los Altos, CA (US);

Assignee:

Xerox Corporation, Stamford, CT (US);

Attorney:
Primary Examiner:
Assistant Examiner:
Int. Cl.
CPC ...
G06K / ;
U.S. Cl.
CPC ...
382173 ; 382217 ; 382258 ; 382276 ;
Abstract

A method and apparatus detects function words in a first image of a scanned document without first converting the image to character codes. Function words include determiners, prepositions, articles, and other words that play a largely grammatical role, as opposed to words such as nouns and verbs that convey topic information. Non-content based morphological characteristics of image units are predetermined as well as the presence or omission of character ascenders and descenders in image units. Predetermined characteristics of function word image units are compared with the image units of an image and when a match occurs, the image unit is identified as a function word. Conversely when no matching characteristics occur, the image unit is identified as a non-function word. Additionally, image units are classified and identified as containing only upper case characters, only lower case characters, only digits, and mixed character types.


Find Patent Forward Citations

Loading…