The patent badge is an abbreviated version of the USPTO patent document. It covers the following: patent number, date the patent was issued, date the patent was filed, title of the patent, applicant, inventor, assignee, attorney firm, primary examiner, assistant examiner, CPCs, and abstract. The patent badge also contains a link to the full patent document (in Adobe Acrobat format, i.e., PDF).
Patent No.:
Date of Patent:
Feb. 17, 2004
Filed:
Aug. 28, 2000
Inventors:
Roland John Burns, Santa Cruz, CA (US);
Thomas Kieninger, Kaiserslautern, DE;
Stefan Klink, Gutweiler, DE;
Assignee:
Hewlett-Packard Development, L.P., Houston, TX (US);
Abstract
The present invention is directed to a method and an apparatus for performing document analysis. The apparatus of the present invention comprises logic configured to recognize and label structures in a document that are both common to multiple types of documents and that are unique to the particular type of document being analyzed. The logic preferably is a computer that receives the output of an optical character recognition (OCR) system and then analyzes the output in accordance with a document structure analysis routine. For structures that are common to multiple types of documents, various types of tests may be performed by the document structure analysis routine to recognize and label the common types of structures. In order to recognize structures that are unique to the particular type of document being analyzed, the document structure analysis routine utilizes a rule base that is adapted to the particular application domain associated with the document. The rule base comprises a plurality of rules for testing structures in the document in order to recognize unique, or application-domain-dependent, structures. These structures are also labeled. All of the labeled structures are assigned a likelihood indicator that is associated with a particular label. The likelihood indicator indicates the likelihood that the label associated with it is correct. The labels and the associated likelihood indicators may then be used to correctly identify the application-domain-dependent structures in the document.
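The flow the abstract describes (common tests, then a domain-specific rule base, with a likelihood attached to every label) can be illustrated with a minimal Python sketch. This is not the patented implementation: the `Block` type, the rule lists, the "business letter" domain, the thresholds, and the likelihood values are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple


@dataclass
class Block:
    """One text block from OCR output (hypothetical, simplified)."""
    text: str
    top: float  # vertical position as a fraction of page height; 0.0 is the top
    labels: Dict[str, float] = field(default_factory=dict)  # label -> likelihood


# A rule pairs a label with a test returning a likelihood in [0, 1].
Rule = Tuple[str, Callable[[Block], float]]

# Tests for structures common to many document types (assumed values).
COMMON_RULES: List[Rule] = [
    ("header", lambda b: 0.9 if b.top < 0.1 else 0.0),
    ("footer", lambda b: 0.9 if b.top > 0.9 else 0.0),
]

# A rule base adapted to one application domain (assumed: business letters).
LETTER_RULES: List[Rule] = [
    ("salutation", lambda b: 0.8 if b.text.startswith("Dear ") else 0.0),
    ("date", lambda b: 0.7 if b.top < 0.3 and any(c.isdigit() for c in b.text) else 0.0),
]


def label_blocks(blocks: List[Block], domain_rules: List[Rule]) -> List[Block]:
    """Apply the common tests, then the domain rules, recording likelihoods."""
    for block in blocks:
        for label, test in COMMON_RULES + domain_rules:
            likelihood = test(block)
            if likelihood > 0.0:
                block.labels[label] = likelihood
    return blocks


def best_label(block: Block) -> Optional[str]:
    """Return the label with the highest likelihood, or None if unlabeled."""
    if not block.labels:
        return None
    return max(block.labels, key=block.labels.get)
```

Keeping every candidate label with its likelihood, rather than committing to one label immediately, mirrors the abstract's point that the labels and likelihood indicators together are used afterwards to identify the domain-dependent structures.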