The patent badge is an abbreviated version of the USPTO patent document. It covers the following: patent number, date the patent was issued, date the patent was filed, title of the patent, applicant, inventor, assignee, attorney firm, primary examiner, assistant examiner, CPCs, and abstract. The patent badge also contains a link to the full patent document (in Adobe Acrobat format, i.e. PDF).
Patent No.:
Date of Patent:
Feb. 03, 2004
Filed:
Jun. 20, 1997
Inventors:
Jesse Hull, Boston, MA (US);
Philip A. Chou, Bellevue, WA (US);
Gary E. Kopec, Belmont, CA (US);
Dennis S. Arnon, San Francisco, CA (US);
Assignee:
Xerox Corporation, Stamford, CT (US);
Abstract
A two-dimensional (2D) image model models the layout structure of a class of document images as an image grammar and includes production rules having explicit layout parameters as data items that indicate information about the spatial relationships among image constituents occurring in images included in the class. The parameters are explicitly represented in the grammar rules in a manner that permits them to be automatically trained by a training operation that makes use of sample document images from the class of modeled documents. After each sample image is aligned with the 2D grammar, document-specific measurements about the spatial relationships between image constituents are taken from the image. Optimal values for the layout parameters are then computed from the measurement data collected from all samples. An illustrated implementation of the 2D image model takes the form of a stochastic context-free attribute grammar in which synthesized and inherited attributes and synthesis and inheritance functions are associated with each production rule in the grammar. The attributes indicate physical spatial locations of image constituents in the image, and a set of parameterized functions, in which the coefficients are the layout parameters, compute the attributes as a function of a characteristic of an image constituent of the production rule. The measurement data is taken from an annotated parse tree produced for each training image by the grammar. A trained grammar can then be used, for example, for document recognition and layout analysis operations on any document in the class of documents modeled by the grammar.
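The training step described in the abstract (collect measurements from aligned sample images, then compute optimal values for the layout parameters) can be illustrated with a small sketch. The abstract does not state the functional form of the parameterized functions or the optimality criterion, so the following assumes, purely for illustration, a linear function offset = a * size + b fitted by ordinary least squares. The names LayoutFunction and fit_layout_function, and the sample measurements, are hypothetical and do not come from the patent.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class LayoutFunction:
    """Hypothetical parameterized function: offset = a * size + b.

    The coefficients a and b play the role of layout parameters.
    The linear form is an assumption, not taken from the patent.
    """
    a: float = 0.0
    b: float = 0.0

    def __call__(self, size: float) -> float:
        return self.a * size + self.b


def fit_layout_function(measurements: List[Tuple[float, float]]) -> LayoutFunction:
    """Fit a and b by ordinary least squares (an assumed criterion).

    Each measurement is a (size, offset) pair, standing in for data
    read off the annotated parse tree of one training image.
    """
    n = len(measurements)
    if n == 0:
        raise ValueError("at least one measurement is required")
    mean_x = sum(x for x, _ in measurements) / n
    mean_y = sum(y for _, y in measurements) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in measurements)
    if sxx == 0:
        # All sizes identical: the slope is undetermined, so use a constant.
        return LayoutFunction(a=0.0, b=mean_y)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in measurements)
    a = sxy / sxx
    return LayoutFunction(a=a, b=mean_y - a * mean_x)


# Made-up measurements pooled from three sample images.
samples = [(10.0, 25.0), (20.0, 45.0), (30.0, 65.0)]
trained = fit_layout_function(samples)
predicted_offset = trained(40.0)  # predicted position for an unseen constituent
```

In this sketch, one such function would be attached to each production rule, and the fitted coefficients would then be used when the trained grammar is applied to new documents of the same class.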