The patent badge is an abbreviated version of the USPTO patent document. It covers the following: Patent number, Date patent was issued, Date patent was filed, Title of the patent, Applicant, Inventor, Assignee, Attorney firm, Primary examiner, Assistant examiner, CPCs, and Abstract. The badge links to the full patent document in Adobe Acrobat (PDF) format.

Patent No.:

US 6678415 B1

Date of Patent:

Jan. 13, 2004

Filed:

May 12, 2000

Document image decoding using an integrated stochastic language model

Applicant:

Inventors:

Ashok C. Popat, San Carlos, CA (US);

Dan S. Bloomberg, Palo Alto, CA (US);

Daniel H. Greene, Sunnyvale, CA (US);

Assignee:

Xerox Corporation, Stamford, CT (US);

Attorney:

Nola M. McBain

Primary Examiner:

Jon Chang

Assistant Examiner:

Brian Le

Int. Cl.

CPC ...

G06K 9/62; G06K 9/72

U.S. Cl.

CPC ...

G06K 9/62; G06K 9/72

Abstract

A text recognition system represents the decoded message of a document image as a path through an image network. A method for integrating a language model into the network expands the network to accommodate the language model only along selected paths, which keeps the memory requirements and computational cost of the integration manageable. The language model generates probability distributions indicating the probability of a particular character occurring in a string, given one or more previous characters in the string. Selective expansion starts by scoring the branches of the unexpanded image network with upper bounds on the language model probabilities. A best-path search is then performed to determine an estimated best path through the image network using these upper-bound scores. After decoding, only the nodes on the estimated best path are expanded: new nodes are added, together with incoming branches whose language model scores reflect the actual character histories in place of the upper-bound scores. Decoding and selective expansion are repeated until the final output transcription of the text image has been produced.
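The selective-expansion loop described in the abstract can be illustrated on a toy problem. What follows is a minimal Python sketch under stated assumptions, not the patented implementation. The bigram language model, the tiny hand-built image network, the log-domain scores, and the names `BIGRAM`, `UPPER`, `BRANCHES`, `best_path` and `decode` are all hypothetical. Each node is keyed by (position, history). An unexpanded node has no known history, so its outgoing branches are scored with a per-character upper bound. After each best-path search, the unexpanded nodes on that path receive history-specific copies. The stopping rule (halt once every score on the best path is exact) is also an assumption of this sketch.

```python
import math
from collections import defaultdict

# Hypothetical toy bigram model: P(char | previous char); "^" marks line start.
BIGRAM = {
    ("^", "c"): 0.5, ("^", "e"): 0.5,
    ("c", "l"): 0.6, ("c", "a"): 0.4,
    ("e", "l"): 0.1, ("e", "a"): 0.9,
    ("l", "d"): 0.2, ("l", "t"): 0.1,
    ("a", "d"): 0.3, ("a", "t"): 0.7,
}
FLOOR = 1e-3  # probability assigned to bigrams not listed above
ALPHABET = sorted({c for _, c in BIGRAM})
HISTORIES = sorted({h for h, _ in BIGRAM} | set(ALPHABET))


def lm(prev, char):
    """Exact language-model log score of `char` given the previous character."""
    return math.log(BIGRAM.get((prev, char), FLOOR))


# Upper bound on each character's LM score, taken over every possible history.
UPPER = {c: max(lm(h, c) for h in HISTORIES) for c in ALPHABET}

# Hypothetical image network: (from_pos, to_pos, char, log image-match score).
BRANCHES = [
    (0, 1, "c", -0.2), (0, 1, "e", -0.1),
    (1, 2, "l", -0.3), (1, 2, "a", -0.4),
    (2, 3, "t", -0.2), (2, 3, "d", -0.1),
]
END = 3


def best_path(expanded):
    """Viterbi search over nodes (position, history).

    history is None for an unexpanded node, whose outgoing branches are scored
    with UPPER; otherwise it is the previous character and scores are exact.
    """
    best = {(0, "^"): (0.0, None)}  # node -> (score, (previous node, char))
    for pos in range(END):  # branches only move forward, so this is topological
        for node in [n for n in best if n[0] == pos]:
            score, ctx = best[node][0], node[1]
            for src, dst, ch, img in BRANCHES:
                if src != pos:
                    continue
                lm_score = UPPER[ch] if ctx is None else lm(ctx, ch)
                # Route into the expanded copy of dst if one exists for this history.
                target = (dst, ch if ch in expanded[dst] else None)
                cand = score + img + lm_score
                if target not in best or cand > best[target][0]:
                    best[target] = (cand, (node, ch))
    end = max((n for n in best if n[0] == END), key=lambda n: best[n][0])
    path, node = [], end
    while best[node][1] is not None:  # follow backpointers to the start node
        prev, ch = best[node][1]
        path.append((prev, ch, node))
        node = prev
    return best[end][0], path[::-1]


def decode():
    expanded = defaultdict(set)  # position -> histories with their own node
    while True:
        score, path = best_path(expanded)
        # Interior nodes on the path whose outgoing branch used an upper bound.
        pending = [(dst[0], ch) for _, ch, dst in path[:-1] if dst[1] is None]
        if not pending:  # every score on the path is exact, so the path is optimal
            return "".join(ch for _, ch, _ in path), score
        for pos, ch in pending:
            expanded[pos].add(ch)
```

On this toy network, `decode()` returns `"eat"` with a log score of about -1.855 after two rounds of search. An upper bound can only overestimate a path's score. A best path whose scores are all exact therefore cannot be beaten by any other path, so the loop can stop there without ever building the fully expanded network.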
