The patent badge is an abbreviated version of the USPTO patent document. It covers the following fields: patent number, date the patent was issued, date the patent was filed, title of the patent, applicant, inventor, assignee, attorney firm, primary examiner, assistant examiner, CPC (Cooperative Patent Classification) codes, and abstract. The badge also links to the full patent document (in Adobe Acrobat format, i.e., PDF).
Patent No.:
Date of Patent: Feb. 24, 1998
Filed: Aug. 03, 1995
Inventor: Ronald M Kaplan, Palo Alto, CA (US)
Assignee: Xerox Corporation, Stamford, CT (US)
Abstract
An efficient method and apparatus for tokenizing natural language text minimizes required data storage and produces guaranteed incremental output. Id(text) is composed with a tokenizer to create a finite state machine representing tokenization paths. The tokenizer itself is in the form of a finite state transducer. The process is carried out in a breadth-first manner so that all possibilities are explored at each character position before progressing. Output is produced incrementally and occurs only when all paths collapse into one. Output may be delayed until a token boundary is reached. In this manner, the output is guaranteed and will not be retracted unless the text is globally ill-formed. Each time output is produced, storage space is freed for subsequent text processing.
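The breadth-first, incremental behavior described in the abstract can be illustrated with a much simplified sketch. It is not the patented method: a plain set of token strings (the hypothetical `lexicon` argument) stands in for the finite state transducer, and the names `tokenize_incremental`, `shared_prefix_len`, and `LEXICON` are invented for this illustration. All candidate paths are advanced one character at a time, tokens are emitted only once every surviving path agrees on them, and emitted tokens are then dropped from storage.

```python
from typing import Iterable, Iterator, Set, Tuple

# A path is (tokens not yet emitted, characters of the token in progress).
Path = Tuple[Tuple[str, ...], str]


def shared_prefix_len(paths: Set[Path]) -> int:
    """Count the leading tokens on which every surviving path agrees."""
    count = 0
    for column in zip(*(tokens for tokens, _ in paths)):
        if len(set(column)) != 1:
            break
        count += 1
    return count


def tokenize_incremental(chars: Iterable[str], lexicon: Set[str]) -> Iterator[str]:
    """Yield tokens as soon as all breadth-first paths agree on them."""
    # Every non-empty prefix of a lexicon entry is a viable partial token.
    prefixes = {word[:i] for word in lexicon for i in range(1, len(word) + 1)}
    paths: Set[Path] = {((), "")}
    for ch in chars:
        advanced: Set[Path] = set()
        for tokens, partial in paths:  # explore every path at this position
            grown = partial + ch
            if grown in prefixes:      # keep extending the current token
                advanced.add((tokens, grown))
            if grown in lexicon:       # or place a token boundary here
                advanced.add((tokens + (grown,), ""))
        if not advanced:
            raise ValueError("ill-formed text: no tokenization path survives")
        paths = advanced
        agreed = shared_prefix_len(paths)
        if agreed:
            # All paths share these tokens, so they can never be retracted.
            yield from next(iter(paths))[0][:agreed]
            # Free the storage held by the tokens just emitted.
            paths = {(tokens[agreed:], partial) for tokens, partial in paths}
    complete = [tokens for tokens, partial in paths if partial == ""]
    if not complete:
        raise ValueError("ill-formed text: input ends inside a token")
    # Assumption of this sketch (not from the abstract): if the text is still
    # ambiguous at the end, prefer the reading with the fewest tokens.
    yield from min(complete, key=lambda tokens: (len(tokens), tokens))


LEXICON = {"a", "ab", "bd", "c"}  # hypothetical toy lexicon
print(list(tokenize_incremental("abdc", LEXICON)))  # ['a', 'bd', 'c']
```

In this toy run the token "a" is emitted only after the third character is read, because until then the competing reading "ab" is still alive. Because the function is a generator, a caller can consume tokens while the rest of the text is still arriving.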