The patent badge is an abbreviated version of the USPTO patent document. It lists the patent number, issue date, filing date, title, applicant, inventor, assignee, attorney firm, primary examiner, assistant examiner, CPCs, and abstract, and it links to the full patent document in Adobe Acrobat (PDF) format.
Patent No.:
Date of Patent:
Jan. 16, 2001
Filed:
Jun. 24, 1998
Inventors:
Patrick Pei Cai, Redmond, WA (US);
Patrick H. Halstead, Bellevue, WA (US);
Assignee:
Microsoft Corporation, Redmond, WA (US);
Abstract
A Consistency Checker provides an improved method of analyzing a Japanese text document to identify inconsistently spelled words. The Consistency Checker utilizes a Reading Pair Database (RPD) and a Compressed Lexicon Database (CLD) to determine the reading units within a word, to calculate a Reading Pair Identification Number (RID) for each reading unit, to calculate a Sense Identification Number (SID) for each word, and to calculate a Spelling Variant Identification Number (SVID) for each word. Spelling variants are generated by combining variations of individual RIDs in the RID array. A Registry is updated to maintain statistics on all of the words within the document. An error field within the Registry indicates that the document contains more than one spelling variant of a particular word. The client program can access the Registry to alert a user to inconsistencies discovered in the document. The RPD comprises a list of reading pairs correlating Japanese text reading units of one character set with equivalent Japanese text reading units of another character set. Equivalent reading units from each character set are combined to form a reading pair and each reading pair is assigned a RID. A method is provided for generating the RPD by analyzing a list of Japanese words and a list of Japanese word equivalents having different spellings. Reading units are discovered by splitting the words at common dividing points and eliminating low-occurrence reading units until a set of high-occurrence reading units is defined.
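The flow described in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the two-entry reading pair list, the greedy longest-match segmentation, the use of the RID tuple as the SID, the use of the per-unit script choice as the SVID, and the names `READING_PAIRS`, `analyze`, and `build_registry` are all assumptions made for illustration. The Compressed Lexicon Database is omitted entirely.

```python
from collections import defaultdict

# Hypothetical toy Reading Pair Database (RPD): (kanji unit, kana unit).
# The list index stands in for the Reading Pair Identification Number (RID).
READING_PAIRS = [("子", "こ"), ("供", "ども")]

# Map every reading unit to (RID, form), where form 0 = kanji and 1 = kana.
UNIT_INDEX = {}
for rid, pair in enumerate(READING_PAIRS):
    for form, unit in enumerate(pair):
        UNIT_INDEX[unit] = (rid, form)

MAX_UNIT_LEN = max(len(unit) for unit in UNIT_INDEX)


def analyze(word):
    """Split a word into reading units by greedy longest match (an assumption).

    Returns (sid, svid), or None if the word cannot be fully segmented.
    Here the SID is the tuple of RIDs and the SVID is the tuple of forms;
    the patent's actual numbering scheme is not given in the abstract.
    """
    rids, forms = [], []
    pos = 0
    while pos < len(word):
        for size in range(min(MAX_UNIT_LEN, len(word) - pos), 0, -1):
            hit = UNIT_INDEX.get(word[pos:pos + size])
            if hit is not None:
                rids.append(hit[0])
                forms.append(hit[1])
                pos += size
                break
        else:
            return None  # no known reading unit starts at this position
    return tuple(rids), tuple(forms)


def build_registry(words):
    """Count spelling variants per SID and flag SIDs with more than one."""
    registry = defaultdict(lambda: {"variants": defaultdict(int), "error": False})
    for word in words:
        result = analyze(word)
        if result is None:
            continue  # word is outside the toy RPD
        sid, svid = result
        entry = registry[sid]
        entry["variants"][svid] += 1
        entry["error"] = len(entry["variants"]) > 1
    return registry


if __name__ == "__main__":
    registry = build_registry(["子供", "子ども", "子供"])
    for sid, entry in registry.items():
        print(sid, dict(entry["variants"]), "inconsistent:", entry["error"])
```

In this sketch, 子供 and 子ども share one SID because they decompose into the same reading pairs, but they receive different SVIDs, so the registry entry's error field is set and a client program could warn the user.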