The patent badge is an abbreviated version of the USPTO patent document. It lists the patent number, issue date, filing date, title, applicant, inventors, assignee, attorney firm, primary examiner, assistant examiner, CPC classifications, and abstract. The patent badge also links to the full patent document (in PDF format).

Date of Patent: Dec. 04, 2001

Filed: Jul. 07, 1999
Applicant:
Inventors:
Maria E. Smith, Plantation, FL (US);
Bernard John Grainger, Winchester, GB;
Hubert Crépy, Boulogne, FR;
Martin Herzog, Griesheim, DE;
Gerhard Backfried, Purkersdorf, AT

Assignee:
Attorney:
Primary Examiner:
Int. Cl.
CPC ...
G06F 1/727 ; G10L 1/518 ;
U.S. Cl.
CPC ...
G06F 1/727 ; G10L 1/518 ;
Abstract

A method for supporting customized tokenization of domain-specific text comprises the steps of: loading domain-specific tokenization rules corresponding to the customized tokenization of the domain-specific text; tokenizing the domain-specific text using the loaded domain-specific tokenization rules; and, further tokenizing the domain-specific text using general purpose tokenization rules. The loading step of the inventive method can comprise: loading a speech recognition vocabulary; and, loading domain-specific tokenization rules corresponding to the speech recognition vocabulary. In addition, the tokenizing step can comprise identifying each substring in the domain-specific text matching a regular expression having a corresponding replacement pattern in the loaded domain-specific tokenization rules, and replacing each substring identified in the identifying step with the replacement pattern corresponding to the matched regular expression. Alternatively, the tokenizing step can comprise identifying substrings in the domain-specific text matching a regular expression having a corresponding replacement pattern in the second loaded domain-specific tokenization rules; excluding from further processing the identified substrings having a do-not-replace marker associated with the identified substring; and, replacing each non-excluded identified substring with the replacement pattern corresponding to the matched regular expression.
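
The two-pass flow described in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the rule sets, the "medical" vocabulary name, the "~" do-not-replace marker, and every function name below are hypothetical choices made only for illustration, since the abstract does not specify how rules or markers are encoded.

```python
import re
from typing import NamedTuple


class Rule(NamedTuple):
    pattern: str      # regular expression to look for
    replacement: str  # replacement pattern; may use backreferences such as \1


# Hypothetical marker: the abstract does not say how a do-not-replace
# marker is attached to a substring, so a "~" prefix is assumed here.
DO_NOT_REPLACE = "~"

# Hypothetical domain-specific rules, keyed by speech recognition vocabulary.
DOMAIN_RULES = {
    "medical": [Rule(r"(\d+)\s*mg\b", r"\1 milligrams")],
}

# Hypothetical general purpose rule: make common punctuation its own token.
GENERAL_RULES = [Rule(r"([,.;!?])", r" \1 ")]


def load_domain_rules(vocabulary):
    """Loading step: return the rules that correspond to a vocabulary."""
    return DOMAIN_RULES.get(vocabulary, [])


def apply_rules(text, rules, marker=None):
    """Replace every regex match unless it directly follows the marker."""
    for rule in rules:
        def substitute(match, rule=rule):
            start = match.start()
            marked = (
                marker is not None
                and start >= len(marker)
                and match.string.startswith(marker, start - len(marker))
            )
            if marked:
                return match.group(0)  # excluded from replacement
            return match.expand(rule.replacement)

        text = re.sub(rule.pattern, substitute, text)
    return text


def tokenize(text, vocabulary):
    """Domain-specific pass first, then the general purpose pass."""
    text = apply_rules(text, load_domain_rules(vocabulary), DO_NOT_REPLACE)
    text = apply_rules(text, GENERAL_RULES)
    # Drop the markers once both passes are done, then split on whitespace.
    return text.replace(DO_NOT_REPLACE, "").split()


tokens = tokenize("Take 50mg daily, not ~50mg tablets.", "medical")
print(tokens)
# ['Take', '50', 'milligrams', 'daily', ',', 'not', '50mg', 'tablets', '.']
```

In this sketch the first "50mg" is rewritten by the domain rule, while the second keeps its original form because it carries the assumed marker. The general purpose pass then separates the punctuation.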

