The patent badge is an abbreviated version of the USPTO patent document. It lists the patent number, issue date, filing date, title, applicant, inventor, assignee, attorney firm, primary examiner, assistant examiner, CPCs, and abstract, and it links to the full patent document (PDF).

Date of Patent:
May 03, 2011

Filed:

Dec. 01, 2004
Applicants:

Jill Carrier, Dorchester, MA (US);

Alwin B. Carus, Waban, MA (US);

William F. Cote, Carlisle, MA (US);

John Dowd, Sudbury, MA (US);

Kathryn Del La Femina, Ashland, MA (US);

Alan Frankel, Framingham, MA (US);

Wensheng (Vincent) Han, Arlington, MA (US);

Larissa Lapshina, Shirley, MA (US);

Bernardo Rechea, Belmont, MA (US);

Ana Santisteban, Somerville, MA (US);

Amy J. Uhrbach, Needham, MA (US);

Inventors:

Jill Carrier, Dorchester, MA (US);

Alwin B. Carus, Waban, MA (US);

William F. Cote, Carlisle, MA (US);

John Dowd, Sudbury, MA (US);

Kathryn Del La Femina, Ashland, MA (US);

Alan Frankel, Framingham, MA (US);

Wensheng (Vincent) Han, Arlington, MA (US);

Larissa Lapshina, Shirley, MA (US);

Bernardo Rechea, Belmont, MA (US);

Ana Santisteban, Somerville, MA (US);

Amy J. Uhrbach, Needham, MA (US);

Assignee:

Dictaphone Corporation, Stratford, CT (US);

Attorney:
Primary Examiner:
Assistant Examiner:
Int. Cl.
CPC ...
G06F 17/27 (2006.01); G06F 17/20 (2006.01);
U.S. Cl.
CPC ...
Abstract

The present invention pertains to a system and method for the tokenization of text. The featurizer may be configured to receive input text and convert the input text into tokens. According to one aspect of the invention, the tokens may include only one type of character, the characters selected from the group consisting of letters, numbers, and punctuation. The tokenizer may also include a classifier. The classifier may be configured to receive the tokens from the featurizer. Furthermore, the classifier may be configured to analyze the tokens received from the featurizer to determine if the tokens may be input into a predetermined classification model using a preclassifier. If one of the tokens passes the preclassifier, then the token is classified using the predetermined classification model. Additionally, according to a first aspect of the invention, the tokenizer may also include a finalizer. The finalizer may be configured to receive the tokens and may be configured to produce a final output.
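
The three stages named in the abstract (featurizer, classifier with a preclassifier, finalizer) can be pictured with a small sketch. This is an illustration only, not the patented implementation: the rule that only periods reach the model, the toy "model" that joins a period sitting between two numbers, and every name below (Token, featurize, preclassify, toy_model, classify, finalize, tokenize) are assumptions made for the example.

```python
from dataclasses import dataclass
from itertools import groupby
from typing import List, Optional


@dataclass
class Token:
    kind: str                    # "letter", "number", or "punct"
    text: str
    space_before: bool           # was there whitespace before this token?
    label: Optional[str] = None  # set only if the token reaches the model


def char_type(ch: str) -> str:
    if ch.isalpha():
        return "letter"
    if ch.isdigit():
        return "number"
    if ch.isspace():
        return "space"
    return "punct"


def featurize(text: str) -> List[Token]:
    """Split text into tokens that each hold a single character type."""
    tokens, space_before = [], False
    for kind, run in groupby(text, key=char_type):
        if kind == "space":
            space_before = True
            continue
        tokens.append(Token(kind, "".join(run), space_before))
        space_before = False
    return tokens


def preclassify(token: Token) -> bool:
    """Hypothetical gate: only periods are treated as ambiguous."""
    return token.kind == "punct" and token.text == "."


def toy_model(tokens: List[Token], i: int) -> str:
    """Stand-in for the classification model: join a period that sits
    directly between two numbers (as in "3.5"), otherwise keep it split."""
    prev_tok = tokens[i - 1] if i > 0 else None
    next_tok = tokens[i + 1] if i + 1 < len(tokens) else None
    if (prev_tok is not None and next_tok is not None
            and prev_tok.kind == "number" and next_tok.kind == "number"
            and not tokens[i].space_before and not next_tok.space_before):
        return "join"
    return "split"


def classify(tokens: List[Token]) -> List[Token]:
    """Send a token to the model only if it passes the preclassifier."""
    for i, tok in enumerate(tokens):
        if preclassify(tok):
            tok.label = toy_model(tokens, i)
    return tokens


def finalize(tokens: List[Token]) -> List[str]:
    """Produce the final output, merging tokens the model labeled "join"."""
    out: List[str] = []
    glue_next = False
    for tok in tokens:
        if tok.label == "join" and out:
            out[-1] += tok.text   # attach the period to the previous token
            glue_next = True
        elif glue_next:
            out[-1] += tok.text   # attach the token that follows the period
            glue_next = False
        else:
            out.append(tok.text)
    return out


def tokenize(text: str) -> List[str]:
    return finalize(classify(featurize(text)))


print(tokenize("Dose is 3.5 mg."))
```

In this sketch the featurizer first splits "3.5" into three single-type tokens, the preclassifier forwards only the two periods to the toy model, and the finalizer rejoins the decimal while leaving the sentence-final period as its own token.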

