The patent badge is an abbreviated version of the USPTO patent document. It covers the following: patent number, date the patent was issued, date the patent was filed, title of the patent, applicant, inventor, assignee, attorney firm, primary examiner, assistant examiner, CPCs, and abstract. The patent badge also contains a link to the full patent document (in Adobe Acrobat format, i.e. PDF).

Date of Patent: Feb. 24, 2004

Filed: Jun. 12, 2000
Applicant:
Inventors:

Frederick J. Damerau, North Salem, NY (US);

David E. Johnson, Cortlandt Manor, NY (US);

Martin C. Buskirk, Jr., Raleigh, NC (US);

Attorney:
Primary Examiner:
Int. Cl.
CPC ...
G06F 1/721 ;
U.S. Cl.
CPC ...
G06F 1/721 ;
Abstract

A method of automatically labeling unlabeled text data can be practiced independently of human intervention, but that does not preclude manual intervention. The method can be used to extract relevant features of unlabeled text data for a keyword search. The method uses a document collection as a reference answer set. Members of the answer set are converted to vectors representing centroids of unknown groups of unlabeled text data. Unlabeled text data are clustered relative to the centroids by a nearest neighbor algorithm, and the ID of the relevant answer is assigned to all documents in the cluster. At this point in the process, a supervised machine learning algorithm is trained on the labeled data, and a classifier for assigning labels to new text data is output. Alternatively, a feature extraction algorithm may be run on the classes generated by the clustering step, and search features that index the unlabeled text data are output.
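
The labeling step described in the abstract can be sketched roughly as follows. This is an illustration under stated assumptions, not the patented implementation: the abstract does not say how text is vectorized or which distance is used, so bag-of-words term counts and cosine similarity are assumed here. The function names (`vectorize`, `cosine`, `label_documents`) and the sample answers and documents are hypothetical. Training the supervised classifier on the resulting labels is left out.

```python
import math
from collections import Counter


def vectorize(text):
    """Turn text into a bag-of-words term-count vector (assumed representation)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def label_documents(answers, documents):
    """Assign each unlabeled document the ID of its nearest answer centroid."""
    # Each member of the reference answer set becomes one centroid vector.
    centroids = {answer_id: vectorize(text) for answer_id, text in answers.items()}
    labeled = []
    for doc in documents:
        vec = vectorize(doc)
        # Nearest neighbor: pick the centroid with the highest similarity.
        best_id = max(centroids, key=lambda aid: cosine(vec, centroids[aid]))
        labeled.append((doc, best_id))
    return labeled


# Hypothetical reference answer set and unlabeled documents.
answers = {
    "reset_password": "how to reset a forgotten account password",
    "install_printer": "how to install a printer driver",
}
documents = [
    "i forgot my password and cannot log in",
    "the printer driver will not install",
]
labeled = label_documents(answers, documents)
```

The `(document, answer ID)` pairs in `labeled` are the kind of automatically labeled data that the abstract then feeds to a supervised learning algorithm, or alternatively to a feature extraction step.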

