The patent badge is an abbreviated version of the USPTO patent document. It covers the following fields: patent number, date the patent was issued, date the patent was filed, title of the patent, applicant, inventor, assignee, attorney firm, primary examiner, assistant examiner, CPCs, and abstract. The patent badge also contains a link to the full patent document (in Adobe Acrobat format, i.e., PDF).

Date of Patent:
Oct. 06, 1998

Filed:
Mar. 07, 1997

Applicant:

Inventors:
Shivakumar Vaithyanathan, San Jose, CA (US);
Robert Travis, Concord, MA (US);
Mayank Prakash, Acton, MA (US);

Assignee:
Digital Equipment Corporation, Maynard, MA (US);

Attorney:
Primary Examiner:
Assistant Examiner:
Int. Cl.
CPC ...
G06F / ;
U.S. Cl.
CPC ...
707-2 ; 707-5 ; 707-7 ; 707-8 ;
Abstract

A top-down clustering method and apparatus recursively processes clusters of documents by first extracting features from the documents comprising the cluster, then using the extracted features to generate sub-clusters and finally using the generated sub-clusters to develop topics and identifiers for each sub-cluster. This process is repeated for each cluster and sub-cluster in a recursive manner so that clustering is performed using features extracted from each document in a cluster to perform sub-clustering. Feature extraction is performed by using frequency counts of terms taken from each document in a cluster and discarding terms falling outside of predetermined boundaries computed based on the total number of documents in the cluster. After bounding, the number of tokens is reduced prior to clustering by means of a correlation technique, such as a PCA model.
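
The feature-extraction step described in the abstract (count terms per document, then discard terms whose frequency falls outside bounds derived from the number of documents in the cluster) might look roughly like the following Python sketch. The abstract does not state the actual bounds, so the `min_frac` and `max_frac` fractions, the `tokenize` helper, and the function names here are all assumptions made for illustration. The later steps (reducing the tokens with a correlation technique such as PCA, generating sub-clusters, and recursing) are not shown.

```python
from collections import Counter

PUNCT = ".,;:!?()\"'"

def tokenize(text):
    # Hypothetical tokenizer: lowercase, split on whitespace, strip punctuation.
    words = (w.strip(PUNCT).lower() for w in text.split())
    return [w for w in words if w]

def bounded_terms(docs, min_frac=0.3, max_frac=0.8):
    # Keep terms whose document frequency lies within bounds computed
    # from the total number of documents in the cluster.
    # min_frac and max_frac are assumed values, not taken from the patent.
    n = len(docs)
    lower, upper = min_frac * n, max_frac * n
    doc_freq = Counter()
    for doc in docs:
        # Count each term at most once per document.
        doc_freq.update(set(tokenize(doc)))
    return sorted(t for t, df in doc_freq.items() if lower <= df <= upper)

def term_count_vectors(docs, vocab):
    # Build one term-frequency vector per document over the bounded vocabulary.
    vectors = []
    for doc in docs:
        counts = Counter(tokenize(doc))
        vectors.append([counts.get(t, 0) for t in vocab])
    return vectors

docs = [
    "the cat sat on the mat",
    "the dog sat on the log",
    "the cat chased the dog",
    "the bird sang on the log",
    "the fish swam",
]
vocab = bounded_terms(docs)
vectors = term_count_vectors(docs, vocab)
print(vocab)
print(vectors)
```

With these five toy documents, "the" appears in every document and is dropped by the upper bound, while terms that appear in only one document are dropped by the lower bound. The resulting vectors would then be the input to the dimensionality-reduction and sub-clustering steps.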

