The patent badge is an abbreviated version of the USPTO patent document. It covers the following: patent number, date the patent was issued, date the patent was filed, title of the patent, applicant, inventor, assignee, attorney firm, primary examiner, assistant examiner, CPCs, and abstract. The badge also links to the full patent document (in Adobe Acrobat format, i.e., PDF).

Date of Patent:

Jan. 16, 2007

Filed:

Jul. 31, 2002

Applicants:

Eric J. Glover, North Brunswick, NJ (US);

Stephen R. Lawrence, New York, NY (US);

David M. Pennock, Monroe Township, NJ (US);

Inventors:

Eric J. Glover, North Brunswick, NJ (US);

Stephen R. Lawrence, New York, NY (US);

David M. Pennock, Monroe Township, NJ (US);

Assignee:

NEC Laboratories America, Inc., Princeton, NJ (US);

Attorney:
Assistant Examiner:
Int. Cl.
CPC ...
G06F 17/27 (2006.01);
U.S. Cl.
CPC ...
Abstract

A method automatically determines groups of words or phrases that are descriptive names of a small set of documents, and infers concepts in that set that are more general and more specific than the descriptive names, without any prior knowledge of the hierarchy or the concepts, in a language-independent manner. The descriptive names and the concepts may not even be explicitly contained in the documents. The primary application of the invention is searching the World Wide Web, but the invention is not limited to the World Wide Web and may be applied to any set of documents. Classes of features are identified in order to promote understanding of a set of documents. Preferably, there are three classes of features. 'Self' features or terms describe the cluster as a whole. 'Parent' features or terms describe more general concepts. 'Child' features or terms describe specializations of the cluster. The self features can be used as a recommended name for a cluster, while parents and children can be used to place the cluster in the space of a larger collection. Parent features suggest a more general concept, while child features suggest concepts that describe a specialization of the self feature(s). Automatic discovery of parent, self and child features is useful for several purposes, including automatic labeling of web directories and improving information retrieval.
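The three feature classes can be illustrated with a minimal sketch. The abstract does not state how the classes are separated, so everything below is an assumption made for illustration: the function names, the whitespace tokenization, and the three document-frequency thresholds are hypothetical and are not taken from the patent. The idea sketched is that a term common in the cluster and also common in the wider collection may be a parent, a term common in the cluster but rare elsewhere may be a self feature, and a term that appears in only part of the cluster and is rare elsewhere may be a child. The sketch only considers terms that actually occur in the documents, which is narrower than what the abstract describes.

```python
from collections import Counter


def doc_frequency(docs):
    """Return the fraction of documents that contain each term."""
    counts = Counter()
    for doc in docs:
        # Count each term once per document (whitespace tokens; an assumption).
        counts.update(set(doc.lower().split()))
    return {term: n / len(docs) for term, n in counts.items()}


def classify_features(cluster_docs, collection_docs,
                      common_in_cluster=0.6,
                      present_in_cluster=0.2,
                      common_in_collection=0.3):
    """Split cluster terms into parent / self / child sets.

    All three thresholds are hypothetical values chosen for illustration.
    """
    in_cluster = doc_frequency(cluster_docs)
    in_collection = doc_frequency(collection_docs)
    classes = {"parent": set(), "self": set(), "child": set()}
    for term, f_cluster in in_cluster.items():
        f_all = in_collection.get(term, 0.0)
        if f_cluster >= common_in_cluster:
            # Common in the cluster: parent if also common overall, else self.
            key = "parent" if f_all >= common_in_collection else "self"
        elif f_cluster >= present_in_cluster and f_all < common_in_collection:
            # Present in only part of the cluster and rare overall.
            key = "child"
        else:
            continue  # Too rare in the cluster to say anything about it.
        classes[key].add(term)
    return classes


if __name__ == "__main__":
    cluster = [
        "science biology genetics",
        "science biology botany",
        "science biology genetics",
        "science biology",
    ]
    collection = cluster + ["science physics"] * 8 + ["sports football"] * 8
    for name, terms in classify_features(cluster, collection).items():
        print(name, sorted(terms))
```

A fixed-threshold rule like this is easy to inspect but would also label very common words (stop words) as parents, so a practical version would likely need an upper bound on collection frequency or a stop list.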

