The patent badge is an abbreviated version of the USPTO patent document. It covers the following fields: patent number, date the patent was issued, date the patent was filed, title of the patent, applicant, inventor, assignee, attorney firm, primary examiner, assistant examiner, CPCs, and abstract. The badge also contains a link to the full patent document (in Adobe Acrobat format, i.e. PDF).

Date of Patent:
Aug. 1, 2006

Filed:
May 17, 2002
Applicants:

Christina Yip Chung, Sunnyvale, CA (US);

Jinhui Liu, Sunnyvale, CA (US);

Alpha Luk, San Jose, CA (US);

Jianchang Mao, San Jose, CA (US);

Sumit Taank, Austin, TX (US);

Vamsi Vutukuru, Austin, TX (US);

Inventors:

Christina Yip Chung, Sunnyvale, CA (US);

Jinhui Liu, Sunnyvale, CA (US);

Alpha Luk, San Jose, CA (US);

Jianchang Mao, San Jose, CA (US);

Sumit Taank, Austin, TX (US);

Vamsi Vutukuru, Austin, TX (US);

Assignee:

Verity, Inc., Sunnyvale, CA (US);

Attorney:
Primary Examiner:
Int. Cl.
CPC ...
G06F 17/30 (2006.01);
U.S. Cl.
CPC ...
Abstract

The invention is a method, system, and computer program for automatically discovering concepts from a corpus of documents and automatically generating a labeled concept hierarchy. The method involves extraction of signatures from the corpus of documents. The similarity between signatures is computed using a statistical measure. The frequency distribution of signatures is refined to alleviate any inaccuracy in the similarity measure. The signatures are also disambiguated to address the polysemy problem. The similarity measure is recomputed based on the refined frequency distribution and disambiguated signatures. The recomputed similarity measure reflects actual similarity between signatures. The recomputed similarity measure is then used for clustering related signatures. The signatures are clustered to generate concepts, and the concepts are arranged in a concept hierarchy. The concept hierarchy automatically generates a query for a particular concept and retrieves relevant documents associated with the concept.
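
The core loop the abstract describes (extract signatures, score their similarity, cluster related signatures into concepts) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration, not the patented method: signatures are taken to be plain lowercase words, the "statistical measure" is stood in for by Jaccard overlap of the documents each word appears in, and clustering is simple single-link grouping above a threshold. The function names `extract_signatures`, `jaccard`, and `cluster_signatures` are hypothetical. The refinement of the frequency distribution, the disambiguation step, and the hierarchy construction are omitted.

```python
from collections import defaultdict
from itertools import combinations


def extract_signatures(docs):
    """Map each signature (assumed here: a lowercase word) to the ids of docs containing it."""
    postings = defaultdict(set)
    for doc_id, text in enumerate(docs):
        for word in text.lower().split():
            postings[word].add(doc_id)
    return postings


def jaccard(a, b):
    """Jaccard overlap of two document-id sets (a stand-in for the statistical measure)."""
    return len(a & b) / len(a | b)


def cluster_signatures(postings, threshold=0.5):
    """Group signatures whose similarity meets the threshold (single-link via union-find)."""
    parent = {s: s for s in postings}

    def find(s):
        # Follow parent links to the cluster root, compressing the path as we go.
        while parent[s] != s:
            parent[s] = parent[parent[s]]
            s = parent[s]
        return s

    for s, t in combinations(sorted(postings), 2):
        if jaccard(postings[s], postings[t]) >= threshold:
            parent[find(s)] = find(t)

    clusters = defaultdict(set)
    for s in postings:
        clusters[find(s)].add(s)
    return sorted(sorted(c) for c in clusters.values())


# Toy corpus: two documents about one topic, two about another.
docs = [
    "java coffee",
    "java coffee",
    "python snake",
    "python snake",
]
concepts = cluster_signatures(extract_signatures(docs))
print(concepts)
```

On this toy corpus, words that always appear in the same documents end up in the same group. A word such as "java" that is used in two unrelated senses would blur the groups, which is the polysemy problem the abstract's disambiguation step is meant to address.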

