The patent badge is an abbreviated version of the USPTO patent document. It covers the following fields: patent number, date the patent was issued, date the patent was filed, title of the patent, applicant, inventors, assignee, attorney firm, primary examiner, assistant examiner, CPCs, and abstract. The badge also links to the full patent document in Adobe Acrobat (PDF) format.

Date of Patent:
Oct. 14, 2003

Filed:
Jul. 28, 2000

Applicant:
Inventors:
Shermann Loyall Min, Pacifica, CA (US);
Constantin Lorenzo Tanno, San Francisco, CA (US);
Zachary Frank Mainen, Cold Spring Harbor, NY (US);
William Russell Softky, Menlo Park, CA (US)

Assignee:
Other

Attorney:
Primary Examiner:
Assistant Examiner:
Int. Cl. (CPC):
G06F 1/730
U.S. Cl. (CPC):
G06F 1/730
Abstract

A system and method for document retrieval is disclosed. The invention addresses a major problem in text-based document retrieval: rapidly finding, within a large document collection (e.g., Web pages on the Internet), the small subset of documents that are relevant to a limited set of query terms supplied by the user. The invention uses information contained in the document collection about the statistics of word relationships (“context”) to facilitate the specification of search queries and the comparison of documents. The method first compiles word relationships into a context database that captures the statistics of word proximity and occurrence throughout the document collection. At retrieval time, a search matrix is computed from a set of user-supplied keywords and the context database. For each document in the collection, a similar matrix is computed from the contents of the document and the context database. Document relevance is determined by the similarity between the search matrix and the document matrix. The disclosed system therefore retrieves documents by contextual similarity rather than by word-frequency similarity, simplifying search specification while allowing greater search precision.
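The abstract does not specify the proximity window, the weighting, the layout of the matrices, or the comparison measure, so the Python sketch below fills those in with assumptions of its own and should not be read as the patented method. It assumes that the context database is a table of co-occurrence counts within a fixed window, that the search matrix is the sum of outer products of each keyword's context profile, that the document matrix keeps the collection-wide context weight of every word pair present in the document, and that relevance is cosine similarity over the flattened sparse matrices. All function names (`build_context`, `search_matrix`, `document_matrix`, `similarity`, `rank`) and the `window` parameter are hypothetical.

```python
import math
import re
from collections import Counter, defaultdict
from itertools import combinations

# Hypothetical types: the context database maps each word to counts of the
# words that appeared near it anywhere in the collection.
Context = dict[str, Counter]
Matrix = dict[tuple[str, str], float]  # sparse matrix keyed by (word_a, word_b), word_a < word_b


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it on non-alphanumeric characters."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_context(docs: list[str], window: int = 3) -> Context:
    """Assumed context database: for every pair of distinct words, count how
    often they appear within `window` positions of each other in the collection."""
    ctx: Context = defaultdict(Counter)
    for doc in docs:
        words = tokenize(doc)
        for i, w in enumerate(words):
            for j in range(i + 1, min(i + 1 + window, len(words))):
                v = words[j]
                if v != w:
                    ctx[w][v] += 1
                    ctx[v][w] += 1
    return ctx


def search_matrix(keywords: list[str], ctx: Context) -> Matrix:
    """Assumed construction: for each keyword, take its context profile
    (plus the keyword itself) and add the outer product of that profile."""
    m: defaultdict = defaultdict(float)
    for k in keywords:
        profile = Counter(ctx.get(k, {}))
        profile[k] += 1  # let the keyword itself take part in the matrix
        for a, b in combinations(sorted(profile), 2):
            m[(a, b)] += profile[a] * profile[b]
    return dict(m)


def document_matrix(text: str, ctx: Context) -> Matrix:
    """Assumed construction: keep the collection-wide context weight of every
    pair of distinct words that both occur in this document."""
    words = sorted(set(tokenize(text)))
    m: Matrix = {}
    for a, b in combinations(words, 2):
        weight = ctx.get(a, {}).get(b, 0)
        if weight:
            m[(a, b)] = float(weight)
    return m


def similarity(q: Matrix, d: Matrix) -> float:
    """Cosine similarity of two sparse matrices treated as flat vectors."""
    dot = sum(v * d.get(key, 0.0) for key, v in q.items())
    nq = math.sqrt(sum(v * v for v in q.values()))
    nd = math.sqrt(sum(v * v for v in d.values()))
    return dot / (nq * nd) if nq and nd else 0.0


def rank(keywords: list[str], docs: list[str], ctx: Context) -> list[tuple[float, int]]:
    """Return (score, document index) pairs, best match first."""
    q = search_matrix(keywords, ctx)
    scored = [(similarity(q, document_matrix(d, ctx)), i) for i, d in enumerate(docs)]
    return sorted(scored, reverse=True)


if __name__ == "__main__":
    docs = ["jaguar cat jungle", "jaguar car engine", "cat jungle prey", "car engine oil"]
    ctx = build_context(docs)
    for score, i in rank(["cat"], docs, ctx):
        print(f"{score:.3f}  {docs[i]}")
```

With the toy collection above as the context database, a query for "cat" gives the text "jungle prey" a positive score even though it never contains "cat", because both of its words co-occur with "cat" elsewhere in the collection, while "car oil" scores zero. This illustrates the contextual, rather than word-frequency, matching that the abstract describes. The dense pairwise matrices grow quadratically with vocabulary size, which is why the sketch stores them sparsely.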

