The patent badge is an abbreviated version of the USPTO patent document. It covers the following fields: patent number, date the patent was issued, date the patent was filed, title of the patent, applicant, inventor, assignee, attorney firm, primary examiner, assistant examiner, CPCs, and abstract. The badge also contains a link to the full patent document (in Adobe Acrobat format, i.e., PDF).

Date of Patent:
Oct. 29, 2002

Filed:
Dec. 18, 1998

Applicant:

Inventors:
Sanjeev Katariya, Issaquah, WA (US);
William P. Jones, Kirkland, WA (US);

Assignee:
Microsoft Corporation, Redmond, WA (US);

Attorney:

Primary Examiner:

Assistant Examiner:

Int. Cl.
CPC ...
G06F 17/30; G06F 17/28; G06F 17/21;
U.S. Cl.
CPC ...
G06F 17/30; G06F 17/28; G06F 17/21;
Abstract

A weighting system for calculating the term-document importance for each term within each document that is part of a collection of documents (i.e., a corpus). The weighting system calculates the importance of a term within a document based on a computed normalized term frequency and a computed inverse document frequency. The computed normalized term frequency is a function, referred to as the “computed term frequency function” (“A”), of a normalized term frequency. The normalized term frequency is the term frequency, which is the number of times that the term occurs in the document, normalized by the total term frequency of the term within all documents, which is the total number of times that the term occurs in all the documents. The weighting system normalizes the term frequency by dividing the term frequency by a function, referred to as the “normalizing term frequency function” (“Γ”), of the total term frequency. The computed inverse document frequency is a function, referred to as the “computed inverse document frequency function” (“B”) of the inverse document frequency. The weighting system identifies a computed normalized term frequency function A and a computed inverse document frequency function B so that on average the computed normalized term frequency and the computed inverse document frequency contribute equally to the weight of the terms.
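
The structure described in the abstract can be sketched in Python as follows. This is only an illustration under stated assumptions, not the patented method: the abstract does not give concrete forms for A, B, or Γ, so the logarithmic defaults below are hypothetical placeholders. Defining the inverse document frequency as the number of documents divided by the number of documents containing the term, and combining the two factors by multiplication, are also assumptions. The name `term_weights` is made up for this example.

```python
import math
from collections import Counter


def term_weights(docs, A=None, B=None, gamma=None):
    """Return one {term: weight} dict per document.

    docs  : list of documents, each a list of term strings.
    A     : computed term frequency function (placeholder default).
    B     : computed inverse document frequency function (placeholder default).
    gamma : normalizing term frequency function (placeholder default).
    """
    # Hypothetical defaults: the abstract names these functions
    # but does not say what they are.
    A = A or (lambda x: math.log(1.0 + x))
    B = B or (lambda x: math.log(1.0 + x))
    gamma = gamma or (lambda total: 1.0 + math.log(total))

    n_docs = len(docs)
    tf_per_doc = [Counter(doc) for doc in docs]

    total_tf = Counter()  # occurrences of each term across all documents
    doc_freq = Counter()  # number of documents containing each term
    for tf in tf_per_doc:
        total_tf.update(tf)
        doc_freq.update(tf.keys())

    weights = []
    for tf in tf_per_doc:
        w = {}
        for term, count in tf.items():
            # Term frequency divided by a function of the total term frequency.
            normalized_tf = count / gamma(total_tf[term])
            # Assumed definition of inverse document frequency.
            idf = n_docs / doc_freq[term]
            # Assumed combination: product of the two computed factors.
            w[term] = A(normalized_tf) * B(idf)
        weights.append(w)
    return weights


corpus = [["a", "b", "a"], ["a", "c"]]
for doc_weights in term_weights(corpus):
    print(doc_weights)
```

Passing A, B, and Γ as parameters keeps the sketch neutral about their actual forms. The abstract's requirement that the two factors contribute equally on average would be met by the choice of A and B, which this sketch does not attempt.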

