The patent badge is an abbreviated version of the USPTO patent document. The patent badge does contain a link to the full patent document.

The patent badge is an abbreviated version of the USPTO patent document. The patent badge covers the following: Patent number, Date patent was issued, Date patent was filed, Title of the patent, Applicant, Inventor, Assignee, Attorney firm, Primary examiner, Assistant examiner, CPCs, and Abstract. The patent badge does contain a link to the full patent document (in Adobe Acrobat format, aka pdf). To download or print any patent click here.

Date of Patent:
Dec. 03, 2002

Filed:

Feb. 08, 2000
Applicant:
Inventors:

Usama M. Fayyad, Mercer Island, WA (US);

Cheng Yang, Mountain View, CA (US);

Paul S. Bradley, Seattle, WA (US);

Assignee:

Microsoft Corporation, Redmond, WA (US);

Attorney:
Primary Examiner:
Assistant Examiner:
Int. Cl.
CPC ...
G06F 1/730 ;
U.S. Cl.
CPC ...
G06F 1/730 ;
Abstract

Iterative validation for efficiently determining error-tolerant frequent itemsets is disclosed. A description of the application of error-tolerant frequent itemsets to efficiently determining clusters as well as initializing clustering algorithms are also given. In one embodiment, a method determines a sample set of error-tolerant frequent itemsets (ETF's) within a uniform random sample of data within a database. This sample set of ETF's is independently validated, so that, for example, spurious ETF's and spurious dimensions within the ETF's can be removed. The validated sample set of ETF's, is added to the set of ETF's for the database. This process is repeated with additional uniform samples that are mutually exclusive from prior uniform samples, to continue building the database's set of ETF's, until no new sample sets can be found. The method is significantly more efficient than disk-based methods in the prior art, and the data clusters found are often not discovered by traditional clustering algorithm in the prior art.


Find Patent Forward Citations

Loading…