The patent badge is an abbreviated version of the USPTO patent document. The patent badge does contain a link to the full patent document.

The patent badge is an abbreviated version of the USPTO patent document. The patent badge covers the following: Patent number, Date patent was issued, Date patent was filed, Title of the patent, Applicant, Inventor, Assignee, Attorney firm, Primary examiner, Assistant examiner, CPCs, and Abstract. The patent badge does contain a link to the full patent document (in Adobe Acrobat format, aka pdf). To download or print any patent click here.

Date of Patent:
Apr. 11, 2000

Filed:

Jul. 15, 1997
Applicant:
Inventors:

Colin Leonard Bird, Eastleigh, GB;

Graham Derek Wallis, West Wellow, GB;

Attorneys:
Primary Examiner:
Int. Cl.
CPC ...
G06F / ;
U.S. Cl.
CPC ...
712 28 ; 712 21 ;
Abstract

A method is disclosed for reproducible sampling of data items of a dataset which is shared across a plurality of nodes of a parallel data processing system. In data mining of large databases, segmentation of the database is often necessary either to obtain a summary of the database or prior to an operation such as link analysis. A sample of data records are taken to create an initial segmentation model. The records of this sample and the initial model created from them can be critical to the results of the data mining process, and the initial model may not be reproducible unless the same sampling of data records is repeatable. Reproducible sampling is enabled without polling of all nodes to locate particular records. Parametric control information with a small number of control parameters is generated which describes the particular partitioning of the dataset. The parametric control information enables computing of the location of a data record. The parametric control information may be distributed to each node and enable computing of the location of data records by each node. The invention is applicable to other sampling methods.


Find Patent Forward Citations

Loading…