The patent badge is an abbreviated version of the USPTO patent document. The patent badge does contain a link to the full patent document.

The patent badge is an abbreviated version of the USPTO patent document. The patent badge covers the following: Patent number, Date patent was issued, Date patent was filed, Title of the patent, Applicant, Inventor, Assignee, Attorney firm, Primary examiner, Assistant examiner, CPCs, and Abstract. The patent badge does contain a link to the full patent document (in Adobe Acrobat format, aka pdf). To download or print any patent click here.

Date of Patent:
Aug. 25, 1998

Filed:

May. 08, 1996
Applicant:
Inventors:

Rakesh Agrawal, San Jose, CA (US);

Manish Mehta, San Jose, CA (US);

John Christopher Shafer, Amherst, MA (US);

Attorney:
Primary Examiner:
Assistant Examiner:
Int. Cl.
CPC ...
G06F / ;
U.S. Cl.
CPC ...
707102 ; 3642253 ; 3642254 ; 3642823 ; 364974 ; 364D / ; 364D / ; 711100 ; 382 36 ;
Abstract

A method and system are disclosed for generating a decision-tree classifier from a training set of records, independent of the system memory size. The method comprises the steps of: generating an attribute list for each attribute of the records, sorting the attribute lists for numeric attributes, and generating a decision tree by repeatedly partitioning the records using the attribute lists. For each node, split points are evaluated to determine the best split test for partitioning the records at the node. Preferably, a gini index and class histograms are used in determining the best splits. The gini index indicates how well a split point separates the records while the class histograms reflect the class distribution of the records at the node. Also, a hash table is built as the attribute list of the split attribute is divided among the child nodes, which is then used for splitting the remaining attribute lists of the node. The created tree is further pruned based on the MDL principle, which encodes the tree and split tests in an MDL-based code, and determines whether to prune and how to prune each node based on the code length of the node.


Find Patent Forward Citations

Loading…