The patent badge is an abbreviated version of the USPTO patent document. It covers the following: patent number, date the patent was issued, date the patent was filed, title of the patent, applicant, inventor, assignee, attorney firm, primary examiner, assistant examiner, CPCs, and abstract. The patent badge also links to the full patent document in PDF (Adobe Acrobat) format.
Patent No.:
Date of Patent: Feb. 11, 2003
Filed: Jun. 08, 2000
Inventors:
David E. Johnson, Cortlandt Manor, NY (US);
Frank J. Oles, Peekskill, NY (US);
Tong Zhang, Yonkers, NY (US)
Assignee:
International Business Machines Corporation, Armonk, NY (US)
Abstract
A method to automatically categorize messages or documents containing text. The method fits in the general framework of supervised learning, in which a rule or rules for categorizing data are automatically constructed by a computer on the basis of training data that has already been categorized, i.e., each training data item has been labeled with the categories to which it belongs. More specifically, the method for rule induction involves the novel combination of (1) inducing from the training data a decision tree for each category, (2) automatically constructing from each decision tree a simplified symbolic rule set that is logically equivalent overall to the decision tree and that is used for categorization instead of the decision tree, and (3) determining a confidence level for each rule. The method covers both decision-tree-based symbolic rule induction and the use, for document categorization, of rules in the logical format of those generated by the rule induction procedure described herein.
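The three steps in the abstract can be illustrated with a small Python sketch. It is not the patented procedure: the abstract does not say how the trees are grown, how the rule set is simplified, or how confidence is computed. This sketch therefore assumes word-presence features, an ID3-style information-gain split, one conjunctive rule per positive leaf with no further simplification, and a Laplace-smoothed precision as each rule's confidence. All function names (`grow_tree`, `tree_to_rules`, `induce_rules`, `categorize`) and the toy documents are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import Optional

# Sketch only: documents are sets of words, features are word presence (assumption).

@dataclass
class Node:
    word: Optional[str] = None        # split word; None marks a leaf
    present: Optional["Node"] = None  # subtree for documents containing `word`
    absent: Optional["Node"] = None   # subtree for documents lacking `word`
    positive: bool = False            # leaf prediction: in the category or not

def entropy(pos, n):
    """Binary entropy of a node with `pos` positives out of `n` examples."""
    if n == 0 or pos in (0, n):
        return 0.0
    p = pos / n
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))

def grow_tree(docs, ys, vocab, depth=0, max_depth=4, min_size=2):
    """Step 1 (assumed ID3-style): greedily split on the most informative word."""
    n, pos = len(ys), sum(ys)
    leaf = Node(positive=2 * pos > n)  # majority vote; ties go negative
    if depth >= max_depth or n < min_size or pos in (0, n):
        return leaf
    best_word, best_gain = None, 0.0
    for w in sorted(vocab):  # sorted for deterministic tie-breaking
        inside = [y for d, y in zip(docs, ys) if w in d]
        outside = [y for d, y in zip(docs, ys) if w not in d]
        if not inside or not outside:
            continue
        remainder = (len(inside) * entropy(sum(inside), len(inside))
                     + len(outside) * entropy(sum(outside), len(outside))) / n
        gain = entropy(pos, n) - remainder
        if gain > best_gain:
            best_word, best_gain = w, gain
    if best_word is None:
        return leaf

    def subtree(keep):
        idx = [i for i, d in enumerate(docs) if (best_word in d) == keep]
        return grow_tree([docs[i] for i in idx], [ys[i] for i in idx],
                         vocab, depth + 1, max_depth, min_size)

    return Node(word=best_word, present=subtree(True), absent=subtree(False))

def tree_to_rules(node, path=()):
    """Step 2: one conjunctive rule per positive leaf (its root-to-leaf path).
    No further simplification is attempted here."""
    if node.word is None:
        return [path] if node.positive else []
    return (tree_to_rules(node.present, path + ((node.word, True),))
            + tree_to_rules(node.absent, path + ((node.word, False),)))

def matches(rule, doc):
    return all((w in doc) == present for w, present in rule)

def rule_confidence(rule, docs, ys):
    """Step 3 (assumed): Laplace-smoothed precision of the rule on training data."""
    hits = [y for d, y in zip(docs, ys) if matches(rule, d)]
    return (sum(hits) + 1) / (len(hits) + 2)

def induce_rules(docs, labels, categories, **tree_args):
    vocab = set().union(*docs)
    model = {}
    for c in categories:
        ys = [int(c in lab) for lab in labels]  # one binary problem per category
        tree = grow_tree(docs, ys, vocab, **tree_args)
        model[c] = [(r, rule_confidence(r, docs, ys)) for r in tree_to_rules(tree)]
    return model

def categorize(model, doc):
    """Assign each category with a firing rule, scored by its best rule's confidence."""
    out = {}
    for c, rules in model.items():
        confs = [conf for rule, conf in rules if matches(rule, doc)]
        if confs:
            out[c] = max(confs)
    return out

# Toy usage (hypothetical data).
docs = [{"stock", "market", "shares"}, {"stock", "price", "rise"},
        {"market", "shares", "fall"}, {"football", "match", "goal"},
        {"goal", "team", "win"}, {"match", "team", "score"}]
labels = [{"finance"}] * 3 + [{"sport"}] * 3
model = induce_rules(docs, labels, ["finance", "sport"])
print(model["finance"])  # [((('goal', False), ('match', False)), 0.8)]
print(categorize(model, {"team", "goal", "win"}))  # {'sport': 0.75}
```

Because the paths of a tree are mutually exclusive and exhaustive, the disjunction of a category's rules accepts exactly the documents that the tree labels positive, which is one sense in which such a rule set is equivalent to the tree. A simplification step, omitted here, would shorten these rules while preserving that equivalence.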