The patent badge is an abbreviated version of the USPTO patent document. It covers the following fields: patent number, date the patent was issued, date the patent was filed, title of the patent, applicant, inventor, assignee, attorney firm, primary examiner, assistant examiner, CPCs, and abstract. The patent badge also contains a link to the full patent document (in Adobe Acrobat format, i.e. PDF).
Patent No.:
Date of Patent:
May 27, 2003
Filed:
Feb. 11, 2000
Inventors:
Frank J. Oles, Peekskill, NY (US);
Tong Zhang, Yonkers, NY (US);
Assignee:
International Business Machines Corporation, Armonk, NY (US);
Abstract
A method to automatically categorize messages or documents containing text. The method fits in the general framework of supervised learning, in which a rule or rules for categorizing data are automatically constructed by a computer on the basis of training data that has been labeled beforehand. More specifically, the method involves the construction of a linear separator: training data is used to construct, for each category, a weight vector w and a threshold t, and the decision of whether a hitherto unseen document d is in the category depends on the outcome of the test w · x ≥ t, where x is a vector derived from the document d. The method also uses a set L of features selected from the training data in order to construct the numerical vector representation x of a document. The preferred method uses an algorithm based on Gauss-Seidel iteration to determine the weight vector w, which is defined by a regularized convex optimization problem derived from the principle of minimizing modified training error.
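The pipeline described in the abstract can be illustrated with a minimal sketch. It is not the patented method: the abstract does not state the "modified training error" objective, so the sketch substitutes ridge-regularized squared error, whose normal equations can be solved by plain Gauss-Seidel sweeps. The feature list L, the toy training documents, the regularization strength lam, and the threshold t = 0 are all hypothetical choices made for illustration.

```python
from collections import Counter

def vectorize(document, features):
    """Map a document d to a count vector x over the selected feature list L."""
    counts = Counter(document.lower().split())
    return [float(counts[f]) for f in features]

def gauss_seidel_ridge(X, y, lam=0.1, sweeps=50):
    """Gauss-Seidel sweeps on (X^T X + n*lam*I) w = X^T y.

    Assumption: ridge-regularized squared error stands in for the
    patent's objective, which the abstract does not spell out.
    """
    n, d = len(X), len(X[0])
    w = [0.0] * d
    p = [0.0] * n  # p[i] caches the current value of w . X[i]
    for _ in range(sweeps):
        for j in range(d):
            # Solve for w[j] exactly, using the latest values of all other coordinates.
            num = sum(X[i][j] * (y[i] - p[i] + w[j] * X[i][j]) for i in range(n))
            den = sum(X[i][j] ** 2 for i in range(n)) + n * lam
            new_wj = num / den
            delta = new_wj - w[j]
            for i in range(n):
                p[i] += delta * X[i][j]
            w[j] = new_wj
    return w

def in_category(w, t, x):
    """Linear separator test: the document is in the category when w . x >= t."""
    return sum(wi * xi for wi, xi in zip(w, x)) >= t

# Hypothetical feature set L and labeled training data (+1 = in category, -1 = not).
L = ["goal", "match", "invoice"]
training = [
    ("goal match goal", 1.0),
    ("match tonight", 1.0),
    ("invoice attached", -1.0),
    ("invoice for match tickets", -1.0),
]
X = [vectorize(text, L) for text, _ in training]
y = [label for _, label in training]

lam = 0.1  # hypothetical regularization strength
t = 0.0    # hypothetical threshold; the abstract says t is also built from training data
w = gauss_seidel_ridge(X, y, lam=lam)

# Classify a previously unseen document.
print(in_category(w, t, vectorize("Late goal decides the match", L)))
```

Each coordinate update reuses the most recently updated values of the other coordinates, which is what distinguishes a Gauss-Seidel sweep from a Jacobi sweep. A real categorizer would train one (w, t) pair per category and select L from the training data rather than by hand.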