The patent badge is an abbreviated version of the USPTO patent document. It lists the patent number, issue date, filing date, title, applicant, inventor, assignee, attorney firm, primary examiner, assistant examiner, CPCs, and abstract. The badge also links to the full patent document (in Adobe Acrobat format, i.e. PDF).

Date of Patent:
Oct. 25, 2011

Filed:
Dec. 31, 2007

Applicants:

Anirban Dasgupta, Berkeley, CA (US);

Petros Drineas, Troy, NY (US);

Boulos Harb, Brooklyn, NY (US);

Vanja Josifovski, Los Gatos, CA (US);

Michael William Mahoney, Redwood City, CA (US);

Inventors:

Anirban Dasgupta, Berkeley, CA (US);

Petros Drineas, Troy, NY (US);

Boulos Harb, Brooklyn, NY (US);

Vanja Josifovski, Los Gatos, CA (US);

Michael William Mahoney, Redwood City, CA (US);

Assignee:

Yahoo! Inc., Sunnyvale, CA (US);

Attorney:
Primary Examiner:
Assistant Examiner:
Int. Cl.
CPC ...
G06N 5/00 (2006.01);
U.S. Cl.
CPC ...
Abstract

An improved system and method is provided for feature selection for text classification using subspace sampling. A text classifier generator may be provided for selecting a small set of features using subspace sampling from the corpus of training data to train a text classifier for using the small set of features for classification of texts. To select the small set of features, a subspace of features from the corpus of training data may be randomly sampled according to a probability distribution over the set of features where a probability may be assigned to each of the features that is proportional to the square of the Euclidean norms of the rows of left singular vectors of a matrix of the features representing the corpus of training texts. The small set of features may classify texts using only the relevant features among a very large number of training features.
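The sampling step described in the abstract can be illustrated with a minimal sketch. It assumes the training corpus is given as a dense feature-by-document matrix `A` (one row per feature), computes the left singular vectors with NumPy, and assigns each feature a probability proportional to the squared Euclidean norm of its row in that matrix. The function name `subspace_sample_features`, the parameters `r` and `k`, and the choice to sample without replacement are assumptions made for illustration only; the abstract does not specify them, and the patented method may differ.

```python
import numpy as np

def subspace_sample_features(A, r, k=None, seed=None):
    """Sample r feature indices from a (n_features, n_docs) matrix A.

    Hypothetical helper: each feature i gets probability proportional to
    the squared Euclidean norm of row i of the left singular vectors of A.
    Assumes A has at least one non-zero entry.
    """
    rng = np.random.default_rng(seed)
    # Thin SVD: U has one row per feature.
    U, s, _ = np.linalg.svd(A, full_matrices=False)
    if k is None:
        # Assumption: keep only directions with non-negligible singular values.
        k = int(np.sum(s > 1e-12 * s[0]))
    U_k = U[:, :k]
    # Squared row norms of the left singular vectors.
    scores = np.sum(U_k ** 2, axis=1)
    probs = scores / scores.sum()
    # Cannot draw more distinct features than have non-zero probability.
    r = min(r, int(np.count_nonzero(probs)))
    # Assumption: draw distinct features (sampling without replacement).
    chosen = rng.choice(A.shape[0], size=r, replace=False, p=probs)
    return np.sort(chosen), probs

# Toy corpus: 6 features, 4 documents; feature 2 never occurs.
A_demo = np.random.default_rng(0).random((6, 4))
A_demo[2, :] = 0.0
chosen, probs = subspace_sample_features(A_demo, r=3, seed=1)
print("sampled feature indices:", chosen)
print("sampling probabilities:", np.round(probs, 3))
```

A classifier would then be trained on `A[chosen, :]` only. A feature that never occurs in the corpus has a zero row in the singular-vector matrix, so it receives (numerically) zero probability and is effectively never selected.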