The patent badge is an abbreviated version of the USPTO patent document. The patent badge does contain a link to the full patent document.

The patent badge is an abbreviated version of the USPTO patent document. The patent badge covers the following: Patent number, Date patent was issued, Date patent was filed, Title of the patent, Applicant, Inventor, Assignee, Attorney firm, Primary examiner, Assistant examiner, CPCs, and Abstract. The patent badge does contain a link to the full patent document (in Adobe Acrobat format, aka pdf). To download or print any patent click here.

Date of Patent:
Feb. 07, 2017

Filed:

Dec. 04, 2013
Applicant:

International Business Machines Corporation, Armonk, NY (US);

Inventors:

Dirk Harz, Kassel, DE;

Ralf Iffert, Kassel, DE;

Mark Keinhoerster, Dusseldorf, DE;

Mark Usher, Kassel, DE;

Attorneys:
Primary Examiner:
Assistant Examiner:
Int. Cl.
CPC ...
H04L 29/08 (2006.01); G06N 99/00 (2010.01);
U.S. Cl.
CPC ...
H04L 67/02 (2013.01); G06N 99/005 (2013.01);
Abstract

A mechanism is provided for automatic genre determination of web content. For each type of web content genre, a set of relevant feature types are extracted from collected training material, where genre features and non-genre features are represented by tokens and an integer counts represents a frequency of appearance of the token in both a first type of training material and a second type of training material. In a classification process, fixed length tokens are extracted for relevant features types from different text and structural elements of web content. For each relevant feature type, a corresponding feature probability is calculated. The feature probabilities are combined to an overall genre probability that the web content belongs to a specific trained web content genre. A genre classification result is then output comprising at least one specific trained web content genre to which the web content belongs together with a corresponding genre probability.


Find Patent Forward Citations

Loading…