The patent badge is an abbreviated version of the USPTO patent document. It covers the following: patent number, date the patent was issued, date the patent was filed, title of the patent, applicant, inventor, assignee, attorney firm, primary examiner, assistant examiner, CPCs, and abstract. The patent badge also contains a link to the full patent document (in Adobe Acrobat format, i.e. PDF).

Date of Patent: Jun. 26, 2012

Filed: Sep. 20, 2005

Applicant: Jasmine Novak, Mountain View, CA (US)

Inventor: Jasmine Novak, Mountain View, CA (US)

Attorney:
Primary Examiner:
Int. Cl.
CPC ...
G06F 7/00 (2006.01); G06F 17/30 (2006.01);
U.S. Cl.
CPC ...
Abstract

Disclosed is a method of extracting informative phrases from a full corpus of documents. An index of phrases contained in the full corpus of documents is built. Then, a user specifies a subset of text to analyze. The subset may be defined as: (1) all paragraphs or sentences containing terms selected as defining a subject; (2) all documents in a category; (3) all documents written within a date range; and/or (4) all documents matching a Boolean query of terms. Once the subset is specified, it is analyzed to extract informative phrases. Specifically, the index is queried to retrieve all phrases within the subset. The number of times each of the phrases occurs in the subset and in the corpus is counted. Each phrase contained in the subset is scored according to informativeness based on a comparison of a likelihood that the phrase occurs in the subset and a likelihood that the phrase occurs in the corpus as a whole. Only those phrases having an informativeness score above a predetermined value are considered highly informative and extracted.
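
The scoring step described in the abstract can be illustrated with a minimal Python sketch. The abstract does not state the exact scoring formula, so this example assumes a simple log-ratio of the phrase's relative frequency in the subset to its relative frequency in the whole corpus. The function names (`extract_ngrams`, `count_phrases`, `informative_phrases`), the `max_len` and `threshold` parameters, and the use of in-memory counters in place of a real phrase index are all hypothetical choices made for illustration, not details taken from the patent.

```python
import math
from collections import Counter


def extract_ngrams(tokens, max_len=2):
    """Return every contiguous phrase of 1..max_len tokens."""
    return [
        " ".join(tokens[i:i + n])
        for n in range(1, max_len + 1)
        for i in range(len(tokens) - n + 1)
    ]


def count_phrases(docs, max_len=2):
    """Count phrase occurrences across a list of document strings.

    Stand-in for the phrase index (assumption: whitespace tokenization).
    """
    counts = Counter()
    for doc in docs:
        counts.update(extract_ngrams(doc.lower().split(), max_len))
    return counts


def informative_phrases(corpus, subset_ids, threshold=1.0, max_len=2):
    """Score each phrase in the subset and keep those above the threshold.

    Assumed score: log(P(phrase | subset) / P(phrase | corpus)).
    The subset is drawn from the corpus, so every subset phrase has a
    nonzero corpus count and the ratio is always defined.
    """
    corpus_counts = count_phrases(corpus, max_len)
    subset_counts = count_phrases([corpus[i] for i in subset_ids], max_len)
    corpus_total = sum(corpus_counts.values())
    subset_total = sum(subset_counts.values())

    scores = {}
    for phrase, n_subset in subset_counts.items():
        p_subset = n_subset / subset_total
        p_corpus = corpus_counts[phrase] / corpus_total
        score = math.log(p_subset / p_corpus)
        if score > threshold:
            scores[phrase] = score
    return scores
```

Under this assumed score, a phrase that is concentrated in the subset scores high, while a phrase that is equally or more common in the rest of the corpus scores near or below zero and is dropped by the threshold.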

