The patent badge is an abbreviated version of the USPTO patent document. It covers the following: patent number, date the patent was issued, date the patent was filed, title of the patent, applicant, inventor, assignee, attorney firm, primary examiner, assistant examiner, CPCs, and abstract. The patent badge also contains a link to the full patent document (in Adobe Acrobat format, i.e. PDF).

Date of Patent: Oct. 16, 2001
Filed: Dec. 02, 1997
Applicant:
Inventors:
Nicholas Kushmerick, Seattle, WA (US);
Daniel S. Weld, Seattle, WA (US);
Robert B. Doorenbos, Seattle, WA (US)
Attorney:
Primary Examiner:
Assistant Examiner:
Int. Cl.: CPC ... G06F 1/730
U.S. Cl.: CPC ... G06F 1/730
Abstract

A procedure is disclosed for automatically constructing wrappers that perform information extraction from sites, such as Internet resources, that display relevant information interspersed with extraneous text fragments such as HTML formatting commands or advertisements. The procedure has three basic steps. First, a set of example pages is collected by a subroutine named GatherExamples. GatherExamples is provided with information describing how to pose example queries to the site whose wrapper is to be learned. Second, these example pages are labeled by a subroutine named LabelExamples; that is, the information to be extracted from each example is identified for use in the third step. The LabelExamples subroutine uses a general framework for labeling pages with site-specific heuristics called recognizers, and also allows users to correct and modify the recognized instances. Finally, the labeled example pages are passed to a BuildWrapper subroutine, which constructs a wrapper.
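
The three steps named in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the patented method: the Python function names mirror GatherExamples, LabelExamples, and BuildWrapper, but their signatures, the `fetch` callable, and the `Span` and `Recognizer` types are hypothetical. The abstract does not say how BuildWrapper learns a wrapper, so a simple shared-delimiter learner stands in for it here, and the user-correction part of labeling is omitted.

```python
import os
import re
from typing import Callable, List, Tuple

# Hypothetical types, introduced only for this sketch.
Span = Tuple[int, int]                     # (start, end) character offsets in a page
Recognizer = Callable[[str], List[Span]]   # site-specific labeling heuristic
Labeled = List[Tuple[str, List[Span]]]     # pages paired with their labeled spans


def gather_examples(fetch: Callable[[str], str], queries: List[str]) -> List[str]:
    """Step 1: pose each example query to the site and collect the pages."""
    return [fetch(query) for query in queries]


def label_examples(pages: List[str], recognizers: List[Recognizer]) -> Labeled:
    """Step 2: mark the text to extract on each page using the recognizers."""
    labeled = []
    for page in pages:
        # Merge the spans proposed by all recognizers, dropping duplicates.
        spans = sorted({span for rec in recognizers for span in rec(page)})
        labeled.append((page, spans))
    return labeled


def _common_suffix(strings: List[str]) -> str:
    """Longest string that every input ends with."""
    return os.path.commonprefix([s[::-1] for s in strings])[::-1]


def build_wrapper(labeled: Labeled) -> Callable[[str], List[str]]:
    """Step 3 (stand-in learner): find delimiters shared by all labeled spans."""
    lefts = [page[:start] for page, spans in labeled for start, _ in spans]
    rights = [page[end:] for page, spans in labeled for _, end in spans]
    left = _common_suffix(lefts)           # text that always precedes an instance
    right = os.path.commonprefix(rights)   # text that always follows an instance
    if not left or not right:
        raise ValueError("labeled examples share no common delimiters")
    pattern = re.compile(re.escape(left) + r"(.*?)" + re.escape(right), re.DOTALL)
    # The wrapper extracts whatever sits between the learned delimiters.
    return lambda page: pattern.findall(page)
```

Calling `build_wrapper(label_examples(gather_examples(fetch, queries), recognizers))` returns a function that can then be applied to pages the learner has not seen. A learner this simple only works when every instance on the site is surrounded by the same literal text, which is why it should be read as a stand-in for the idea rather than as the procedure the patent claims.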

