The patent badge is an abbreviated version of the USPTO patent document. The patent badge does contain a link to the full patent document.

The patent badge is an abbreviated version of the USPTO patent document. The patent badge covers the following: Patent number, Date patent was issued, Date patent was filed, Title of the patent, Applicant, Inventor, Assignee, Attorney firm, Primary examiner, Assistant examiner, CPCs, and Abstract. The patent badge does contain a link to the full patent document (in Adobe Acrobat format, aka pdf). To download or print any patent click here.

Date of Patent:
Apr. 28, 2015

Filed:

Nov. 30, 2011
Applicants:

Jun Yan, Beijing, CN;

Lei Ji, Beijing, CN;

Ning Liu, Beijing, CN;

Zheng Chen, Beijing, CN;

Inventors:

Jun Yan, Beijing, CN;

Lei Ji, Beijing, CN;

Ning Liu, Beijing, CN;

Zheng Chen, Beijing, CN;

Assignee:
Attorneys:
Primary Examiner:
Assistant Examiner:
Int. Cl.
CPC ...
G06F 7/00 (2006.01); G06F 17/30 (2006.01);
U.S. Cl.
CPC ...
G06F 17/30702 (2013.01);
Abstract

Techniques are described for generating structured information from semi-structured web pages, and retrieving the structured knowledge in response to a user query that indicates a query intent. The structured information is automatically extracted offline from semi-structured web pages, through the use of an auto wrapper solution that is noise tolerant, scalable, and automatic. The structured information is stored in a knowledge base, and provided in response to a user search query that indicates a query intent. Extraction of structured information may also include clustering of pages based on their measured similarities. The clusters may be determined based on similar elements in the tag path text data of the pages. A minimum size threshold may be applied to the clusters.


Find Patent Forward Citations

Loading…