The patent badge is an abbreviated version of the USPTO patent document. It covers the following: patent number, date the patent was issued, date the patent was filed, title of the patent, applicant, inventor, assignee, attorney firm, primary examiner, assistant examiner, CPCs, and abstract. The patent badge also contains a link to the full patent document (in Adobe Acrobat format, i.e., PDF).

Date of Patent:
Aug. 26, 2014

Filed:
Dec. 14, 2009

Applicants:
Ping Luo, Beijing, CN;
Jian Fan, San Jose, CA (US);
Samson J. Liu, Mountain View, CA (US);
Yuhong Xiong, Mountain View, CA (US);
Jerry J. Liu, Sunnyvale, CA (US);

Inventors:
Ping Luo, Beijing, CN;
Jian Fan, San Jose, CA (US);
Samson J. Liu, Mountain View, CA (US);
Yuhong Xiong, Mountain View, CA (US);
Jerry J. Liu, Sunnyvale, CA (US);

Assignee:
Attorney:
Primary Examiner:
Assistant Examiner:
Int. Cl.
CPC ...
G06F 17/30 (2006.01); G06F 3/12 (2006.01);
U.S. Cl.
CPC ...
G06F 17/30896 (2013.01); G06F 3/1246 (2013.01);
Abstract

A method and system for extracting Web content is disclosed. In one embodiment, Web content in a Webpage is extracted by identifying paragraphs in the Web content based on line-break node determination. A range of text-body associated with the identified paragraphs is then identified using a maximum scoring subsequence. Further, the identified text-body is refined using a heuristic rule of substantially horizontal alignment. Furthermore, one or more titles and one or more images associated with the Web content are extracted. Moreover, the Web content including the identified paragraphs, the one or more titles and the one or more images are outputted.
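The abstract names a maximum scoring subsequence as the way the text-body range is located among the identified paragraphs. The following is a minimal sketch of that general technique, not the patent's actual implementation: it assumes each paragraph has already been given a numeric score (positive for likely body text, negative for likely boilerplate) and finds the contiguous run of paragraphs with the highest total. The function names, the word-count scoring rule, and the `min_words` threshold are all hypothetical and chosen only for illustration.

```python
from typing import List, Tuple


def paragraph_scores(paragraphs: List[str], min_words: int = 10) -> List[float]:
    """Hypothetical scoring rule: word count minus a threshold.

    Long paragraphs get positive scores, short ones (menus, footers)
    get negative scores. The patent does not specify this rule.
    """
    return [float(len(p.split()) - min_words) for p in paragraphs]


def max_scoring_subsequence(scores: List[float]) -> Tuple[int, int, float]:
    """Return (start, end, total) for the contiguous run with the highest sum.

    `end` is exclusive. If no run has a positive total, returns (0, 0, 0.0),
    meaning no text body was found.
    """
    best_start, best_end, best_total = 0, 0, 0.0
    run_start, run_total = 0, 0.0
    for i, score in enumerate(scores):
        if run_total <= 0:
            # A non-positive prefix can never help; start a new run here.
            run_start, run_total = i, score
        else:
            run_total += score
        if run_total > best_total:
            best_start, best_end, best_total = run_start, i + 1, run_total
    return best_start, best_end, best_total


if __name__ == "__main__":
    # Assumed scores for eight paragraphs: navigation, body text, footer.
    scores = [-8.0, -5.0, 12.0, 30.0, -3.0, 25.0, -9.0, -7.0]
    start, end, total = max_scoring_subsequence(scores)
    print(f"text body spans paragraphs {start}..{end - 1}, score {total}")
```

Under these assumed scores, the short low-scoring paragraph in the middle of the body is kept inside the selected range because the run total stays positive across it, which is the usual reason to prefer this approach over a per-paragraph threshold.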

