The patent badge is an abbreviated version of the USPTO patent document. It covers the following: patent number, date the patent was issued, date the patent was filed, title of the patent, applicant, inventor, assignee, attorney firm, primary examiner, assistant examiner, CPCs, and abstract. The patent badge also contains a link to the full patent document (in Adobe Acrobat format, i.e., PDF).

Date of Patent:
Feb. 16, 2021

Filed:
Mar. 27, 2018

Applicant:
International Business Machines Corporation, Armonk, NY (US);

Inventors:

Chen-Yu Huang, Taipei, TW;

Sheng-Wei Lee, Changhua, TW;

June-Ray Lin, Taipei, TW;

Ci-Hao Wu, Taipei, TW;

Hsieh-Lung Yang, Taipei, TW;

Ying-Chen Yu, Taipei, TW;

Attorney:
Primary Examiner:
Int. Cl.
CPC ...
G06F 16/951 (2019.01); G06F 16/958 (2019.01); H04L 29/08 (2006.01); G06F 40/103 (2020.01); G06F 16/9535 (2019.01); G06F 16/33 (2019.01); G06F 40/279 (2020.01);
U.S. Cl.
CPC ...
G06F 16/9535 (2019.01); G06F 16/334 (2019.01); G06F 16/986 (2019.01); G06F 40/279 (2020.01); H04L 67/02 (2013.01);
Abstract

A method, computer system, and a computer program product for crawling and extracting main content from a web page are provided. The present invention may include retrieving an HTML document associated with a web page. The present invention may then include identifying at least one entry point located in the retrieved HTML document by utilizing a self-adaptive entry point locator. The present invention may also include extracting a main content article associated with the retrieved HTML document based on the identified at least one entry point. The present invention may further include presenting the extracted main content associated with the retrieved HTML document to the user.
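
The steps named in the abstract (retrieve an HTML document, locate an entry point, extract the main content) can be illustrated with a minimal Python sketch. This is not the patented self-adaptive entry point locator, whose details the abstract does not give. It swaps in a simple, assumed heuristic: the container element holding the most direct text is treated as the entry point. The names `fetch_html`, `MainContentExtractor`, `extract_main_content`, `CONTAINER_TAGS`, and `SKIP_TAGS` are hypothetical and chosen only for illustration.

```python
import urllib.request
from html.parser import HTMLParser

# Assumed heuristic sets, not taken from the patent.
CONTAINER_TAGS = {"article", "main", "section", "div", "td"}
SKIP_TAGS = {"script", "style", "nav", "header", "footer", "aside"}


def fetch_html(url: str) -> str:
    """Retrieve the HTML document for a web page (step 1 of the abstract)."""
    with urllib.request.urlopen(url, timeout=10) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")


class MainContentExtractor(HTMLParser):
    """Treats the container with the most direct text as the entry point.

    Assumes container tags are well nested; text is credited only to the
    innermost open container, so wrapper elements do not win by default.
    """

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.open_containers = []  # stack of (tag, list of text chunks)
        self.skip_depth = 0        # > 0 while inside a SKIP_TAGS element
        self.best_text = ""

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self.skip_depth += 1
        elif tag in CONTAINER_TAGS:
            self.open_containers.append((tag, []))

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS:
            self.skip_depth = max(0, self.skip_depth - 1)
        elif tag in CONTAINER_TAGS and self.open_containers:
            _, chunks = self.open_containers.pop()
            text = " ".join(chunks)
            # Keep the longest candidate seen so far.
            if len(text) > len(self.best_text):
                self.best_text = text

    def handle_data(self, data):
        cleaned = " ".join(data.split())  # collapse runs of whitespace
        if cleaned and self.skip_depth == 0 and self.open_containers:
            self.open_containers[-1][1].append(cleaned)


def extract_main_content(html: str) -> str:
    """Locate the entry point and return its text (steps 2 and 3)."""
    parser = MainContentExtractor()
    parser.feed(html)
    parser.close()
    return parser.best_text
```

A caller would pass the result of `fetch_html(url)` to `extract_main_content` and present the returned text to the user. Because text is credited only to the innermost container, pages that wrap every paragraph in its own `div` would need a scoring rule that aggregates across sibling elements.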

