The patent badge is an abbreviated version of the USPTO patent document. It covers the following: patent number, date the patent was issued, date the patent was filed, title of the patent, applicant, inventor, assignee, attorney firm, primary examiner, assistant examiner, CPCs, and abstract. The patent badge also contains a link to the full patent document (in Adobe Acrobat format, i.e. PDF).
Patent No.:
Date of Patent:
Mar. 04, 2014
Filed:
Nov. 25, 2009
Applicants:
Li-mei Jiao, Beijing, CN;
Yuhong Xiong, Mountain View, CA (US);
Inventors:
Li-Mei Jiao, Beijing, CN;
Yuhong Xiong, Mountain View, CA (US);
Assignee:
Hewlett-Packard Development Company, L.P., Houston, TX (US);
Abstract
Disclosed is a method of automatically extracting data from a target web page, comprising: selecting data in a source web page; determining the respective DOM (document object model) trees of the source and target web pages, and identifying the one or more nodes comprising the selected data in the source web page DOM tree; determining matching paths in the respective DOM trees; for selected data in a node of an unmatched branch of the source web page DOM tree, identifying the nearest matched path in the source web page; identifying the unmatched branch nearest to the corresponding matched path in the target web page; determining if said identified unmatched branch in the target web page DOM tree comprises a target node matching the selected data node; and if so, extracting data from the target node if the mismatch between the respective unmatched branches does not exceed a predefined threshold. A computer program product and system implementing this method are also disclosed.
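The steps in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the names `PathCollector`, `leaf_paths`, `mismatch`, and `extract` are hypothetical, the mismatch metric (number of path steps that differ after the shared prefix) and the default threshold of 2 are assumptions, and the check that the target node matches the selected data node is folded into the threshold test. It also assumes well-formed HTML and at most one text node per element.

```python
from html.parser import HTMLParser

VOID_TAGS = {"br", "hr", "img", "input", "link", "meta"}


class PathCollector(HTMLParser):
    """Records (path, text) for every non-empty text node, in document order."""

    def __init__(self):
        super().__init__()
        self.path = []      # steps such as "div[0]" from the root to the open element
        self.counts = [{}]  # per open element: children seen so far, keyed by tag
        self.leaves = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        index = self.counts[-1].get(tag, 0)
        self.counts[-1][tag] = index + 1
        self.path.append(f"{tag}[{index}]")
        self.counts.append({})

    def handle_endtag(self, tag):
        # Assumes well-formed markup: only the innermost open element is closed.
        if self.path and self.path[-1].startswith(tag + "["):
            self.path.pop()
            self.counts.pop()

    def handle_data(self, data):
        text = data.strip()
        if text:
            self.leaves.append((tuple(self.path), text))


def leaf_paths(html):
    collector = PathCollector()
    collector.feed(html)
    collector.close()
    return collector.leaves


def mismatch(a, b):
    """Assumed metric: path steps that differ after the shared prefix."""
    shared = 0
    for x, y in zip(a, b):
        if x != y:
            break
        shared += 1
    return (len(a) - shared) + (len(b) - shared)


def extract(source_html, target_html, selected_text, threshold=2):
    source = leaf_paths(source_html)
    target = leaf_paths(target_html)
    target_by_path = dict(target)

    # Identify the node holding the selected data in the source tree.
    selected = next((p for p, t in source if t == selected_text), None)
    if selected is None:
        return None

    # Paths present in both trees are the matched paths.
    matched = {p for p, _ in source} & set(target_by_path)
    if selected in matched:
        return target_by_path[selected]
    if not matched:
        return None

    # Nearest matched path to the unmatched source branch (ties broken by path order).
    anchor = min(matched, key=lambda p: (mismatch(selected, p), p))

    # Unmatched target branch nearest to the same matched path.
    unmatched = [(p, t) for p, t in target if p not in matched]
    if not unmatched:
        return None
    candidate_path, candidate_text = min(unmatched, key=lambda pt: mismatch(anchor, pt[0]))

    # Extract only if the two unmatched branches are similar enough.
    if mismatch(selected, candidate_path) <= threshold:
        return candidate_text
    return None


SOURCE = "<html><body><div><h1>Widget</h1><p>Price</p><span>$10</span></div></body></html>"
TARGET = "<html><body><div><h1>Gadget</h1><p>Price</p><b>$25</b></div></body></html>"

if __name__ == "__main__":
    print(extract(SOURCE, TARGET, "$10"))  # prints $25
```

In this example the price sits in a `span` on the source page but in a `b` on the target page, so its path is unmatched in both trees. The sketch falls back to the nearest matched path and accepts the target branch because the two branches differ by only one step on each side.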