The patent badge is an abbreviated version of the USPTO patent document. It covers the following: patent number, date the patent was issued, date the patent was filed, title of the patent, applicant, inventor, assignee, attorney firm, primary examiner, assistant examiner, CPCs, and abstract. The patent badge also contains a link to the full patent document (in Adobe Acrobat format, i.e. PDF).
Patent No.:
Date of Patent: Jul. 27, 2010
Filed: Aug. 31, 2007
Yanhong Zhai, Redmond, WA (US);
Yi Li, Issaquah, WA (US);
Richard Qian, Sammamish, WA (US);
Hong Gao, Seattle, WA (US);
Lei Tan, Bellevue, WA (US);
Microsoft Corporation, Redmond, WA (US);
Abstract
Systems and methods for extracting data content items from a web page are provided. A template is created by labeling data content items of interest associated with a web page and generating a template Document Object Model (DOM) tree based on the labeled web page. DOM trees are also generated for additional web pages that contain data content items for which extraction may be desired. These DOM trees are compared to the template DOM tree to determine alignment therebetween. The aligned data content items may then be extracted from the additional web pages and indexed, as desired. Labeling the data content items of interest prior to generating a template DOM tree allows for the desired data content items to be specified and more accurately extracted from related and/or similarly structured web pages.
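
The flow described in the abstract (label a page, build a template DOM tree, align other pages against it, extract the aligned items) can be illustrated with a minimal Python sketch. It is not the patented method: the abstract does not say how alignment is computed, so this sketch assumes the simplest possible stand-in, an exact match on tag name and child position along the path from the root. The `data-label` attribute, the `Node` class, and the function names are all hypothetical, and the parser assumes well-formed HTML with every element explicitly closed.

```python
from html.parser import HTMLParser


class Node:
    """A minimal DOM-like node: tag, attributes, children, and own text."""

    def __init__(self, tag, attrs=None, parent=None):
        self.tag = tag
        self.attrs = dict(attrs or [])
        self.parent = parent
        self.children = []
        self.text = ""


class TreeBuilder(HTMLParser):
    """Builds a Node tree; assumes every opened element is explicitly closed."""

    def __init__(self):
        super().__init__()
        self.root = Node("#root")
        self.current = self.root

    def handle_starttag(self, tag, attrs):
        node = Node(tag, attrs, self.current)
        self.current.children.append(node)
        self.current = node

    def handle_endtag(self, tag):
        if self.current.parent is not None:
            self.current = self.current.parent

    def handle_data(self, data):
        self.current.text += data.strip()


def parse(html):
    builder = TreeBuilder()
    builder.feed(html)
    builder.close()
    return builder.root


def labeled_paths(node, path=()):
    """Yield (label, path) for nodes carrying the hypothetical data-label attribute."""
    if "data-label" in node.attrs:
        yield node.attrs["data-label"], path
    for index, child in enumerate(node.children):
        yield from labeled_paths(child, path + ((child.tag, index),))


def follow(root, path):
    """Walk a (tag, index) path; return None when the page does not align."""
    node = root
    for tag, index in path:
        if index >= len(node.children) or node.children[index].tag != tag:
            return None
        node = node.children[index]
    return node


def extract(template_html, page_html):
    """Extract the text of page nodes that align with labeled template nodes."""
    template, page = parse(template_html), parse(page_html)
    result = {}
    for label, path in labeled_paths(template):
        node = follow(page, path)
        if node is not None:
            result[label] = node.text
    return result


# Assumed example pages: a labeled template and a similarly structured page.
TEMPLATE = ('<html><body><h1 data-label="title">Sample</h1>'
            '<p data-label="price">$0</p></body></html>')
PAGE = '<html><body><h1>Blue Widget</h1><p>$19.99</p></body></html>'

print(extract(TEMPLATE, PAGE))  # {'title': 'Blue Widget', 'price': '$19.99'}
```

Exact path matching breaks as soon as a page inserts or drops a sibling element, so a practical system would need a more tolerant tree comparison. The sketch only shows why labeling the template first lets the extractor target specific items on similarly structured pages.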