The patent badge is an abbreviated version of the USPTO patent document. The patent badge does contain a link to the full patent document.

The patent badge is an abbreviated version of the USPTO patent document. The patent badge covers the following: Patent number, Date patent was issued, Date patent was filed, Title of the patent, Applicant, Inventor, Assignee, Attorney firm, Primary examiner, Assistant examiner, CPCs, and Abstract. The patent badge does contain a link to the full patent document (in Adobe Acrobat format, aka pdf). To download or print any patent click here.

Date of Patent:
Mar. 17, 2015

Filed:

Nov. 12, 2010
Applicants:

Xinying Song, Harbin, CN;

Yunbo Cao, Beijing, CN;

Chin-yew Lin, Beijing, CN;

Inventors:

Xinying Song, Harbin, CN;

Yunbo Cao, Beijing, CN;

Chin-Yew Lin, Beijing, CN;

Assignee:
Attorneys:
Primary Examiner:
Int. Cl.
CPC ...
G06F 17/30 (2006.01); G06F 7/00 (2006.01); G06F 17/22 (2006.01);
U.S. Cl.
CPC ...
G06F 17/227 (2013.01);
Abstract

Embodiments for a Mining Data Records based on Anchor Trees (MiBAT) process are disclosed. In accordance with at least one embodiment, the MiBAT process extracts data records containing user-generated content from web documents. The web document is processed into a Document Object Model (DOM) tree in which sub-trees of the DOM tree represent the data records of the web document. Domain constraints are used to locate structured portions of the DOM tree. Anchor trees are then located as being sets of sibling sub-trees which contain the domain constraints. The anchor trees are then used to determine a record boundary (i.e. the start offset and length) of the data records. Finally, the data records are extracted based on the anchor trees and the record boundaries.


Find Patent Forward Citations

Loading…