The patent badge is an abbreviated version of the USPTO patent document. It covers the following: patent number, date the patent was issued, date the patent was filed, title of the patent, applicant, inventor, assignee, attorney firm, primary examiner, assistant examiner, CPCs, and abstract. The patent badge also contains a link to the full patent document (in Adobe Acrobat format, i.e., PDF).

Date of Patent:
Oct. 27, 2015

Filed:
Jan. 23, 2012

Applicants:

Xinying Song, Harbin, CN;

Zhiyuan Chen, Fuzhou, CN;

Yunbo Cao, Beijing, CN;

Chin-Yew Lin, Beijing, CN;

Inventors:

Xinying Song, Harbin, CN;

Zhiyuan Chen, Fuzhou, CN;

Yunbo Cao, Beijing, CN;

Chin-Yew Lin, Beijing, CN;

Assignee:
Attorneys:
Primary Examiner:
Int. Cl.
CPC ...
G06F 17/30 (2006.01); G06F 17/22 (2006.01);
U.S. Cl.
CPC ...
G06F 17/30864 (2013.01); G06F 17/227 (2013.01); G06F 17/30867 (2013.01);
Abstract

Described herein are techniques for extracting data records containing user-generated content from documents. The documents may be processed into document trees in which sub-trees represent the data records of the document. Domain constraints may be used to locate structured portions of the document tree. For example, anchor trees may be located as being sets of sibling sub-trees with similar tag paths that contain the domain constraints. The anchor trees may then be used to determine a record boundary (e.g., the start offset and length) of the data records. Finally, the data records may be extracted based on the anchor trees and the record boundaries.
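
The flow described in the abstract can be illustrated with a minimal sketch. It is not the patented method: it assumes well-formed HTML without void elements, treats "similar tag paths" as "siblings with the same tag", assumes records are evenly spaced, and uses a date pattern as a hypothetical domain constraint. All names (`Node`, `TreeBuilder`, `find_anchor_trees`, `record_boundary`, `extract_records`) are invented for illustration.

```python
import re
from html.parser import HTMLParser

# Hypothetical domain constraint: every record contains a post date.
DATE = re.compile(r"\d{4}-\d{2}-\d{2}")

class Node:
    def __init__(self, tag, parent=None):
        self.tag, self.parent, self.children, self.text = tag, parent, [], ""

    def all_text(self):
        # Own text first, then descendants (exact only when text sits in leaves).
        return self.text + "".join(c.all_text() for c in self.children)

class TreeBuilder(HTMLParser):
    """Builds a simple document tree; assumes every tag is closed."""
    def __init__(self):
        super().__init__()
        self.root = Node("root")
        self.cur = self.root

    def handle_starttag(self, tag, attrs):
        node = Node(tag, self.cur)
        self.cur.children.append(node)
        self.cur = node

    def handle_endtag(self, tag):
        if self.cur.parent is not None:
            self.cur = self.cur.parent

    def handle_data(self, data):
        self.cur.text += data

def find_anchor_trees(node):
    """Return (parent, indices) of same-tag siblings that each satisfy the constraint."""
    by_tag = {}
    for i, child in enumerate(node.children):
        if DATE.search(child.all_text()):
            by_tag.setdefault(child.tag, []).append(i)
    for indices in by_tag.values():
        if len(indices) >= 2:
            return node, indices
    for child in node.children:
        found = find_anchor_trees(child)
        if found:
            return found
    return None

def record_boundary(parent, anchors):
    """Return (start offset relative to each anchor, length in sibling sub-trees)."""
    length = anchors[1] - anchors[0]  # assumption: records are evenly spaced
    tags = [c.tag for c in parent.children]
    for offset in range(-(length - 1), 1):
        spans = [tags[a + offset : a + offset + length] for a in anchors]
        fits = anchors[0] + offset >= 0
        if fits and all(s == spans[0] and len(s) == length for s in spans):
            return offset, length
    return 0, 1  # fall back to the anchor tree alone

def extract_records(html):
    builder = TreeBuilder()
    builder.feed(html)
    found = find_anchor_trees(builder.root)
    if not found:
        return []
    parent, anchors = found
    offset, length = record_boundary(parent, anchors)
    records = []
    for a in anchors:
        nodes = parent.children[a + offset : a + offset + length]
        records.append(" ".join(n.all_text().strip() for n in nodes))
    return records

PAGE = (
    "<div>"
    "<h3>alice</h3><p>Posted 2012-01-23: great post</p>"
    "<h3>bob</h3><p>Posted 2012-01-24: thanks</p>"
    "</div>"
)
print(extract_records(PAGE))
```

For the sample page, the two `<p>` elements act as anchor trees, the boundary comes out as an offset of -1 with a length of 2, and each extracted record combines the author heading with the dated post.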

