The patent badge is an abbreviated version of the USPTO patent document. It covers the following: patent number, date the patent was issued, date the patent was filed, title of the patent, applicant, inventor, assignee, attorney firm, primary examiner, assistant examiner, CPCs, and abstract. The patent badge also contains a link to the full patent document (in Adobe Acrobat format, i.e. PDF).

Date of Patent:
Jul. 28, 2015

Filed:
Sep. 30, 2009

Applicants:

Zaiqing Nie, Beijing, CN;

Yong Cao, Beijing, CN;

Ji-Rong Wen, Beijing, CN;

Chunyu Yang, Beijing, CN;

Inventors:

Zaiqing Nie, Beijing, CN;

Yong Cao, Beijing, CN;

Ji-Rong Wen, Beijing, CN;

Chunyu Yang, Beijing, CN;

Assignee:
Attorneys:
Primary Examiner:
Assistant Examiner:
Int. Cl.
CPC ...
G06F 17/00 (2006.01); G06F 17/27 (2006.01);
U.S. Cl.
CPC ...
G06F 17/278 (2013.01);
Abstract

Described is a technology for understanding entities of a webpage, e.g., to label the entities on the webpage. An iterative and bidirectional framework processes a webpage, including a text understanding component (e.g., extended Semi-CRF model) that provides text segmentation features to a structure understanding component (e.g., extended HCRF model). The structure understanding component uses the text segmentation features and visual layout features of the webpage to identify a structure (e.g., labeled block). The text understanding component in turn uses the labeled block to further understand the text. The process continues iteratively until a similarity criterion is met, at which time the entities may be labeled. Also described is the use of multiple mentions of a set of text in the webpage to help in labeling an entity.
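The iterative, bidirectional loop described in the abstract can be illustrated with a minimal sketch. This is not the patented method: the two toy functions below are hypothetical stand-ins for the extended Semi-CRF and extended HCRF models, the node format (text plus a vertical position) is an assumed simplification of visual layout features, and the stopping rule (agreement between successive text labelings) is only one assumed reading of "similarity criterion".

```python
from typing import Callable, List, Tuple

Node = Tuple[str, int]  # (text, vertical position in pixels); hypothetical format


def agreement(a: List[str], b: List[str]) -> float:
    """Fraction of positions on which two label sequences agree."""
    if len(a) != len(b):
        return 0.0
    if not a:
        return 1.0
    return sum(x == y for x, y in zip(a, b)) / len(a)


def toy_text_model(nodes: List[Node], blocks: List[str]) -> List[str]:
    """Stand-in for the text understanding component (NOT a Semi-CRF)."""
    labels = []
    for (text, _), block in zip(nodes, blocks):
        if "@" in text:
            labels.append("EMAIL")
        elif block == "PERSON_BLOCK" and text.istitle():
            # The block label from the structure model disambiguates the text.
            labels.append("NAME")
        else:
            labels.append("OTHER")
    return labels


def toy_structure_model(nodes: List[Node], segments: List[str]) -> List[str]:
    """Stand-in for the structure understanding component (NOT an HCRF)."""
    # Layout feature: nodes within the same 100-pixel band form one block.
    bands = {y // 100 for (_, y), seg in zip(nodes, segments) if seg == "EMAIL"}
    return ["PERSON_BLOCK" if y // 100 in bands else "OTHER_BLOCK" for _, y in nodes]


def label_entities(
    nodes: List[Node],
    understand_text: Callable[[List[Node], List[str]], List[str]],
    understand_structure: Callable[[List[Node], List[str]], List[str]],
    threshold: float = 1.0,
    max_iters: int = 10,
) -> Tuple[List[str], List[str], int]:
    """Alternate between the two components until the text labels stabilize."""
    blocks = ["UNKNOWN"] * len(nodes)
    segments = understand_text(nodes, blocks)
    for iteration in range(1, max_iters + 1):
        blocks = understand_structure(nodes, segments)   # text -> structure
        new_segments = understand_text(nodes, blocks)    # structure -> text
        if agreement(segments, new_segments) >= threshold:
            return new_segments, blocks, iteration
        segments = new_segments
    return segments, blocks, max_iters


page = [("Jane Doe", 10), ("jdoe", 20), ("jane@example.com", 30), ("Contact us", 210)]
segments, blocks, iterations = label_entities(page, toy_text_model, toy_structure_model)
print(segments, blocks, iterations)
```

In this toy run, the first pass recognizes only the email address. The structure step then marks the surrounding band as a person block, which lets the next text pass label "Jane Doe" as a name; a further pass produces no change, so the loop stops.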

