The patent badge is an abbreviated version of the USPTO patent document. It covers the following fields: patent number, date the patent was issued, date the patent was filed, title of the patent, applicant, inventor, assignee, attorney firm, primary examiner, assistant examiner, CPCs, and abstract. The badge also contains a link to the full patent document (in Adobe Acrobat format, i.e. PDF).

Date of Patent:
Jun. 20, 2006

Filed:
Jul. 20, 2001

Applicants:

Michel Decary, Montreal (CA);

Jonathan Stern, Newton, MA (US);

Kosmas Karadimitriou, Shrewsbury, MA (US);

Jeremy W. Rothman-Shore, Cambridge, MA (US);

Inventors:

Michel Decary, Montreal (CA);

Jonathan Stern, Newton, MA (US);

Kosmas Karadimitriou, Shrewsbury, MA (US);

Jeremy W. Rothman-Shore, Cambridge, MA (US);

Assignee:

Zoom Information, Inc., Waltham, MA (US);

Attorney:
Assistant Examiner:
Int. Cl.
CPC ...
G08F 17/28 (2006.01);
U.S. Cl.
CPC ...
Abstract

Computer method and apparatus for extracting information from a Web page is disclosed. The invention apparatus is formed of an extractor coupled to receive Web pages from a source. The extractor uses natural language processing to extract desired information from the Web page. A storage subsystem receives from the extractor the extracted desired information and stores the extracted desired information in a database. The invention method for extracting data from a Web page includes the computer implemented steps of (i) using natural language processing, finding possible formal names on a given Web page, (ii) using pattern matching, searching the given Web page for formal names not found by the natural language processing, and (iii) refining a combined set of the found formal names to produce a working set of people and organization names extracted from the given Web page. The refining includes determining aliases of respective people and organization names, so as to effectively reduce duplicate names.
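The three steps named in the abstract can be illustrated with a small Python sketch. This is not the patented implementation: the abstract does not say which natural language processing technique, which patterns, or which alias rules are used, so everything below is an assumption made for illustration. In particular, step (i) is replaced by a simple cue-phrase heuristic standing in for real NLP, step (ii) uses a hypothetical honorific pattern, step (iii) treats a bare surname as an alias of a full name ending in that surname, and only person names are handled. All function and variable names are hypothetical.

```python
import re

# One or more capitalized words, e.g. "Jane Doe".
NAME = r"[A-Z][a-z]+(?:\s[A-Z][a-z]+)*"

# ASSUMPTION: stand-in for step (i). A real system would use an NLP parser;
# here a name is "found" when it follows a cue phrase.
CUE_RE = re.compile(r"\b(?:[Ss]aid|[Aa]ccording to)\s(" + NAME + r")")

# ASSUMPTION: step (ii) pattern. A name that follows an honorific.
HONORIFIC_RE = re.compile(r"\b(?:Mr|Ms|Dr)\.\s(" + NAME + r")")


def find_names_nlp_stub(text):
    """Step (i), stubbed: candidate names found via cue phrases."""
    return set(CUE_RE.findall(text))


def find_names_by_pattern(text, already_found):
    """Step (ii): names matched by pattern that step (i) did not find."""
    return set(HONORIFIC_RE.findall(text)) - already_found


def refine(names):
    """Step (iii): fold aliases into full names to reduce duplicates.

    ASSUMPTION: a single-word name is an alias of a longer name whose
    last word it equals. If several full names share that last word,
    the alias attaches to the first one in sorted order.
    """
    canonical = {}
    # Visit longer names first so short forms can attach to them.
    for name in sorted(names, key=lambda n: (-len(n.split()), n)):
        for full in canonical:
            if name == full.split()[-1]:
                canonical[full].add(name)
                break
        else:
            canonical[name] = set()
    return canonical


def extract_names(text):
    """Run the three steps and return {full name: set of aliases}."""
    found = find_names_nlp_stub(text)
    found |= find_names_by_pattern(text, found)
    return refine(found)


page = ("According to Jane Doe, the launch went well. "
        "Dr. Doe joined Example Corp in 2001, said John Smith.")
print(extract_names(page))
# {'Jane Doe': {'Doe'}, 'John Smith': set()}
```

In this sketch "Doe" is picked up only by the pattern step and is then merged into "Jane Doe" during refinement, which mirrors the duplicate reduction the abstract describes. The storage step (writing the working set to a database) is omitted.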

