The patent badge is an abbreviated version of the USPTO patent document. It covers the following: patent number, date the patent was issued, date the patent was filed, title of the patent, applicant, inventor, assignee, attorney firm, primary examiner, assistant examiner, CPCs, and abstract. The patent badge also contains a link to the full patent document (in Adobe Acrobat format, i.e., PDF).

Date of Patent:
Feb. 07, 2023

Filed:

Apr. 14, 2020

Applicant:

International Business Machines Corporation, Armonk, NY (US);

Inventors:

Michael Robert Glass, Bayonne, NJ (US);

Nicholas Brady Garvan Monath, Northampton, MA (US);

Robert G. Farrell, Cornwall, NY (US);

Alfio Massimiliano Gliozzo, Brooklyn, NY (US);

Gaetano Rossiello, Brooklyn, NY (US);

Attorneys:
Primary Examiner:
Int. Cl.
CPC ...
G06F 16/30 (2019.01); G06F 16/35 (2019.01); G06N 3/08 (2006.01); G06F 40/40 (2020.01); G06F 40/284 (2020.01); G06F 40/30 (2020.01); G06F 40/216 (2020.01);
U.S. Cl.
CPC ...
G06F 16/35 (2019.01); G06F 40/40 (2020.01); G06N 3/08 (2013.01); G06F 40/216 (2020.01); G06F 40/284 (2020.01); G06F 40/30 (2020.01);
Abstract

A computer-implemented method for performing cross-document coreference for a corpus of input documents includes determining mentions by parsing the input documents. Each mention includes a first vector for spelling data and a second vector for context data. A hierarchical tree data structure is created by generating several leaf nodes corresponding to respective mentions. Further, for each node, a similarity score is computed based on the first and second vectors of each node. The hierarchical tree is populated iteratively until a root node is created. Each iteration includes merging two nodes that have the highest similarity scores and creating an entity node instead at a hierarchical level that is above the two nodes being merged. Further, each iteration includes computing the similarity score for the entity node. The nodes with the similarity scores above a predetermined value are entities for which coreference has been performed in input documents.
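The iterative merging described in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the abstract does not say how the similarity score is computed, how an entity node's vectors are derived, or how entities are read off the finished tree. The sketch assumes an equal-weight average of cosine similarities over the spelling and context vectors, a mention-count-weighted mean for entity node vectors, and a top-down walk that reports the highest nodes whose score exceeds the threshold (unmerged mentions are kept as singleton entities). The names `Node`, `similarity`, `merge`, `build_tree`, and `entities` are all hypothetical.

```python
from dataclasses import dataclass
from itertools import combinations
from math import sqrt
from typing import List, Tuple


@dataclass
class Node:
    spelling: List[float]            # first vector: spelling data
    context: List[float]             # second vector: context data
    mentions: List[str]              # mentions covered by this node
    score: float = 0.0               # similarity of the two nodes merged into this one
    children: Tuple["Node", ...] = ()  # empty for leaf (mention) nodes


def cosine(a: List[float], b: List[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = sqrt(sum(x * x for x in a))
    nb = sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def similarity(a: Node, b: Node) -> float:
    # Assumption: equal-weight average of spelling and context cosine similarity.
    return 0.5 * cosine(a.spelling, b.spelling) + 0.5 * cosine(a.context, b.context)


def merge(a: Node, b: Node) -> Node:
    # Assumption: entity vectors are mention-count-weighted means of the children.
    na, nb = len(a.mentions), len(b.mentions)

    def mean(u: List[float], v: List[float]) -> List[float]:
        return [(na * x + nb * y) / (na + nb) for x, y in zip(u, v)]

    return Node(mean(a.spelling, b.spelling), mean(a.context, b.context),
                a.mentions + b.mentions, similarity(a, b), (a, b))


def build_tree(leaves: List[Node]) -> Node:
    """Merge the most similar pair of nodes repeatedly until one root remains."""
    active = list(leaves)
    while len(active) > 1:
        a, b = max(combinations(active, 2), key=lambda pair: similarity(*pair))
        active = [n for n in active if n is not a and n is not b]
        active.append(merge(a, b))  # entity node one level above a and b
    return active[0]


def entities(node: Node, threshold: float) -> List[List[str]]:
    # Assumption: report the highest nodes whose score exceeds the threshold;
    # a leaf reached on the way down is kept as a singleton entity.
    if not node.children or node.score > threshold:
        return [node.mentions]
    return [e for child in node.children for e in entities(child, threshold)]


if __name__ == "__main__":
    # Toy vectors chosen by hand purely for illustration.
    leaves = [
        Node([1.0, 0.0], [1.0, 0.0], ["IBM"]),
        Node([0.9, 0.1], [1.0, 0.1], ["I.B.M."]),
        Node([0.0, 1.0], [0.0, 1.0], ["Apple"]),
    ]
    print(entities(build_tree(leaves), threshold=0.8))
```

The exhaustive pairwise search in `build_tree` is kept for readability and rescans every pair on each iteration; a practical system would cache scores or use an approximate nearest-neighbor index.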

