The patent badge is an abbreviated version of the USPTO patent document. It covers the following: patent number, date the patent was issued, date the patent was filed, title of the patent, applicant, inventor, assignee, attorney firm, primary examiner, assistant examiner, CPCs, and abstract. The badge also links to the full patent document in Adobe Acrobat (PDF) format.

Date of Patent:
Dec. 21, 2021

Filed:
Apr. 03, 2020

Applicant:

Tamr, Inc., Cambridge, MA (US);

Inventors:

George Beskales, Waltham, MA (US);

Ihab F. Ilyas, Waterloo, CA;

Assignee:

TAMR, INC., Cambridge, MA (US);

Attorney:
Primary Examiner:
Int. Cl.
CPC ...
G06F 7/02 (2006.01); G06F 16/00 (2019.01); G06F 3/06 (2006.01); G06F 16/903 (2019.01); G06F 16/174 (2019.01); G06F 16/906 (2019.01);
U.S. Cl.
CPC ...
G06F 3/0641 (2013.01); G06F 16/90344 (2019.01); G06F 16/1748 (2019.01); G06F 16/906 (2019.01);
Abstract

Fast record deduplication is accomplished by providing as input data records having multiple attributes, together with local similarity functions for individual attributes and their local similarity thresholds. Bin IDs are then generated based on the local similarity functions and the local similarity thresholds. Each Bin ID uniquely identifies a bin of records, where a bin of records is a set of records that are possibly pairwise similar. Local candidate pairs are identified from data records that share Bin IDs. The local candidate pairs are aggregated to produce a set of global candidate pairs. Finally, the set of global candidate pairs is filtered by deciding whether each pair of data records represents a duplicate.
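The abstract does not say which similarity functions are used or how Bin IDs are computed. The following is a minimal sketch of the described pipeline (bins, then local candidate pairs, then aggregated global candidates, then a duplicate filter). It is not the patented method. It rests on several assumptions. Each attribute uses token-set Jaccard similarity. Bin IDs come from the well-known Jaccard prefix-filtering technique, substituted in here, which guarantees that any two records meeting an attribute's threshold share at least one bin. Local pairs are aggregated by set union. A pair counts as a duplicate only if every attribute meets its threshold. The names `tokens`, `jaccard`, `bin_ids`, `candidate_pairs`, and `deduplicate` are hypothetical.

```python
import math
from collections import Counter, defaultdict
from itertools import combinations


def tokens(value):
    """Hypothetical tokenizer: lower-cased, whitespace-separated tokens."""
    return frozenset(value.lower().split())


def jaccard(a, b):
    """Assumed local similarity function: Jaccard similarity of two token sets."""
    if not a and not b:
        return 0.0  # assumption: two empty values never count as similar
    return len(a & b) / len(a | b)


def bin_ids(records, attr, threshold):
    """Yield (bin_id, record_index) for one attribute via Jaccard prefix filtering.

    Any two records whose Jaccard similarity on `attr` is >= threshold are
    guaranteed to share at least one bin, so a bin holds possibly similar records.
    """
    token_sets = [tokens(r[attr]) for r in records]
    freq = Counter(tok for s in token_sets for tok in s)
    for idx, s in enumerate(token_sets):
        # One global token order per attribute: rarest tokens first.
        ordered = sorted(s, key=lambda tok: (freq[tok], tok))
        # Prefix length |x| - ceil(t * |x|) + 1; the epsilon absorbs float
        # round-off and can only lengthen the prefix (the safe direction).
        prefix_len = len(ordered) - math.ceil(threshold * len(ordered) - 1e-9) + 1
        for tok in ordered[:prefix_len]:
            yield (attr, tok), idx


def candidate_pairs(records, thresholds):
    """Union of local candidate pairs over all attributes (assumed aggregation)."""
    global_pairs = set()
    for attr, threshold in thresholds.items():
        bins = defaultdict(list)
        for bin_id, idx in bin_ids(records, attr, threshold):
            bins[bin_id].append(idx)
        for members in bins.values():
            # Local candidate pairs: every pair of records sharing this Bin ID.
            global_pairs.update(combinations(sorted(members), 2))
    return global_pairs


def deduplicate(records, thresholds):
    """Filter global candidates; assumed rule: every attribute meets its threshold."""
    return [
        (i, j)
        for i, j in sorted(candidate_pairs(records, thresholds))
        if all(
            jaccard(tokens(records[i][a]), tokens(records[j][a])) >= t
            for a, t in thresholds.items()
        )
    ]


records = [
    {"name": "Acme Corp", "city": "Cambridge MA"},
    {"name": "acme corp", "city": "Cambridge MA"},
    {"name": "Beta Labs", "city": "Waltham MA"},
    {"name": "Acme Corporation", "city": "Cambridge"},
]
thresholds = {"name": 0.5, "city": 0.5}
print(deduplicate(records, thresholds))  # [(0, 1)]
```

Under the all-attributes rule assumed here, replacing the union in `candidate_pairs` with an intersection across attributes would still find every duplicate and would produce fewer candidates. The trade-off is that the aggregation step would then be tied to that specific decision rule.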

