The patent badge is an abbreviated version of the USPTO patent document. It covers the following: patent number, date the patent was issued, date the patent was filed, title of the patent, applicant, inventor, assignee, attorney firm, primary examiner, assistant examiner, CPCs, and abstract. The patent badge also contains a link to the full patent document (in Adobe Acrobat format, i.e. PDF).

Date of Patent:

Aug. 16, 2022

Filed:

Sep. 22, 2021

Applicant:

Tamr, Inc., Cambridge, MA (US);

Inventor:

George Anwar Dany Beskales, Waltham, MA (US);

Assignee:

TAMR, INC., Cambridge, MA (US);

Attorney:
Primary Examiner:
Int. Cl.
CPC ...
G06N 20/00 (2019.01);
U.S. Cl.
CPC ...
G06N 20/00 (2019.01);
Abstract

A collection of clusters is selected to be used in training in an active learning workflow when using clusters to train supervised entity resolution in data sets. A collection of records is provided wherein each record in the collection has a cluster membership. A collection of record pairs is also provided, each record pair containing two distinct records from the collection of records, and each record pair having a similarity score. A collection of clusters with uncertainty is generated from the collection of records and the collection of record pairs. A subset of the collection of clusters with uncertainty is then selected using weighted sampling, wherein a function of the cluster uncertainty is used as the weight in the weighted sampling. The subset of the collection of clusters with uncertainty is the collection of clusters for training in an active learning workflow when using clusters to train supervised entity resolution in data sets.
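
The selection step described in the abstract can be sketched roughly as follows. This is only an illustration under stated assumptions, not the patented method: the abstract does not say how cluster uncertainty is computed or which weighted sampling scheme is used. The sketch assumes a hypothetical uncertainty measure (the mean of 1 - |2s - 1| over the scored pairs that touch a cluster, so pairs with similarity near 0.5 count as most uncertain) and swaps in Efraimidis-Spirakis keys for weighted sampling without replacement. The names cluster_uncertainty, sample_clusters, and weight_fn are invented for this example.

```python
import random
from collections import defaultdict


def cluster_uncertainty(cluster_of, scored_pairs):
    """Assumed measure: mean pair ambiguity over the pairs touching each cluster.

    cluster_of   -- dict mapping record id -> cluster id
    scored_pairs -- iterable of (record_a, record_b, similarity in [0, 1])
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for rec_a, rec_b, score in scored_pairs:
        # Ambiguity is 1 at score 0.5 and 0 at score 0 or 1 (an assumption).
        ambiguity = 1.0 - abs(2.0 * score - 1.0)
        # A set avoids counting a within-cluster pair twice.
        for cluster in {cluster_of[rec_a], cluster_of[rec_b]}:
            totals[cluster] += ambiguity
            counts[cluster] += 1
    # Clusters with no scored pairs get uncertainty 0.
    return {
        c: (totals[c] / counts[c] if counts[c] else 0.0)
        for c in set(cluster_of.values())
    }


def sample_clusters(uncertainty, k, weight_fn=lambda u: u, seed=None):
    """Pick up to k clusters without replacement, weighted by weight_fn(uncertainty).

    Uses Efraimidis-Spirakis keys: each cluster draws u ** (1 / w) with u
    uniform in [0, 1), and the k largest keys are kept.
    """
    rng = random.Random(seed)
    keyed = []
    for cluster, u in sorted(uncertainty.items()):
        weight = weight_fn(u)
        if weight <= 0:
            continue  # zero-weight clusters are never selected
        keyed.append((rng.random() ** (1.0 / weight), cluster))
    keyed.sort(reverse=True)
    return [cluster for _, cluster in keyed[:k]]


if __name__ == "__main__":
    membership = {"r1": "A", "r2": "A", "r3": "B", "r4": "B", "r5": "C"}
    pairs = [("r1", "r2", 0.95), ("r3", "r4", 0.5), ("r2", "r3", 0.9)]
    scores = cluster_uncertainty(membership, pairs)
    print(sample_clusters(scores, k=2, seed=0))
```

Passing a different weight_fn (for example, lambda u: u ** 2) would bias the sample more strongly toward uncertain clusters, which is one way to read "a function of the cluster uncertainty" in the abstract.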

