The patent badge is an abbreviated version of the USPTO patent document. The patent badge does contain a link to the full patent document.

The patent badge is an abbreviated version of the USPTO patent document. The patent badge covers the following: Patent number, Date patent was issued, Date patent was filed, Title of the patent, Applicant, Inventor, Assignee, Attorney firm, Primary examiner, Assistant examiner, CPCs, and Abstract. The patent badge does contain a link to the full patent document (in Adobe Acrobat format, aka pdf). To download or print any patent click here.

Date of Patent:
Jul. 14, 2015

Filed:

Apr. 11, 2011
Applicants:

Arvind Arasu, Bothel, WA (US);

Michaela Götz, Ithaca, NY (US);

Shriraghav Kaushik, Bellevue, WA (US);

Inventors:

Arvind Arasu, Bothel, WA (US);

Michaela Götz, Ithaca, NY (US);

Shriraghav Kaushik, Bellevue, WA (US);

Assignee:
Attorneys:
Primary Examiner:
Assistant Examiner:
Int. Cl.
CPC ...
G06F 17/30 (2006.01); G06N 99/00 (2010.01);
U.S. Cl.
CPC ...
G06F 17/30507 (2013.01); G06N 99/005 (2013.01);
Abstract

An active learning record matching system and method for producing a record matching package that is used to identify pairs of duplicate records. Embodiments of the system and method allow a precision threshold to be specified and then generate a learned record matching package having precision greater than this threshold and a recall close to the best possible recall. Embodiments of the system and method use a blocking technique to restrict the space of record matching packages considered and scale to large inputs. The learning method considers several record matching packages, estimates the precision and recall of the packages, and identifies the package with maximum recall having precision greater than equal to the given precision threshold. A human domain expert labels a sample of record pairs in the output of the package as matches or non-matches and this labeling is used to estimate the precision of the package.


Find Patent Forward Citations

Loading…