The patent badge is an abbreviated version of the USPTO patent document. It covers the following: patent number, date the patent was issued, date the patent was filed, title of the patent, applicant, inventor, assignee, attorney firm, primary examiner, assistant examiner, CPCs, and abstract. The patent badge also contains a link to the full patent document (in Adobe Acrobat format, i.e., PDF).
Patent No.:
Date of Patent:
Jul. 14, 2015
Filed:
Apr. 11, 2011
Applicants:
Arvind Arasu, Bothell, WA (US);
Michaela Götz, Ithaca, NY (US);
Shriraghav Kaushik, Bellevue, WA (US);
Inventors:
Arvind Arasu, Bothell, WA (US);
Michaela Götz, Ithaca, NY (US);
Shriraghav Kaushik, Bellevue, WA (US);
Assignee:
Microsoft Technology Licensing, LLC, Redmond, WA (US);
Abstract
An active learning record matching system and method for producing a record matching package that is used to identify pairs of duplicate records. Embodiments of the system and method allow a precision threshold to be specified and then generate a learned record matching package having precision greater than this threshold and a recall close to the best possible recall. Embodiments of the system and method use a blocking technique to restrict the space of record matching packages considered and scale to large inputs. The learning method considers several record matching packages, estimates the precision and recall of the packages, and identifies the package with maximum recall having precision greater than or equal to the given precision threshold. A human domain expert labels a sample of record pairs in the output of the package as matches or non-matches, and this labeling is used to estimate the precision of the package.
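The selection step described in the abstract can be illustrated with a minimal sketch. This is not the patented implementation. The names `Candidate`, `estimate_precision`, and `select_package` are hypothetical. The sketch assumes that precision is estimated as the fraction of expert-labeled sample pairs marked as matches, and that a recall estimate is already available for each candidate package. Blocking and the learning of candidate packages are not shown.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Candidate:
    """A candidate record matching package with estimated quality (hypothetical type)."""
    name: str
    est_precision: float
    est_recall: float


def estimate_precision(labels: Sequence[bool]) -> float:
    """Estimate precision from expert labels on a sample of output pairs.

    Each label is True if the expert marked the pair as a match.
    Assumption: the sample is drawn uniformly from the package's output.
    """
    if not labels:
        raise ValueError("need at least one labeled pair")
    return sum(labels) / len(labels)


def select_package(
    candidates: Sequence[Candidate], threshold: float
) -> Optional[Candidate]:
    """Return the candidate with maximum estimated recall among those whose
    estimated precision is greater than or equal to the threshold.

    Returns None when no candidate meets the threshold.
    """
    feasible = [c for c in candidates if c.est_precision >= threshold]
    return max(feasible, key=lambda c: c.est_recall, default=None)


# Illustrative values only; they are not taken from the patent.
candidates = [
    Candidate("strict", est_precision=0.98, est_recall=0.40),
    Candidate("balanced", est_precision=0.92, est_recall=0.65),
    Candidate("loose", est_precision=0.75, est_recall=0.90),
]
best = select_package(candidates, threshold=0.90)
```

With a threshold of 0.90, the "loose" candidate is excluded despite having the highest recall, and "balanced" is chosen because it has the higher recall of the two remaining candidates.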