
The patent badge is an abbreviated version of the USPTO patent document. It covers the following: patent number, date the patent was issued, date the patent was filed, title of the patent, applicant, inventor, assignee, attorney firm, primary examiner, assistant examiner, CPCs, and abstract. The patent badge also contains a link to the full patent document in Adobe Acrobat (PDF) format.

Date of Patent:
Mar. 26, 2019

Filed:
Oct. 30, 2015
Applicant:

Microsoft Technology Licensing, LLC, Redmond, WA (US);

Inventors:

Songtao Guo, Cupertino, CA (US);

Christopher Matthew Degiere, Palo Alto, CA (US);

Aarti Kumar, San Carlos, CA (US);

Alex Ching Lai, Menlo Park, CA (US);

Xian Li, San Jose, CA (US);

Assignee:
Attorney:
Primary Examiner:
Assistant Examiner:
Int. Cl.
CPC ...
G06K 9/46 (2006.01); G06K 9/00 (2006.01); G06K 9/62 (2006.01); G06F 17/30 (2006.01); G06Q 10/10 (2012.01); G06N 99/00 (2019.01); G06N 7/02 (2006.01); G06Q 10/06 (2012.01); G06Q 50/00 (2012.01); H04L 29/08 (2006.01);
U.S. Cl.
CPC ...
G06K 9/00456 (2013.01); G06F 17/30256 (2013.01); G06F 17/30259 (2013.01); G06F 17/30265 (2013.01); G06F 17/30448 (2013.01); G06F 17/30466 (2013.01); G06F 17/30598 (2013.01); G06F 17/30864 (2013.01); G06F 17/30958 (2013.01); G06K 9/00469 (2013.01); G06K 9/46 (2013.01); G06K 9/6215 (2013.01); G06K 9/6256 (2013.01); G06K 9/6263 (2013.01); G06K 9/6276 (2013.01); G06N 7/02 (2013.01); G06N 99/005 (2013.01); G06Q 10/06393 (2013.01); G06Q 10/10 (2013.01); G06Q 50/01 (2013.01); H04L 67/10 (2013.01); H04L 67/306 (2013.01); G06K 2209/25 (2013.01);
Abstract

In an example embodiment, a fuzzy join operation is performed over a plurality of records as follows. For each pair of records, a first plurality of features is evaluated for both records in the pair: term frequency-inverse document frequency (TF-IDF) is calculated for each token of each field relevant to each feature, and, based on those TF-IDF values, a similarity score is computed using a similarity function that adds a weight assigned to the TF-IDF of any token that appears in both records. A graph data structure is then created with a node for each record in the plurality of records and edges between the nodes, except that no edge is created between the nodes of any record pair whose similarity score does not transgress a first threshold.
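The abstract describes the scoring and graph-building steps only at a high level. The following is a minimal sketch under several assumptions that the abstract does not confirm. Records are dictionaries of field name to text. Tokens come from a hypothetical whitespace tokenizer. IDF is counted per field using the smoothed form `log((1 + n) / (1 + df)) + 1`. For each token shared by two records, the "weight" added to the score is the smaller of the two records' TF-IDF values. "Transgress" is read as "strictly exceed". The names `tokenize`, `tfidf_by_field`, `similarity`, and `build_graph` are illustrative and are not taken from the patent.

```python
import math
from collections import Counter
from itertools import combinations


def tokenize(text: str) -> list[str]:
    # Hypothetical tokenizer: lowercase and split on whitespace.
    return text.lower().split()


def tfidf_by_field(records: list[dict[str, str]], fields: list[str]) -> list[dict[str, dict[str, float]]]:
    """For each record, return {field: {token: TF-IDF weight}}.

    Assumption: document frequency is counted per field across all records,
    and IDF uses the smoothed form log((1 + n) / (1 + df)) + 1.
    """
    n = len(records)
    weights: list[dict[str, dict[str, float]]] = [{} for _ in records]
    for field in fields:
        token_lists = [tokenize(r.get(field, "")) for r in records]
        df = Counter()
        for tokens in token_lists:
            df.update(set(tokens))  # count each token once per record
        for i, tokens in enumerate(token_lists):
            tf = Counter(tokens)
            total = len(tokens) or 1  # avoid division by zero for empty fields
            weights[i][field] = {
                tok: (count / total) * (math.log((1 + n) / (1 + df[tok])) + 1)
                for tok, count in tf.items()
            }
    return weights


def similarity(a: dict[str, dict[str, float]], b: dict[str, dict[str, float]], fields: list[str]) -> float:
    # Add a weight for every token that appears in both records (per field).
    # Assumed weight: the smaller of the two TF-IDF values for that token.
    score = 0.0
    for field in fields:
        wa, wb = a.get(field, {}), b.get(field, {})
        for tok in wa.keys() & wb.keys():
            score += min(wa[tok], wb[tok])
    return score


def build_graph(records: list[dict[str, str]], fields: list[str], threshold: float) -> dict[int, set[int]]:
    # One node per record; an edge only for pairs whose score exceeds the threshold,
    # so pairs that do not transgress the threshold get no edge.
    weights = tfidf_by_field(records, fields)
    graph: dict[int, set[int]] = {i: set() for i in range(len(records))}
    for i, j in combinations(range(len(records)), 2):
        if similarity(weights[i], weights[j], fields) > threshold:
            graph[i].add(j)
            graph[j].add(i)
    return graph


if __name__ == "__main__":
    records = [
        {"name": "Acme Corp", "city": "Redmond"},
        {"name": "ACME Corporation", "city": "Redmond"},
        {"name": "Globex Inc", "city": "Palo Alto"},
    ]
    print(build_graph(records, ["name", "city"], threshold=0.5))
    # prints {0: {1}, 1: {0}, 2: set()}
```

In this toy input, the first two records share "acme" in `name` and "redmond" in `city`, so their score is about 1.93 and they are linked. The third record shares no tokens with either of the others, so it remains an isolated node. The sketch compares every pair of records, which costs O(n²). A production fuzzy join would likely add blocking or candidate filtering first, but the abstract does not say how this is handled.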

