The patent badge is an abbreviated version of the USPTO patent document. It covers the following: patent number, date the patent was issued, date the patent was filed, title of the patent, applicant, inventor, assignee, attorney firm, primary examiner, assistant examiner, CPCs, and abstract. The patent badge also contains a link to the full patent document (in Adobe Acrobat format, i.e. PDF).

Date of Patent:
Jul. 13, 2021

Filed:
Mar. 01, 2019

Applicant:
Microsoft Technology Licensing, LLC, Redmond, WA (US);

Inventors:
Saikat Guha, Seattle, WA (US);
Gary Kyle Soeller, San Diego, CA (US);

Assignee:
Attorney:
Primary Examiner:
Int. Cl.
CPC ...
G06F 16/28 (2019.01); G06N 5/04 (2006.01);
U.S. Cl.
CPC ...
G06F 16/285 (2019.01); G06N 5/04 (2013.01);
Abstract

Described herein is a system and method for inferring data relationships of a plurality of datasets. Data contents (and optionally metadata) of the plurality of datasets are scanned to extract features of each of the datasets. Features can be related to a structure of data, a profile of data within the dataset, and/or metadata of the dataset. Each feature has an associated weight. The datasets can be clustered into clusters based on at least some of the weighted features (e.g., based on a sim-hash or min-hash of the dataset). A precise similarity metric is computed between datasets in each cluster based on their weighted features. Datasets with precise similarity metrics above a threshold quantity are inferred to be likely related. Information is provided regarding the inferred likely related datasets.
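The pipeline in the abstract (extract weighted features, cluster coarsely by hash, then score pairs precisely within each cluster) can be illustrated with a minimal Python sketch. This is not the patented implementation. The function names, the feature strings, the weights, the banded min-hash scheme, the 0.5 threshold, and the use of weighted Jaccard as the "precise similarity metric" are all assumptions chosen for illustration, since the abstract does not specify them.

```python
import hashlib
from collections import defaultdict
from itertools import combinations
from typing import Dict, Iterable, List, Set, Tuple

Features = Dict[str, float]  # feature name -> weight (both hypothetical)


def _hash(seed: int, feature: str) -> int:
    """Deterministic 64-bit hash of a feature under a given seed."""
    data = f"{seed}:{feature}".encode("utf-8")
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "big")


def minhash_signature(features: Iterable[str], num_hashes: int = 16) -> Tuple[int, ...]:
    """Min-hash signature: the minimum hash of the feature set per seed."""
    feats = list(features)
    return tuple(min(_hash(seed, f) for f in feats) for seed in range(num_hashes))


def candidate_clusters(datasets: Dict[str, Features],
                       num_hashes: int = 16,
                       band_size: int = 2) -> List[Set[str]]:
    """Coarse clustering: datasets sharing any signature band share a bucket."""
    buckets = defaultdict(set)
    for name, weighted in datasets.items():
        if not weighted:  # nothing to hash for a dataset with no features
            continue
        sig = minhash_signature(weighted.keys(), num_hashes)
        for start in range(0, num_hashes, band_size):
            buckets[(start, sig[start:start + band_size])].add(name)
    return [members for members in buckets.values() if len(members) > 1]


def weighted_jaccard(a: Features, b: Features) -> float:
    """Assumed precise metric: sum of min weights over sum of max weights."""
    keys = a.keys() | b.keys()
    num = sum(min(a.get(k, 0.0), b.get(k, 0.0)) for k in keys)
    den = sum(max(a.get(k, 0.0), b.get(k, 0.0)) for k in keys)
    return num / den if den else 0.0


def infer_related(datasets: Dict[str, Features],
                  threshold: float = 0.5,
                  num_hashes: int = 16,
                  band_size: int = 2) -> Dict[Tuple[str, str], float]:
    """Score pairs only within clusters; keep those above the threshold."""
    related = {}
    for cluster in candidate_clusters(datasets, num_hashes, band_size):
        for x, y in combinations(sorted(cluster), 2):
            if (x, y) not in related:
                score = weighted_jaccard(datasets[x], datasets[y])
                if score > threshold:
                    related[(x, y)] = score
    return related


# Hypothetical datasets: two near-copies and one unrelated log table.
example = {
    "a_users": {"col:user_id:int": 2.0, "col:email:str": 1.0, "rows:~1e6": 0.5},
    "b_users_copy": {"col:user_id:int": 2.0, "col:email:str": 1.0, "rows:~1e6": 1.0},
    "c_logs": {"col:timestamp:datetime": 2.0, "col:message:str": 1.0},
}
related = infer_related(example)
```

The coarse hashing step exists to avoid comparing every pair of datasets: only datasets that land in the same bucket are scored with the more expensive metric. In this sketch the min-hash is computed over feature names only and the weights enter at the precise scoring stage, which is one possible reading of the abstract rather than the only one.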

