The patent badge is an abbreviated version of the USPTO patent document. It covers the following fields: patent number, date the patent was issued, date the patent was filed, title of the patent, applicant, inventor, assignee, attorney firm, primary examiner, assistant examiner, CPCs, and abstract. The patent badge also contains a link to the full patent document (in Adobe Acrobat PDF format).

Date of Patent:
Apr. 19, 2022

Filed:
Jul. 21, 2020

Applicant:
International Business Machines Corporation, Armonk, NY (US);

Inventors:
Bar Haim, Ashqelon, IL;
Andrey Finkelshtein, Beer Sheva, IL;
Eitan Menahem, Beer Sheva, IL;
Noga Agmon, Givat Shmuel, IL;

Attorney:
Primary Examiner:
Int. Cl.
CPC ...
G06F 16/00 (2019.01); G06F 16/23 (2019.01); G06F 16/22 (2019.01); G06K 9/62 (2022.01); G06N 20/00 (2019.01);
U.S. Cl.
CPC ...
G06F 16/2365 (2019.01); G06F 16/221 (2019.01); G06F 16/2282 (2019.01); G06F 16/2358 (2019.01); G06K 9/6256 (2013.01); G06N 20/00 (2019.01);
Abstract

A method for quantifying a similarity between a target dataset and multiple source datasets and identifying one or more source datasets that are most similar to the target dataset is provided. The method includes receiving, at a computing system, source datasets relating to a source domain and a target dataset relating to a target domain of interest. Each dataset is arranged in a tabular format including columns and rows, and the source datasets and the target dataset include a same feature space. The method also includes pre-processing, via a processor of the computing system, each source-target dataset pair to remove non-intersecting columns. The method further includes calculating at least two of a dataset similarity score, a row similarity score, and a column similarity score for each source-target dataset pair, and summarizing the calculated similarity scores to identify one or more source datasets that are most similar to the target dataset.
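The steps named in the abstract (dropping non-intersecting columns, scoring each source-target pair, and summarizing the scores) can be illustrated with a minimal sketch. The abstract does not say how any score is computed, so everything here is an assumption made for illustration: the `Table` type, the function names, the column score (based on the gap between column means), the row score (fraction of target rows found in the source), and the plain average used to summarize them. It is not the patented method.

```python
from statistics import mean

# Hypothetical in-memory table: column name -> list of numeric values.
Table = dict[str, list[float]]


def keep_shared_columns(source: Table, target: Table) -> tuple[Table, Table]:
    """Pre-processing step: drop columns that do not appear in both tables."""
    shared = [c for c in source if c in target]
    return {c: source[c] for c in shared}, {c: target[c] for c in shared}


def column_similarity(source: Table, target: Table) -> float:
    """Assumed column score: 1 / (1 + mean absolute gap between column means)."""
    gaps = [abs(mean(source[c]) - mean(target[c])) for c in source]
    return 1.0 / (1.0 + mean(gaps))


def row_similarity(source: Table, target: Table) -> float:
    """Assumed row score: fraction of target rows that also occur in the source."""
    cols = list(target)
    source_rows = set(zip(*(source[c] for c in cols)))
    target_rows = list(zip(*(target[c] for c in cols)))
    hits = sum(1 for row in target_rows if row in source_rows)
    return hits / len(target_rows)


def rank_sources(sources: dict[str, Table], target: Table) -> list[tuple[str, float]]:
    """Score every source-target pair and sort sources from most to least similar."""
    scores = {}
    for name, source in sources.items():
        s, t = keep_shared_columns(source, target)
        if not s:
            # No shared columns: treat the pair as completely dissimilar.
            scores[name] = 0.0
            continue
        # Assumed summary: unweighted average of the two scores.
        scores[name] = mean([column_similarity(s, t), row_similarity(s, t)])
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Toy data, invented for this example.
target = {"age": [30, 40], "income": [50.0, 60.0]}
sources = {
    "a": {"age": [30, 40], "income": [50.0, 60.0], "zip": [1, 2]},
    "b": {"age": [70, 80], "income": [10.0, 20.0]},
}
ranking = rank_sources(sources, target)
print(ranking[0][0])  # name of the source judged most similar to the target
```

In this toy run, source `a` matches the target exactly once its extra `zip` column is dropped, so it ranks first. A real system would likely use distribution-aware scores and a weighted summary, but the abstract leaves those choices open.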

