The patent badge is an abbreviated version of the USPTO patent document. It covers the following: patent number, date the patent was issued, date the patent was filed, title of the patent, applicant, inventor, assignee, attorney firm, primary examiner, assistant examiner, CPCs, and abstract. The patent badge also contains a link to the full patent document (in Adobe Acrobat format, i.e. PDF).

Date of Patent:
Jun. 14, 2022

Filed:

Jul. 24, 2020
Applicant:

Hitachi Vantara LLC, Santa Clara, CA (US);

Inventors:

Rohit Mahajan, Iselin, NJ (US);

Winnie Cheng, West New York, NJ (US);

Assignee:

HITACHI VANTARA LLC, Santa Clara, CA (US);

Attorney:
Primary Examiner:
Int. Cl.
CPC ...
G06F 16/21 (2019.01); G06F 16/215 (2019.01); G06F 16/901 (2019.01); G06N 7/02 (2006.01); G06F 16/951 (2019.01); G06N 20/00 (2019.01); G06F 16/28 (2019.01);
U.S. Cl.
CPC ...
G06F 16/215 (2019.01); G06F 16/285 (2019.01); G06F 16/9024 (2019.01); G06F 16/951 (2019.01); G06N 7/023 (2013.01); G06N 20/00 (2019.01);
Abstract

A system and method for data entries deduplication are provided. The method includes indexing an input data set, wherein the input data set is in a tabular format and the indexing includes providing a unique Row identifier (RowID), wherein rows are the data entries; computing attribute similarity for each column across each pair of rows; computing, for each pair of rows, row-to-row similarity as a weighted sum of attribute similarities; clustering pairs of rows based on their row-to-row similarities; and providing an output data set including at least the clustered pairs of rows.
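
The steps named in the abstract (indexing, per-column attribute similarity, weighted row-to-row similarity, clustering) can be illustrated with a minimal Python sketch. This is not the patented implementation: the similarity measure (`difflib.SequenceMatcher`), the weights, the 0.85 threshold, the union-find clustering, and all function names are assumptions chosen only for illustration, since the abstract does not specify any of them.

```python
from difflib import SequenceMatcher
from itertools import combinations


def attribute_similarity(a: str, b: str) -> float:
    """Similarity of two cell values in [0, 1].

    Assumption: a case-insensitive difflib ratio stands in for whatever
    attribute similarity the patent actually uses.
    """
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def deduplicate(rows, weights, threshold=0.85):
    """Return (pair_scores, clusters) for a list of equal-length row tuples.

    `weights` holds one weight per column and is assumed to sum to 1.
    `threshold` is a hypothetical cut-off for treating two rows as duplicates.
    """
    # Indexing: give every row a unique RowID (here, its position).
    indexed = dict(enumerate(rows))

    # Row-to-row similarity: weighted sum of per-column attribute similarities.
    pair_scores = {}
    for (i, r1), (j, r2) in combinations(indexed.items(), 2):
        pair_scores[(i, j)] = sum(
            w * attribute_similarity(x, y) for w, x, y in zip(weights, r1, r2)
        )

    # Clustering (assumed technique): union-find over pairs above the threshold.
    parent = {row_id: row_id for row_id in indexed}

    def find(row_id):
        while parent[row_id] != row_id:
            parent[row_id] = parent[parent[row_id]]  # path compression
            row_id = parent[row_id]
        return row_id

    for (i, j), score in pair_scores.items():
        if score >= threshold:
            parent[find(j)] = find(i)

    clusters = {}
    for row_id in indexed:
        clusters.setdefault(find(row_id), []).append(row_id)
    return pair_scores, sorted(clusters.values())


# Made-up example data: two near-duplicate rows and one distinct row.
rows = [
    ("John Smith", "New York"),
    ("Jon Smith", "New York"),
    ("Alice Wong", "Boston"),
]
pair_scores, clusters = deduplicate(rows, weights=(0.7, 0.3))
print(clusters)
```

In this sketch each cluster is a list of RowIDs; a real output data set would presumably also carry the row contents and the pairwise scores.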

