The patent badge is an abbreviated version of the USPTO patent document. The patent badge does contain a link to the full patent document.
The patent badge is an abbreviated version of the USPTO patent document. The patent badge covers the following: Patent number, Date patent was issued, Date patent was filed, Title of the patent, Applicant, Inventor, Assignee, Attorney firm, Primary examiner, Assistant examiner, CPCs, and Abstract. The patent badge does contain a link to the full patent document (in Adobe Acrobat format, aka pdf). To download or print any patent click here.
Patent No.:
Date of Patent:
Jan. 26, 2021
Filed:
Feb. 24, 2016
Salesforce.com, Inc., San Francisco, CA (US);
Dai Duong Doan, Alameda, CA (US);
Arun Kumar Jagota, Sunnyvale, CA (US);
Chenghung Ker, Burlingame, CA (US);
Parth Vaishnav, Cupertino, CA (US);
Danil Dvinov, San Francisco, CA (US);
Dmytro Kudriavtsev, Belmont, CA (US);
salesforce.com, inc., San Francisco, CA (US);
Abstract
Some embodiments of the present invention include a method for identifying duplicate records from a group of records in a database system. The method includes generating a cluster of records from a group of records based on one or more keys; splitting the cluster of records into multiple subsets of records with each subset of records having fewer number of records than the cluster of records, wherein the splitting the cluster of records into multiple subsets of records is based on a number of records in the cluster of records exceeding a threshold; causing duplicate sets of records in each of the subsets of records to be identified, wherein a duplicate set of records includes one or more records, and wherein when a duplicate set of records includes two or more records, the two or more records are duplicates of one another; merging all of the duplicate sets of records identified from the multiple subsets of records forming a first group of duplicate sets of records; and forming a representative set of records based on selecting a representative record from each of the duplicate sets in the first group of duplicate sets of records.