The patent badge is an abbreviated version of the USPTO patent document. It covers the following: patent number, date the patent was issued, date the patent was filed, title of the patent, applicant, inventor, assignee, attorney firm, primary examiner, assistant examiner, CPCs, and abstract. The patent badge also links to the full patent document (in Adobe Acrobat format, i.e., PDF).
Patent No.:
Date of Patent:
May 30, 2023
Filed:
Sep. 17, 2020
Applicant:
EMC IP Holding Company LLC, Hopkinton, MA (US);
Inventors:
Tony Wong, Milpitas, CA (US);
Abhinav Duggal, Jersey City, NJ (US);
Smriti Thakkar, San Jose, CA (US);
Yu Qiu, Hopkinton, MA (US);
Pei Jie Sim, Hopkinton, MA (US);
Rahul Nihalani, Hopkinton, MA (US);
Assignee:
EMC IP HOLDING COMPANY LLC, Hopkinton, MA (US);
Abstract
A method, apparatus, and system for redistributing files in a multi-node storage system to improve global deduplication storage savings is disclosed. A plurality of file cluster candidates are generated for a plurality of files stored at a multi-node storage system comprising a plurality of data nodes. A similarity index is determined for each of the plurality of file cluster candidates based on similarity of the files comprised in the file cluster candidate. A ranked recipe list comprising a plurality of recipes is generated. Each recipe is associated with one of the plurality of file cluster candidates, comprises a destination data node for the associated file cluster candidate, and is associated with a deduplication space savings. At least some of the plurality of files are moved between the plurality of data nodes based on the recipes in the ranked recipe list to improve deduplication space savings in the multi-node storage system.
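The flow described in the abstract (cluster candidates, a similarity index, a ranked recipe list, then file moves) can be illustrated with a minimal Python sketch. This is not the patented implementation: the abstract does not say how similarity or savings are computed, so the sketch assumes each file is represented by a set of chunk fingerprints, uses mean pairwise Jaccard similarity as the similarity index, limits candidates to file pairs, and estimates savings as the number of chunks the two files share. All names (`Recipe`, `build_ranked_recipes`, `apply_recipes`, `min_similarity`) are hypothetical.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Recipe:
    cluster: tuple     # file names in the candidate cluster
    destination: str   # data node that should hold the whole cluster
    savings: int       # estimated duplicate chunks eliminated (assumed metric)


def jaccard(a, b):
    """Jaccard similarity of two fingerprint sets (an assumed metric)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def similarity_index(cluster, fingerprints):
    """Mean pairwise Jaccard similarity over the files in a cluster."""
    pairs = list(combinations(cluster, 2))
    if not pairs:
        return 0.0
    total = sum(jaccard(fingerprints[x], fingerprints[y]) for x, y in pairs)
    return total / len(pairs)


def build_ranked_recipes(fingerprints, placement, min_similarity=0.5):
    """Generate pairwise cluster candidates and rank recipes by savings."""
    recipes = []
    for cluster in combinations(sorted(fingerprints), 2):
        if similarity_index(cluster, fingerprints) < min_similarity:
            continue  # files too dissimilar to be worth co-locating
        a, b = cluster
        if placement[a] == placement[b]:
            continue  # already on the same node, nothing to gain
        # Simplification: savings = chunks the two files have in common.
        shared = len(fingerprints[a] & fingerprints[b])
        # Assumed policy: keep the larger file where it is, move the other.
        largest = max(cluster, key=lambda f: len(fingerprints[f]))
        recipes.append(Recipe(cluster, placement[largest], shared))
    return sorted(recipes, key=lambda r: r.savings, reverse=True)


def apply_recipes(recipes, placement):
    """Apply recipes in rank order; each file is settled by at most one recipe."""
    new_placement = dict(placement)
    settled = set()
    for recipe in recipes:
        if settled & set(recipe.cluster):
            continue  # a higher-ranked recipe already placed one of these files
        for name in recipe.cluster:
            new_placement[name] = recipe.destination
            settled.add(name)
    return new_placement


if __name__ == "__main__":
    fingerprints = {
        "a.img": {1, 2, 3, 4, 5},
        "b.img": {2, 3, 4, 5},
        "c.log": {9, 10},
        "d.log": {9, 10, 11},
    }
    placement = {"a.img": "node1", "b.img": "node2",
                 "c.log": "node1", "d.log": "node2"}
    ranked = build_ranked_recipes(fingerprints, placement)
    for recipe in ranked:
        print(recipe)
    print(apply_recipes(ranked, placement))
```

Applying recipes greedily in rank order, and skipping any recipe that touches an already-settled file, keeps a lower-ranked recipe from undoing a higher-ranked move. A real system would also have to account for chunks shared with other files on each node, node capacity, and the cost of moving data, none of which this sketch models.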