The patent badge is an abbreviated version of the USPTO patent document. The patent badge does contain a link to the full patent document.
The patent badge is an abbreviated version of the USPTO patent document. The patent badge covers the following: Patent number, Date patent was issued, Date patent was filed, Title of the patent, Applicant, Inventor, Assignee, Attorney firm, Primary examiner, Assistant examiner, CPCs, and Abstract. The patent badge does contain a link to the full patent document (in Adobe Acrobat format, aka pdf). To download or print any patent click here.
Patent No.:
Date of Patent:
Nov. 17, 2020
Filed:
Dec. 18, 2015
Emc Ip Holding Company Llc, Hopkinton, MA (US);
Guilherme Menezes, Santa Clara, CA (US);
Abdullah Reza, Santa Clara, CA (US);
EMC IP HOLDING COMPANY LLC, Hopkinton, MA (US);
Abstract
Identifying files that do not deduplicate well in a storage system with deduplication facilitates optimizing storage capacity by moving the identified files to less expensive storage without deduplication. Any set of files can be examined to remove files that are identified as files that do not deduplicate well. The process of identification includes arranging the files in a predefined order and using bitmap representations of the unique segments in the files to determine a count of different segments in neighboring next files compared to the previous files, and removing from deduplication any next files that exceed a difference threshold. The bitmap representations of the files allows the identification processes to be performed efficiently for large datasets. Any over-identification of files is minimized by repeating the identification processes on the set of files after arranging them in the reverse order.