The patent badge is an abbreviated version of the USPTO patent document. It covers the following: patent number, date the patent was issued, date the patent was filed, title of the patent, applicant, inventor, assignee, attorney firm, primary examiner, assistant examiner, CPCs, and abstract. The patent badge also contains a link to the full patent document (in Adobe Acrobat format, i.e. PDF).
Patent No.:
Date of Patent:
Nov. 17, 2020
Filed:
Sep. 26, 2013
Applicant:
EMC Corporation, Hopkinton, MA (US);
Inventors:
Philip Shilane, Yardley, PA (US);
Grant Wallace, Pennington, NJ (US);
Frederick Douglis, Basking Ridge, NJ (US);
Guanlin Lu, San Jose, CA (US);
Assignee:
EMC IP HOLDING COMPANY LLC, Hopkinton, MA (US);
Abstract
Techniques for improving data compression of a storage system using coarse and fine grained similarity are described herein. According to one embodiment, region sketches for a plurality of regions of the set of data are generated, each region storing a plurality of data chunks. A region sketch index having a plurality of entries is maintained, each corresponding to one of the region sketches of the regions. The entries of the region sketch index are sorted based on the sketches of the regions, such that regions with an identical region sketch are positioned adjacent to each other within the region sketch index, representing similar regions. The data chunks of the similar regions that are identified based on the sorted entries of the region sketch index are reorganized to improve data compression of the data chunks of the similar regions.
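The flow described in the abstract (sketch each region, sort the sketch index, treat adjacent identical sketches as similar regions, then reorganize their chunks) can be illustrated with a minimal Python sketch. Everything named here is hypothetical and not taken from the patent: the function names, the in-memory dictionary of regions, and in particular the sketch function, which is assumed for illustration to be the smallest SHA-1 digest among a region's chunks. The patent's actual sketch computation and storage layout may differ.

```python
import hashlib
from itertools import groupby
from operator import itemgetter


def region_sketch(chunks):
    """Toy region sketch (an assumption): smallest SHA-1 digest of any chunk."""
    return min(hashlib.sha1(chunk).digest() for chunk in chunks)


def build_sketch_index(regions):
    """Build (sketch, region_id) entries and sort them by sketch.

    After sorting, regions with an identical sketch sit next to each other.
    """
    entries = [(region_sketch(chunks), rid) for rid, chunks in regions.items()]
    entries.sort(key=itemgetter(0))  # stable sort: ties keep insertion order
    return entries


def similar_groups(index):
    """Collect runs of adjacent index entries that share the same sketch."""
    return [[rid for _, rid in run] for _, run in groupby(index, key=itemgetter(0))]


def reorganize(regions):
    """Return the new region order and the chunk stream in that order.

    Chunks of similar regions end up contiguous, which is the layout a
    downstream compressor is meant to benefit from.
    """
    groups = similar_groups(build_sketch_index(regions))
    order = [rid for group in groups for rid in group]
    stream = [chunk for rid in order for chunk in regions[rid]]
    return order, stream


if __name__ == "__main__":
    # Hypothetical data: r1 and r3 hold the same chunks in a different order.
    demo = {
        "r1": [b"alpha", b"beta", b"gamma"],
        "r2": [b"one", b"two", b"three"],
        "r3": [b"gamma", b"alpha", b"beta"],
    }
    order, _ = reorganize(demo)
    print(order)
```

A single minimum-hash value is a coarse similarity signal: two regions match whenever they share the chunk with the smallest digest. A real system would likely combine several features per sketch, but the sort-then-group step over the index stays the same.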