The patent badge is an abbreviated version of the USPTO patent document. The patent badge does contain a link to the full patent document.

The patent badge is an abbreviated version of the USPTO patent document. The patent badge covers the following: Patent number, Date patent was issued, Date patent was filed, Title of the patent, Applicant, Inventor, Assignee, Attorney firm, Primary examiner, Assistant examiner, CPCs, and Abstract. The patent badge does contain a link to the full patent document (in Adobe Acrobat format, aka pdf). To download or print any patent click here.

Date of Patent:
May. 28, 2019

Filed:

Dec. 18, 2015
Applicant:

Emc Corporation, Hopkinton, MA (US);

Inventors:

Guilherme Menezes, Santa Clara, CA (US);

Abdullah Reza, Santa Clara, CA (US);

Assignee:

EMC IP HOLDING COMPANY LLC, Hopkinton, MA (US);

Attorney:
Primary Examiner:
Int. Cl.
CPC ...
G06F 7/00 (2006.01); G06F 17/30 (2006.01);
U.S. Cl.
CPC ...
G06F 17/30598 (2013.01); G06F 17/30097 (2013.01); G06F 17/30156 (2013.01);
Abstract

Clustering files in deduplication systems is based on an estimate of similarity between files in a file system. The estimates of similarity are based on how much content the files share, where the estimate of how much content is shared is based on an estimate of segments shared. The estimate of segments shared is based on segment offsets found in the files' bitmap vectors of segment offsets. The found segment offsets are used to generate a cluster definition approximating an optimal data structure for clustering files that share content. The approximated optimal data structure defines clusters hierarchically arranged based on the offset numbers of the found segment offsets.


Find Patent Forward Citations

Loading…