The patent badge is an abbreviated version of the USPTO patent document. It covers the following: patent number, date the patent was issued, date the patent was filed, title of the patent, applicant, inventor, assignee, attorney firm, primary examiner, assistant examiner, CPCs, and abstract. The patent badge also contains a link to the full patent document (in Adobe Acrobat format, i.e. PDF).

Date of Patent:
Sep. 08, 2020

Filed:
Apr. 24, 2018

Applicant:

EMC IP Holding Company LLC, Hopkinton, MA (US);

Inventors:

Charles Christopher Bailey, Cary, NC (US);

Donna Barry Lewis, Holly Springs, NC (US);

Jeffrey Ford, Cary, NC (US);

Frederick Douglis, Basking Ridge, NJ (US);

Assignee:

EMC Holding Company, LLC, Hopkinton, MA (US);

Attorney:
Primary Examiner:
Int. Cl.
CPC ...
G06F 17/00 (2019.01); G06F 16/174 (2019.01); G06F 3/06 (2006.01); G06F 11/14 (2006.01); G06F 16/215 (2019.01); G06F 16/25 (2019.01); G06F 16/22 (2019.01); G06F 40/14 (2020.01); G06F 40/205 (2020.01);
U.S. Cl.
CPC ...
G06F 16/1748 (2019.01); G06F 3/0608 (2013.01); G06F 11/1453 (2013.01); G06F 16/215 (2019.01); G06F 16/2282 (2019.01); G06F 16/258 (2019.01); G06F 40/14 (2020.01); G06F 40/205 (2020.01);
Abstract

Cassandra SSTable data is transformed to provide data rows that are a consistent size such that data in each row has a length that is contained within a selected fixed-size kilobyte segment for deduplication. Tables of a Cassandra cluster node are translated in parallel to JSON format using Cassandra SSTableDump and the table rows are parsed to provide data rows corresponding to the data in each table row. Each row of data is padded with a predictable pattern of bits such that the data row has a length corresponding to the selected fixed segment size and has boundary locations that correspond to multiples of the selected segment size. Since each row of data starts on a segment boundary, duplicate rows of data will be identified wherever they move within a table.
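The padding idea in the abstract can be illustrated with a minimal Python sketch. This is not the patented implementation: the 4 KB segment size, the zero-byte padding pattern, the SHA-256 fingerprints, and the function names `pad_row` and `segment_fingerprints` are all assumptions made for illustration, and the rows are plain dictionaries standing in for parsed JSON output rather than real SSTableDump results.

```python
import hashlib
import json

SEGMENT_SIZE = 4096   # assumed segment size; the abstract does not give a value
PAD_BYTE = b"\x00"    # assumed "predictable pattern" used for padding


def pad_row(row: dict, segment_size: int = SEGMENT_SIZE) -> bytes:
    """Serialize one row and pad it to the next multiple of segment_size."""
    data = json.dumps(row, sort_keys=True).encode("utf-8")
    remainder = len(data) % segment_size
    if remainder:
        data += PAD_BYTE * (segment_size - remainder)
    return data


def segment_fingerprints(rows, segment_size: int = SEGMENT_SIZE) -> list:
    """Concatenate padded rows and fingerprint each fixed-size segment."""
    stream = b"".join(pad_row(r, segment_size) for r in rows)
    return [
        hashlib.sha256(stream[i:i + segment_size]).hexdigest()
        for i in range(0, len(stream), segment_size)
    ]


# Hypothetical rows; the same rows appear in a different order in each table.
row_a = {"id": 1, "name": "alpha"}
row_b = {"id": 2, "name": "beta"}
row_c = {"id": 3, "name": "gamma"}

table_v1 = [row_a, row_b, row_c]
table_v2 = [row_c, row_a, row_b]

fp_v1 = segment_fingerprints(table_v1)
fp_v2 = segment_fingerprints(table_v2)
```

Because every padded row starts on a segment boundary, the two tables yield the same set of segment fingerprints even though the rows have moved, which is what lets a fixed-size-segment deduplicator recognize them as duplicates.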

