The patent badge is an abbreviated version of the USPTO patent document. It covers the following: patent number, date the patent was issued, date the patent was filed, title of the patent, applicant, inventor, assignee, attorney firm, primary examiner, assistant examiner, CPCs, and abstract. The patent badge also contains a link to the full patent document (in Adobe Acrobat format, i.e., PDF).

Date of Patent:

Jul. 04, 2017

Filed:

Jul. 13, 2016

Applicant:

International Business Machines Corporation, Armonk, NY (US);

Inventors:

Andrey Balmin, San Jose, CA (US);

Vuk Ercegovac, Campbell, CA (US);

Peter J. Haas, San Jose, CA (US);

Liping Peng, Amherst, MA (US);

John Sismanis, San Jose, CA (US);

Attorney:
Primary Examiner:
Int. Cl.
CPC ...
G06F 17/30 (2006.01);
U.S. Cl.
CPC ...
G06F 17/30598 (2013.01); G06F 17/3053 (2013.01); G06F 17/30324 (2013.01); G06F 17/30486 (2013.01); G06F 17/30536 (2013.01); G06F 17/30867 (2013.01);
Abstract

A computer-implemented method includes partitioning a plurality of records into a plurality of splits. Each split includes at least a portion of the plurality of records. The method further includes providing at least one split of the plurality of splits to a mapper. The mapper scans the input data set, transforms each input record using a map function, and extracts a grouping key in parallel. The method further includes assigning at least a portion of the records of the at least one split to a group, where each assignment to the group is based on a stratum of the assigned record, and filtering the records of the group, where each filtering is based on a comparison of a weight of a record to a local threshold of the mapper. The method further includes shuffling the group to a reducer and providing a stratified sampling of the plurality of records based on the group.
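
The flow described in the abstract (partition, map with local filtering, shuffle, reduce) can be pictured with a small single-process Python sketch. This is an illustration under stated assumptions, not the patented method: the abstract does not say how weights or thresholds are computed, so the sketch assumes each record receives a uniform random weight and that a mapper's local threshold for a stratum is the k-th smallest weight it has seen so far. The names partition, mapper, shuffle, reducer, and stratified_sample are hypothetical and chosen only for this example.

```python
import heapq
import random
from collections import defaultdict


def partition(records, num_splits):
    """Partition the records into splits (round-robin, an assumption)."""
    return [records[i::num_splits] for i in range(num_splits)]


def mapper(split, stratum_of, k, rng):
    """Group a split's records by stratum, keeping at most k per stratum.

    Assumption: each record gets a uniform random weight, and the local
    threshold for a stratum is the largest weight currently retained.
    Records whose weight is not below that threshold are filtered out.
    """
    groups = defaultdict(list)  # stratum -> max-heap of (-weight, seq, record)
    for seq, record in enumerate(split):
        key = stratum_of(record)  # extract the grouping key
        weight = rng.random()
        heap = groups[key]
        if len(heap) < k:
            heapq.heappush(heap, (-weight, seq, record))
        elif weight < -heap[0][0]:  # compare weight to the local threshold
            heapq.heapreplace(heap, (-weight, seq, record))
    return {
        key: [(-neg_w, rec) for neg_w, _, rec in heap]
        for key, heap in groups.items()
    }


def shuffle(mapper_outputs):
    """Bring together every mapper's surviving records for each stratum."""
    merged = defaultdict(list)
    for output in mapper_outputs:
        for key, weighted in output.items():
            merged[key].extend(weighted)
    return merged


def reducer(weighted_records, k):
    """Keep the k records with the smallest weights across all mappers."""
    smallest = heapq.nsmallest(k, weighted_records, key=lambda pair: pair[0])
    return [record for _, record in smallest]


def stratified_sample(records, stratum_of, k, num_splits=4, seed=0):
    """Return up to k sampled records per stratum."""
    rng = random.Random(seed)
    splits = partition(records, num_splits)
    outputs = [mapper(split, stratum_of, k, rng) for split in splits]
    return {key: reducer(group, k) for key, group in shuffle(outputs).items()}


if __name__ == "__main__":
    data = [{"id": i, "region": "west" if i % 3 == 0 else "east"} for i in range(30)]
    sample = stratified_sample(data, lambda r: r["region"], k=5)
    for region, rows in sorted(sample.items()):
        print(region, sorted(r["id"] for r in rows))
```

Under these assumptions, filtering in the mapper is safe because a record outside a mapper's k lightest for its stratum cannot be among the k lightest overall, so the reducer only ever needs at most k records per stratum from each mapper. A real deployment would run the mappers in parallel on a MapReduce framework rather than in a loop.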

