The patent badge is an abbreviated version of the USPTO patent document. The patent badge does contain a link to the full patent document.

The patent badge is an abbreviated version of the USPTO patent document. The patent badge covers the following: Patent number, Date patent was issued, Date patent was filed, Title of the patent, Applicant, Inventor, Assignee, Attorney firm, Primary examiner, Assistant examiner, CPCs, and Abstract. The patent badge does contain a link to the full patent document (in Adobe Acrobat format, aka pdf). To download or print any patent click here.

Date of Patent:
Jun. 05, 2012

Filed:

Dec. 18, 2009
Applicants:

Edith Cohen, Berkeley Heights, NJ (US);

Nicholas Duffield, Summit, NJ (US);

Haim Kaplan, Hod Hasharon, IL;

Carsten Lund, Berkeley Heights, NJ (US);

Mikkel Thorup, Florence, MA (US);

Inventors:

Edith Cohen, Berkeley Heights, NJ (US);

Nicholas Duffield, Summit, NJ (US);

Haim Kaplan, Hod Hasharon, IL;

Carsten Lund, Berkeley Heights, NJ (US);

Mikkel Thorup, Florence, MA (US);

Assignee:
Attorney:
Primary Examiner:
Int. Cl.
CPC ...
G06F 17/00 (2006.01);
U.S. Cl.
CPC ...
Abstract

A method for producing a summary A of data points in an unaggregated data stream wherein the data points are in the form of weighted keys (a, w) where a is a key and w is a weight, and the summary is a sample of k keys a with adjusted weights w. A first reservoir L includes keys having adjusted weights which are additions of weights of individual data points of included keys and a second reservoir T includes keys having adjusted weights which are each equal to a threshold value τ whose value is adjusted based upon tests of new data points arriving in the data stream. The summary combines the keys and adjusted weights of the first reservoir L with the keys and adjusted weights of the second reservoir T to form the sample representing the data stream upon which further analysis may be performed. The method proceeds by first merging new data points in the stream into the reservoir L until the reservoir contains k different keys and thereafter applying a series of tests to new arriving data points to determine what keys and weights are to be added to or removed the reservoirs L and T to provide a summary with a variance that approaches the minimum possible for aggregated data sets. The method is composable, can be applied to high speed data streams such as those found on the Internet, and can be implemented efficiently.


Find Patent Forward Citations

Loading…