The patent badge is an abbreviated version of the USPTO patent document. The patent badge does contain a link to the full patent document.
The patent badge is an abbreviated version of the USPTO patent document. The patent badge covers the following: Patent number, Date patent was issued, Date patent was filed, Title of the patent, Applicant, Inventor, Assignee, Attorney firm, Primary examiner, Assistant examiner, CPCs, and Abstract. The patent badge does contain a link to the full patent document (in Adobe Acrobat format, aka pdf). To download or print any patent click here.
Patent No.:
Date of Patent:
Mar. 30, 2004
Filed:
Aug. 31, 2000
Srikanth R. Avadhanam, Redmond, WA (US);
Nigel R. Ellis, Redmond, WA (US);
Microsoft Corporation, Redmond, WA (US);
Abstract
Systems and methods create and maintain a maxdiff histogram for use in determining query costs. One aspect of the system is a data structure that provides fields that can be used to accurately represent a distribution of data regardless of the uniformity or lack thereof of the data. The fields of the data structure represent buckets in the histogram. The fields include a range_Hi_Key field indicating the upper bound for values represented by the bucket. The range_Hi_Key field is also the most frequently occurring value in the bucket. In addition, the fields include a cardEQ field representing the count of the most frequently occurring value, a cardLT field, which is the count of the values in the bucket that are less than the range_Hi_Key field, a LTDistinct field, which is a count of the number of distinct values represented by the bucket, and an LTDensity field, which is an average count for each of the attribute values in the bucket that are not the range_Hi_Key value. A further aspect is a method that creates and maintains the maxdiff histogram data structure. The method starts by creating a list of unused buckets. An input stream of attribute values is sorted and the following acts are performed for each value. If the new value is the same as the previous value, the cardEQ field is incremented. If not, the method checks to see if the histogram is full. If a bucket is available, it is allocated and the bucket fields are initialized. If a bucket is not available, the two buckets that have the least variance between them are merged into one bucket, and the freed bucket is made available for the newly read input value. Whenever a new bucket is created or altered, the variance between the bucket and its neighbors is recalculated.