The patent badge is an abbreviated version of the USPTO patent document. It covers the following: patent number, date the patent was issued, date the patent was filed, title of the patent, applicant, inventor, assignee, attorney firm, primary examiner, assistant examiner, CPCs, and abstract. The patent badge also contains a link to the full patent document (in Adobe Acrobat format, i.e., PDF).

Date of Patent: Aug. 17, 1999

Filed: Jul. 12, 1996

Applicant:

Inventor: Max L Benson, Redmond, WA (US)

Assignee: Microsoft Corporation, Redmond, WA (US)

Attorney:
Primary Examiner:
Assistant Examiner:
Int. Cl.
CPC ...
G06F / ;
U.S. Cl.
CPC ...
707/100; 707/101; 341/51;
Abstract

In one aspect, the disclosed technique detects common leading byte patterns in the integers so that these patterns need only be stored once in the encoded byte stream. Those integers that share a common leading byte pattern are stored in truncated form, without their common leading bytes. These truncated integers may themselves be further examined to determine if any of them share additional common leading bytes beyond those already detected. Thus, the technique lends itself naturally to description using the language of trees. Integers with a common leading byte pattern are stored as child nodes, their parent being the node containing the common byte pattern. Child nodes consist only of those bytes remaining after the initial byte pattern has been extracted; the greater the number of children, the greater are the efficiency gains. All the children of a given tree or subtree are similarly examined for common leading byte patterns, ignoring those bytes that are already accounted for in their ancestor nodes.

In a second aspect, the disclosed technique makes use of 'clustering', a second type of locality that is not reached by the interval concept. A cluster is a sequence of singleton integers that are very close together but do not form a contiguous interval. The technique recognizes that such a cluster can be compactly stored as a bitmap, in which each active bit ('1-bit') represents a member of the cluster. The choice of bitmap size (e.g., 1 byte, 2 bytes, etc.) can be calibrated to suit the clustering characteristics of the input data set.
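The two aspects of the abstract can be illustrated with a minimal Python sketch. It is not the patent's actual encoded byte stream format, which the abstract does not specify. The function names, the fixed-width big-endian layout, the one-byte-per-level tree, the explicit `base` value for a cluster, and the bit order in the bitmap are all assumptions made for illustration, and the integers are assumed to be non-negative.

```python
def build_prefix_tree(values, width=4):
    """Group integers by shared leading bytes (assumed: fixed width, big-endian).

    Each dict level holds one byte; integers that share leading bytes
    share the same path, so those bytes appear only once in the tree.
    """
    tree = {}
    for value in values:
        node = tree
        for byte in value.to_bytes(width, "big"):
            node = node.setdefault(byte, {})
    return tree


def count_stored_bytes(tree):
    """Count the bytes kept when every shared leading byte is stored once."""
    return sum(1 + count_stored_bytes(child) for child in tree.values())


def cluster_to_bitmap(base, members, bitmap_bytes=1):
    """Store a cluster as a bitmap: bit i set means base + i is a member.

    `base` and the little-endian bit order are assumptions of this sketch.
    """
    nbits = 8 * bitmap_bytes
    bitmap = 0
    for member in members:
        offset = member - base
        if not 0 <= offset < nbits:
            raise ValueError(f"{member} does not fit in a {nbits}-bit bitmap")
        bitmap |= 1 << offset
    return bitmap.to_bytes(bitmap_bytes, "little")


def bitmap_to_cluster(base, data):
    """Recover the cluster members from a bitmap produced above."""
    bitmap = int.from_bytes(data, "little")
    return [base + i for i in range(8 * len(data)) if (bitmap >> i) & 1]


# Three 4-byte integers sharing the leading bytes 01 02 03.
values = [0x01020304, 0x01020305, 0x010203FF]
tree = build_prefix_tree(values)
stored = count_stored_bytes(tree)  # 3 shared bytes + 3 distinct trailing bytes

# Four nearby singletons packed into a 1-byte bitmap.
packed = cluster_to_bitmap(100, [100, 102, 103, 107])
unpacked = bitmap_to_cluster(100, packed)
```

In this example the three integers need 6 stored bytes instead of 12, and the four cluster members fit in a single bitmap byte plus whatever is needed to record the base. Raising `bitmap_bytes` widens the range one bitmap can cover, which corresponds to the calibration of bitmap size mentioned in the abstract.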