The patent badge is an abbreviated version of the USPTO patent document. It covers the following: Patent number, Date patent was issued, Date patent was filed, Title of the patent, Applicant, Inventor, Assignee, Attorney firm, Primary examiner, Assistant examiner, CPCs, and Abstract. The patent badge also contains a link to the full patent document (in Adobe Acrobat format, aka PDF).

Date of Patent: Sep. 12, 2000
Filed: Jun. 28, 1996
Applicant:
Inventor: John W Miller, Kirkland, WA (US)
Assignee: Microsoft Corporation, Redmond, WA (US)
Attorney:
Primary Examiner:
Assistant Examiner:
Int. Cl.: CPC ... G06F /
U.S. Cl.: CPC ... 707/101; 707/6; 707/7; 707/3
Abstract

A method for constructing a data structure for a data string of characters includes producing a matrix of sorted rotations of the data string. This matrix defines an A array, which is a sorted list of the characters in the data string; a B array, which is a permutation of the data string; and a correspondence array C, whose entries link the characters in the A array to the same characters in the B array. A reduced A' array is computed to identify each unique character in the A array, and a reduced C' array is computed to contain every s-th entry of the C array. The B array is segmented into blocks of size s.

During a search, the A' and C' arrays are used to index the B array to reconstruct any desired row of the matrix of rotations. Through this representation, the matrix of rotations can be used as a conventional sorted list for pattern matching or information retrieval applications.

A data structure containing only the A', B, and C' arrays has very little memory overhead. The B array contains the same number of characters as the original data string and can be compressed blockwise to reduce its size. The A' array has a fixed size equal to the size of the alphabet used to construct the data string, while the size of the C' array varies as n/s, where n is the number of characters in the data string and s is the block size of the B array. Accordingly, the data structure enables a tradeoff between access speed and memory overhead, the product of which is constant with respect to the block size s.
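The abstract describes the representation only in prose, so here is a minimal Python sketch of one possible reading of it. It builds A, B and C from the sorted rotations, keeps only A' (each distinct character with the first row where it appears in A), B split into blocks of size s, and C' (every s-th entry of C), and then reconstructs rows by repeatedly following C. This works because C[i] is the row whose last character (in B) is the same occurrence as A[i], i.e. the row holding the rotation shifted left by one. Everything here is an assumption, not the patent's stated procedure: the names `build_arrays`, `RotationIndex`, `row` and `find` are hypothetical, the input is assumed to have distinct rotations, a missing C[i] is recovered by scanning B forward from the nearest sample C'[i // s] when that sample falls in the same character run (otherwise from the start of B), the blocks of B are left uncompressed, and no attempt is made to reproduce the cost bound claimed in the abstract.

```python
from bisect import bisect_right


def build_arrays(text):
    """Return A, B and C for the matrix of sorted rotations of `text`.

    Assumes every rotation of `text` is distinct (true for any non-periodic string).
    """
    n = len(text)
    starts = sorted(range(n), key=lambda p: text[p:] + text[:p])
    a = "".join(text[p] for p in starts)      # first column: sorted characters
    b = "".join(text[p - 1] for p in starts)  # last column: a permutation of text
    # C links the k-th occurrence of a character in A to its k-th occurrence in B.
    positions = {}
    for j, ch in enumerate(b):
        positions.setdefault(ch, []).append(j)
    seen = {}
    c = []
    for ch in a:
        k = seen.get(ch, 0)
        c.append(positions[ch][k])
        seen[ch] = k + 1
    return a, b, c


class RotationIndex:
    """Hypothetical index that keeps only A', B (in blocks of size s) and C'."""

    def __init__(self, text, s):
        a, b, c = build_arrays(text)
        self.n, self.s = len(text), s
        # A': each distinct character with the first row where it appears in A.
        self.a_chars, self.a_starts = [], []
        for i, ch in enumerate(a):
            if not self.a_chars or self.a_chars[-1] != ch:
                self.a_chars.append(ch)
                self.a_starts.append(i)
        self.blocks = [b[k:k + s] for k in range(0, self.n, s)]  # B split into blocks
        self.c_prime = c[::s]  # C': every s-th entry of C; A and C are discarded

    def _b(self, j):
        return self.blocks[j // self.s][j % self.s]

    def _a(self, i):
        # A[i] is the distinct character whose run in A starts at or before row i.
        k = bisect_right(self.a_starts, i) - 1
        return self.a_chars[k], self.a_starts[k]

    def _c(self, i):
        """Recover C[i] from C' and B (assumed lookup scheme, see lead-in)."""
        if i % self.s == 0:
            return self.c_prime[i // self.s]
        ch, lo = self._a(i)
        i0 = i - i % self.s  # nearest sampled row at or before i
        if i0 >= lo:  # sample lies in the same character run: start from C[i0]
            j, remaining = self.c_prime[i0 // self.s], i - i0
        else:  # no usable sample: count occurrences of ch from the start of B
            j, remaining = -1, i - lo + 1
        while remaining:
            j += 1
            if self._b(j) == ch:
                remaining -= 1
        return j

    def row(self, i, length=None):
        """Reconstruct the first `length` characters (default: all) of row i."""
        out = []
        for _ in range(self.n if length is None else length):
            out.append(self._a(i)[0])
            i = self._c(i)
        return "".join(out)

    def find(self, pattern):
        """Binary-search the sorted rows; return the half-open range of matching rows."""
        m, lo, hi = len(pattern), 0, self.n
        while lo < hi:  # first row whose m-character prefix is >= pattern
            mid = (lo + hi) // 2
            if self.row(mid, m) < pattern:
                lo = mid + 1
            else:
                hi = mid
        first, hi = lo, self.n
        while lo < hi:  # first row whose m-character prefix is > pattern
            mid = (lo + hi) // 2
            if self.row(mid, m) <= pattern:
                lo = mid + 1
            else:
                hi = mid
        return first, lo


idx = RotationIndex("banana", s=2)
print(idx.row(3))       # banana
print(idx.find("ana"))  # (1, 3)
```

For "banana" the sorted rotations are abanan, anaban, ananab, banana, nabana, nanaba, so A = "aaabnn", B = "nnbaaa" and C = [3, 4, 5, 2, 0, 1]. With s = 2 only C' = [3, 5, 0] is stored. `find` treats the rows as a sorted list, as the abstract suggests; because rows are cyclic rotations, a match may wrap around the end of the text. In this sketch C' holds about n/s entries, so a larger s shrinks it but tends to lengthen the scans of B needed to recover the missing C entries.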

