The patent badge is an abbreviated version of the USPTO patent document. The patent badge does contain a link to the full patent document.

The patent badge is an abbreviated version of the USPTO patent document. The patent badge covers the following: Patent number, Date patent was issued, Date patent was filed, Title of the patent, Applicant, Inventor, Assignee, Attorney firm, Primary examiner, Assistant examiner, CPCs, and Abstract. The patent badge does contain a link to the full patent document (in Adobe Acrobat format, aka pdf). To download or print any patent click here.

Date of Patent:
Oct. 27, 2009

Filed:

Jun. 12, 2007
Applicants:

Arvind Arasu, Redmond, WA (US);

Venkatesh Ganti, Redmond, WA (US);

Shriraghav Kaushik, Redmond, WA (US);

Inventors:

Arvind Arasu, Redmond, WA (US);

Venkatesh Ganti, Redmond, WA (US);

Shriraghav Kaushik, Redmond, WA (US);

Assignee:

Microsoft Corporation, Redmond, WA (US);

Attorney:
Primary Examiner:
Assistant Examiner:
Int. Cl.
CPC ...
G06F 17/30 (2006.01); G06F 7/06 (2006.01); G06F 7/08 (2006.01); G06F 7/10 (2006.01);
U.S. Cl.
CPC ...
Abstract

Input set indexing for set-similarity lookups. The architecture provides input to an indexing process that enables more efficient lookups for large data sets (e.g., disk-based) without requiring a full scan of the input. A new index structure is provided, the output of which is exact, rather than approximate. The similarity of two sets is specified using a similarity function that maps two sets to a numeric value that represents similarity of the two sets. Threshold-based lookups are addressed where two sets are considered similar if the numeric similarity score is above a threshold. The structure efficiently identifies all input sets within a distance k (e.g., a hamming distance) of the query set. Additional information in the form of frequency of elements (the number of input sets in which an element occurs) is used to improve index performance.


Find Patent Forward Citations

Loading…