The patent badge is an abbreviated version of the USPTO patent document. It lists the patent number, date the patent was issued, date the patent was filed, title of the patent, applicant, inventor, assignee, attorney firm, primary examiner, assistant examiner, CPCs, and abstract. The patent badge also links to the full patent document (in Adobe Acrobat format, i.e. PDF).
Patent No.:
Date of Patent:
Oct. 28, 2008
Filed:
Dec. 21, 2005
Inventors:
Hosagrahar Visvesvaraya Jagadish, Ann Arbor, MI (US);
Nikolaos Koudas, Springfield, NJ (US);
Shanmugavelayutham Muthukrishnan, Washington, DC (US);
Divesh Srivastava, Summit, NJ (US);
Assignee:
AT&T Corp., New York, NY (US);
Abstract
Approximate substring indexing is accomplished by decomposing each string in a database into overlapping 'positional q-grams': sequences of a predetermined length q, each carrying information about its 'position' within the string (e.g., 1st q-gram, 4th q-gram). An index (for example, a B-tree index or a hash index) is then formed over the tuples of positional q-gram data. Each query applied to the database is similarly parsed into a plurality of positional q-grams of the same length, and a candidate set of matches is found. Position-directed filtering then removes the candidates whose q-grams are in the wrong order and/or too far apart, leaving a “verified” output of matching candidates. If errors are permitted (defined in terms of an edit distance between each candidate and the query), an edit distance calculation can then be performed to produce the final set of matching strings.
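The pipeline described in the abstract can be illustrated with a minimal Python sketch. This is not the patented implementation: the function names, the in-memory dictionary standing in for a hash index, the count threshold `len(grams) - k * q`, and the "diagonal window" position filter are all assumptions chosen for illustration. The sketch builds positional q-grams, looks up candidates, filters them by relative position, and verifies survivors with a substring edit distance.

```python
from collections import defaultdict

def positional_qgrams(s, q):
    """Return (position, q-gram) pairs for every overlapping q-gram of s."""
    return [(i, s[i:i + q]) for i in range(len(s) - q + 1)]

def build_index(strings, q):
    """Hash index (a dict, as an assumption): q-gram -> [(string id, position)]."""
    index = defaultdict(list)
    for sid, s in enumerate(strings):
        for pos, gram in positional_qgrams(s, q):
            index[gram].append((sid, pos))
    return index

def candidates(query, index, n_strings, q, k):
    """Candidate ids after a simple position-directed filter (illustrative)."""
    grams = positional_qgrams(query, q)
    needed = len(grams) - k * q  # each edit can destroy at most q q-grams
    if needed <= 0:
        return set(range(n_strings))  # filter has no power; verify everything
    hits = defaultdict(list)  # string id -> [(diagonal, query position)]
    for qpos, gram in grams:
        for sid, spos in index.get(gram, ()):
            hits[sid].append((spos - qpos, qpos))
    result = set()
    for sid, pairs in hits.items():
        pairs.sort()
        lo = 0
        for hi in range(len(pairs)):
            # Keep only hits whose relative offsets differ by at most k.
            while pairs[hi][0] - pairs[lo][0] > k:
                lo += 1
            if len({qp for _, qp in pairs[lo:hi + 1]}) >= needed:
                result.add(sid)
                break
    return result

def substring_edit_distance(pattern, text):
    """Smallest edit distance between pattern and any substring of text."""
    prev = [0] * (len(text) + 1)  # a match may start anywhere in text
    for i, pc in enumerate(pattern, 1):
        cur = [i]
        for j, tc in enumerate(text, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (pc != tc)))
        prev = cur
    return min(prev)

def search(query, strings, index, q, k):
    """Filter with the index, then verify with the edit distance."""
    ids = sorted(candidates(query, index, len(strings), q, k))
    return [strings[i] for i in ids if substring_edit_distance(query, strings[i]) <= k]

strings = ["approximate matching", "substring index", "string indexing", "hash table"]
index = build_index(strings, q=3)
print(search("indexng", strings, index, q=3, k=1))  # ['string indexing']
```

In this example both "substring index" and "string indexing" survive the position filter, but only the latter is within one edit of the misspelled query, so the edit distance step removes the former. When `k * q` is at least the number of query q-grams, the filter cannot exclude anything and the sketch falls back to verifying every string.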