The patent badge is an abbreviated version of the USPTO patent document. The patent badge does contain a link to the full patent document.

The patent badge is an abbreviated version of the USPTO patent document. The patent badge covers the following: Patent number, Date patent was issued, Date patent was filed, Title of the patent, Applicant, Inventor, Assignee, Attorney firm, Primary examiner, Assistant examiner, CPCs, and Abstract. The patent badge does contain a link to the full patent document (in Adobe Acrobat format, aka pdf). To download or print any patent click here.

Date of Patent:
Sep. 12, 2000

Filed:

Mar. 26, 1998
Applicant:
Inventors:

Andrei Z Broder, Menlo Park, CA (US);

Steven C Glassman, Mountain View, CA (US);

Charles G Nelson, Palo Alto, CA (US);

Mark S Manasse, San Francisco, CA (US);

Geoffrey G Zweig, Oakland, CA (US);

Assignee:

Digital Equipment Corporation, Maynard, MA (US);

Attorney:
Primary Examiner:
Assistant Examiner:
Int. Cl.
CPC ...
G06F / ;
U.S. Cl.
CPC ...
707103 ; 707-3 ; 707-2 ; 707-5 ;
Abstract

A computer-implemented method determines the resemblance of data objects such as Web pages. Each data object is partitioned into a sequence of tokens. The tokens are grouped into overlapping sets of the tokens to form shingles. Each shingle is represented by a unique identification element encoded as a fingerprint. A minimum element from each of the images of the set of fingerprints associated with a document under each of a plurality of pseudo random permutations of the set of all fingerprints are selected to generate a sketch of each data object. The sketches characterize the resemblance of the data objects. The sketches can be further partitioned into a plurality of groups. Each group is fingerprinted to form a feature. Data objects that share more than a certain numbers of features are estimated to be nearly identical.


Find Patent Forward Citations

Loading…