The patent badge is an abbreviated version of the USPTO patent document. The patent badge does contain a link to the full patent document.
The patent badge is an abbreviated version of the USPTO patent document. The patent badge covers the following: Patent number, Date patent was issued, Date patent was filed, Title of the patent, Applicant, Inventor, Assignee, Attorney firm, Primary examiner, Assistant examiner, CPCs, and Abstract. The patent badge does contain a link to the full patent document (in Adobe Acrobat format, aka pdf). To download or print any patent click here.
Patent No.:
Date of Patent:
May. 31, 2016
Filed:
Aug. 27, 2010
Joy Thomas, Sunnyvale, CA (US);
Sauraj Goswami, Palo Alto, CA (US);
Vamsi Salaka, Fremont, CA (US);
Hewlett Packard Enterprise Development LP, Houston, TX (US);
Abstract
Documents likely to be near-duplicates are clustered based on document vectors that represent word-occurrence patterns in a relatively low-dimensional space. Edit distance between documents is defined based on comparing their document vectors. In one process, initial clusters are formed by applying a first edit-distance constraint relative to a root document of each cluster. The initial clusters can be merged subject to a second edit-distance constraint that limits the maximum edit distance between any two documents in the cluster. The second edit-distance constraint can be defined such that whether it is satisfied can be determined by comparing cluster structures rather than individual documents.