The patent badge is an abbreviated version of the USPTO patent document. It covers the following fields: patent number, date the patent was issued, date the patent was filed, title of the patent, applicant, inventor, assignee, attorney firm, primary examiner, assistant examiner, CPCs, and abstract. The badge also contains a link to the full patent document (in Adobe Acrobat format, i.e. PDF).

Date of Patent:

Oct. 21, 2025

Filed:

Feb. 07, 2025

Applicant:

SAS Institute Inc., Cary, NC (US);

Inventors:

Fan Wang, Beijing (CN);

Teresa S. Jade, Cary, NC (US);

Xu Yang, Cary, NC (US);

Assignee:

SAS Institute Inc., Cary, NC (US);

Attorney:

Primary Examiner:

Int. Cl.

CPC ...

G06F 7/02 (2006.01); G06F 16/00 (2019.01); G06F 16/906 (2019.01); G06F 16/93 (2019.01); G06F 16/355 (2025.01);

U.S. Cl.

CPC ...

G06F 16/906 (2019.01); G06F 16/93 (2019.01); G06F 16/355 (2019.01);

Abstract

Techniques described herein provide for automated near-duplicate detection for new text documents given text documents that were previously processed using automated near-duplicate detection for text documents. In one example, a system can receive new documents and documents that were previously processed using a predefined processing technique for automated near-duplicate detection. The system can process the new documents and cluster the new documents into multiple predefined clusters previously identified using the predefined processing technique. For each predefined cluster including at least one new document, the system can generate document groups by determining similarity scores using the predefined processing technique as applied to the documents in the predefined clusters. The system can identify a representative document for each document group and generate an output data structure including the document groups and the representative document for each group.
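
The flow described in the abstract can be illustrated with a minimal sketch. It is not the patented method: the abstract does not name the "predefined processing technique", so this example assumes word-shingle Jaccard similarity, assigns each new document to the predefined cluster containing its most similar member, groups documents greedily against a fixed threshold, and picks as representative the member with the highest total similarity to its group. All function names, the `threshold` value, and the shape of the output records are hypothetical.

```python
def shingles(text, k=3):
    """Return the set of k-word shingles of a document (assumed feature choice)."""
    words = text.lower().split()
    if len(words) < k:
        return {" ".join(words)}
    return {" ".join(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a, b):
    """Jaccard similarity of two sets; defined as 1.0 when both are empty."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def assign_to_cluster(doc_shingles, clusters):
    """Pick the predefined cluster whose best-matching member is most similar."""
    return max(
        clusters,
        key=lambda cid: max(jaccard(doc_shingles, s) for _, s in clusters[cid]),
    )


def detect_near_duplicates(previous, new_docs, threshold=0.5):
    """Group near-duplicates in every predefined cluster that received a new document.

    previous: {cluster_id: {doc_id: text}} with non-empty clusters (assumed input shape).
    new_docs: {doc_id: text}.
    Returns a list of {"cluster", "representative", "documents"} records.
    """
    clusters = {
        cid: [(doc_id, shingles(text)) for doc_id, text in docs.items()]
        for cid, docs in previous.items()
    }

    # Step 1: place each new document into an existing (predefined) cluster.
    touched = set()
    for doc_id, text in new_docs.items():
        s = shingles(text)
        cid = assign_to_cluster(s, clusters)
        clusters[cid].append((doc_id, s))
        touched.add(cid)

    # Step 2: only clusters containing at least one new document are regrouped.
    output = []
    for cid in sorted(touched):
        groups = []  # each group is a list of (doc_id, shingles) pairs
        for doc_id, s in clusters[cid]:
            for group in groups:
                # Greedy rule (an assumption): compare with the group's first member.
                if jaccard(s, group[0][1]) >= threshold:
                    group.append((doc_id, s))
                    break
            else:
                groups.append([(doc_id, s)])

        # Step 3: choose a representative per group and emit the output records.
        for group in groups:
            rep = max(group, key=lambda m: sum(jaccard(m[1], o[1]) for o in group))
            output.append({
                "cluster": cid,
                "representative": rep[0],
                "documents": [doc_id for doc_id, _ in group],
            })
    return output
```

Calling `detect_near_duplicates(previous, new_docs)` with a dictionary of previously clustered texts and a dictionary of new texts returns one record per document group. A production system would likely replace the pairwise Jaccard comparison with something that scales better, such as MinHash signatures, but that choice is outside what the abstract states.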

