The patent badge is an abbreviated version of the USPTO patent document. It covers the following: patent number, date the patent was issued, date the patent was filed, title of the patent, applicant, inventor, assignee, attorney firm, primary examiner, assistant examiner, CPCs, and abstract. The patent badge also contains a link to the full patent document (in PDF format).

Date of Patent:
Jul. 25, 2023

Filed:
Jul. 02, 2019

Applicant:
Microsoft Technology Licensing, LLC, Redmond, WA (US);

Inventors:
Nathan Roy Evans, Bremerton, WA (US);
Christopher Miles White, Seattle, WA (US);
Jonathan Karl Larson, Bremerton, WA (US);
Darren Keith Edge, Cambridge, GB;

Assignee:
Attorney:
Primary Examiner:
Int. Cl.
CPC ...
G06F 16/906 (2019.01); G06F 16/901 (2019.01); G06F 40/216 (2020.01); G06V 30/414 (2022.01); G06V 30/416 (2022.01);
U.S. Cl.
CPC ...
G06V 30/414 (2022.01); G06F 16/906 (2019.01); G06F 16/9014 (2019.01); G06F 16/9024 (2019.01); G06F 40/216 (2020.01); G06V 30/416 (2022.01);
Abstract

Systems and methods for managing content provenance are provided. A network system accesses a plurality of documents. The plurality of documents is then hashed to identify one or more content features within each of the documents. In one embodiment, the hash is a MinHash. The network system compares the content features of each of the plurality of documents to determine a similarity score between each of the plurality of documents. In one embodiment, the similarity score is a Jaccard score. The network system then clusters the plurality of documents into one or more clusters based on the similarity score of each of the plurality of documents. In one embodiment, the clustering is performed using DBSCAN. DBSCAN can be iteratively performed with decreasing epsilon values to derive clusters of related but relatively dissimilar documents. The clustering information associated with the clusters is stored for use during runtime.
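
The pipeline named in the abstract (MinHash, Jaccard score, DBSCAN with decreasing epsilon) can be illustrated with a minimal Python sketch. This is not the patented implementation. The word-shingle features, the 64-function signature length, the epsilon values, `min_samples=2`, and all function names are assumptions chosen for illustration, and recording one labeling per epsilon value is only one possible reading of "iteratively performed with decreasing epsilon values".

```python
import hashlib


def shingles(text, k=3):
    """Return the set of k-word shingles of a text (an assumed feature choice)."""
    words = text.lower().split()
    return {" ".join(words[i:i + k]) for i in range(max(1, len(words) - k + 1))}


def minhash(features, num_perm=64):
    """MinHash signature: for each salted hash function, keep the minimum value."""
    sig = []
    for seed in range(num_perm):
        salt = seed.to_bytes(8, "little")
        sig.append(min(
            int.from_bytes(
                hashlib.blake2b(f.encode(), digest_size=8, salt=salt).digest(), "big")
            for f in features
        ))
    return sig


def estimated_jaccard(sig_a, sig_b):
    """The fraction of matching signature positions estimates the Jaccard score."""
    return sum(a == b for a, b in zip(sig_a, sig_b)) / len(sig_a)


def dbscan(dist, eps, min_samples=2):
    """Plain DBSCAN over a precomputed distance matrix; label -1 means noise."""
    n = len(dist)
    labels = [None] * n
    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        neighbors = [j for j in range(n) if dist[i][j] <= eps]
        if len(neighbors) < min_samples:
            labels[i] = -1  # provisionally noise; may become a border point later
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in neighbors if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # noise point reached from a core point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbors = [m for m in range(n) if dist[j][m] <= eps]
            if len(j_neighbors) >= min_samples:
                queue.extend(j_neighbors)  # j is a core point, keep expanding
    return labels


def cluster_levels(docs, eps_values=(0.9, 0.6, 0.3)):
    """Run DBSCAN once per epsilon, from largest to smallest (assumed reading)."""
    sigs = [minhash(shingles(d)) for d in docs]
    n = len(docs)
    # Distance is 1 minus the estimated Jaccard score.
    dist = [[1.0 - estimated_jaccard(sigs[i], sigs[j]) for j in range(n)]
            for i in range(n)]
    return {eps: dbscan(dist, eps) for eps in sorted(eps_values, reverse=True)}
```

Calling `cluster_levels(list_of_texts)` returns a dictionary that maps each epsilon value to a list of cluster labels, one per document. Larger epsilon values tolerate lower similarity within a cluster, so comparing the labelings across levels shows which documents are related only loosely and which are near-duplicates.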

