The patent badge is an abbreviated version of the USPTO patent document. It covers the following: patent number, date the patent was issued, date the patent was filed, title of the patent, applicant, inventor, assignee, attorney firm, primary examiner, assistant examiner, CPCs, and abstract. The patent badge also contains a link to the full patent document (in Adobe Acrobat format, i.e. PDF).

Patent No.:

US 12287767 B1

Date of Patent:

Apr. 29, 2025

Filed:

Jan. 30, 2024

De-duplicating transaction records using targeted fuzzy matching

Applicant:

Coupa Software Incorporated, San Mateo, CA (US);

Inventors:

Jyotirmaya Mahanta, Pune, IN;

Ankit Narang, Pune, IN;

Shoan Jain, Berkeley, CA (US);

Prasanna Kumar, Hyderabad, IN;

Assignee:

Coupa Software Incorporated, San Mateo, CA (US);

Attorney:

Baker Botts L.L.P.

Primary Examiner:

Shahid A Alam

Int. Cl.

CPC ...

G06F 16/215 (2018.12); G06F 16/901 (2018.12); G06V 30/19 (2021.12); G06V 30/412 (2021.12);

U.S. Cl.

CPC ...

G06F 16/215 (2018.12); G06F 16/9024 (2018.12); G06V 30/19093 (2021.12); G06V 30/412 (2021.12);

Abstract

A computer-implemented method is disclosed. The method includes obtaining, by a de-duplication server, a candidate pair of a plurality of digitally stored documents from a document database. Text elements are identified from each digitally stored document in the candidate pair in response, and the text elements are stored as document extraction attributes. The method then automatically computes and stores relative positional differences of the text elements between each digitally stored document of the candidate pair and a document similarity score based on the relative positional differences. The relative positional differences are compared with a similarity function to form a difference similarity vector for the candidate pair. The difference similarity vector comprises components corresponding to each relative positional difference. The components of the difference similarity vector are aggregated to determine a final score for the candidate pair. A document-level similarity metric is determined from the final score. The method includes determining whether the final score is above a cutoff value, and in response to determining that the final score for the candidate pair is above the cutoff value, comparing the document extraction attribute with the final score. The method also determines whether the document-level similarity metric is above a threshold value by the de-duplication server. The candidate pair is classified based on determining that the document-level similarity metric is above the threshold value to de-duplicate the plurality of digitally stored documents in the candidate pair. Based on the classifying, duplicate transaction documents are removed from the document database by any of deleting records, marking records, updating column attributes, or writing records to a different table.
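
The scoring flow described in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation. The abstract does not specify the similarity function, the aggregation, the cutoff or threshold values, or how the document-level metric is derived, so the following are all assumptions made for illustration: the `TextElement` record and its normalized `x`/`y` coordinates, the exponential decay similarity with a `scale` of 0.05, averaging as the aggregation, the fuzzy text comparison via `difflib.SequenceMatcher`, the product used as the document-level metric, and the default `cutoff` and `threshold`.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher
from math import exp, hypot


@dataclass(frozen=True)
class TextElement:
    label: str  # hypothetical attribute name, e.g. "invoice_number"
    text: str   # extracted text for that attribute
    x: float    # assumed page position, normalized to [0, 1]
    y: float


def positional_differences(doc_a, doc_b):
    """Distance between same-labeled text elements of the two documents."""
    b_by_label = {e.label: e for e in doc_b}
    return {
        e.label: hypot(e.x - b_by_label[e.label].x, e.y - b_by_label[e.label].y)
        for e in doc_a
        if e.label in b_by_label
    }


def difference_similarity_vector(diffs, scale=0.05):
    """One component per positional difference (assumed similarity: exp decay)."""
    return {label: exp(-d / scale) for label, d in diffs.items()}


def final_score(vector):
    """Aggregate the vector components; a plain mean is assumed here."""
    return sum(vector.values()) / len(vector) if vector else 0.0


def text_agreement(doc_a, doc_b):
    """Mean fuzzy-match ratio of the extracted text for shared labels."""
    b_text = {e.label: e.text for e in doc_b}
    ratios = [
        SequenceMatcher(None, e.text, b_text[e.label]).ratio()
        for e in doc_a
        if e.label in b_text
    ]
    return sum(ratios) / len(ratios) if ratios else 0.0


def is_duplicate(doc_a, doc_b, cutoff=0.5, threshold=0.8):
    """Classify a candidate pair; cutoff and threshold values are assumed."""
    vector = difference_similarity_vector(positional_differences(doc_a, doc_b))
    score = final_score(vector)
    if score <= cutoff:
        return False  # layouts differ too much; skip the attribute comparison
    # Assumed document-level metric: layout score weighted by text agreement.
    metric = score * text_agreement(doc_a, doc_b)
    return metric > threshold
```

Under these assumptions, two documents with identical element positions and identical text yield a final score and a document-level metric of 1.0, so the pair is classified as a duplicate. A pair whose elements sit far apart on the page fails the cutoff check and is never compared at the attribute level, which is one plausible reading of the "targeted" matching in the title.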
