The patent badge is an abbreviated version of the USPTO patent document. It covers the following: patent number, date the patent was issued, date the patent was filed, title of the patent, applicant, inventor, assignee, attorney firm, primary examiner, assistant examiner, CPCs, and abstract. The patent badge also contains a link to the full patent document (in Adobe Acrobat format, i.e. PDF).

Date of Patent:
Oct. 12, 2010

Filed:

May 25, 2007
Applicants:

Srikanth Thirumalai, Clyde Hill, WA (US);

Egidio Terra, Porto Alegre, BR;

Vijai Mohan, Bellevue, WA (US);

Mark J. Tomko, Seattle, WA (US);

Grant M. Emery, Seattle, WA (US);

Aswath Manoharan, Bellevue, WA (US);

Inventors:

Srikanth Thirumalai, Clyde Hill, WA (US);

Egidio Terra, Porto Alegre, BR;

Vijai Mohan, Bellevue, WA (US);

Mark J. Tomko, Seattle, WA (US);

Grant M. Emery, Seattle, WA (US);

Aswath Manoharan, Bellevue, WA (US);

Assignee:

Amazon Technologies, Inc., Reno, NV (US);

Attorney:
Primary Examiner:
Assistant Examiner:
Int. Cl.
CPC ...
G06F 7/00 (2006.01); G06F 17/00 (2006.01);
U.S. Cl.
CPC ...
Abstract

A system and method for determining the likelihood of two documents describing substantially similar subject matter is presented. A set of tokens for each of two documents is obtained, each set representing strings of characters found in the corresponding document. A matrix of token pairs is determined, each token pair comprising a token from each set of tokens. For each token pair in the matrix, a similarity score is determined. Those token pairs in the matrix with a similarity score above a threshold score are selected and added to a set of matched tokens. A similarity score for the two documents is determined according to the scores of the token pairs added to the set of matched tokens. The determined similarity score is provided as the likelihood that the first and second documents describe substantially similar subject matter.
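
The steps in the abstract can be sketched in a few lines of Python. This is only an illustrative sketch, not the patented method: the abstract does not say how tokens are extracted, how a token pair is scored, what the threshold is, or how the matched scores are combined. The word-level tokenizer, the use of `difflib.SequenceMatcher` as the pair score, the 0.8 threshold, and the "best match per token, averaged over all tokens" aggregation are all assumptions made here, and every function name is hypothetical.

```python
import re
from difflib import SequenceMatcher


def tokenize(text):
    """Return the set of lowercase word tokens in text (assumed tokenizer)."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def token_similarity(a, b):
    """Score one token pair in [0, 1]; SequenceMatcher is a stand-in metric."""
    return SequenceMatcher(None, a, b).ratio()


def matched_token_pairs(tokens_a, tokens_b, threshold=0.8):
    """Score every pair in the token matrix and keep those above the threshold."""
    matched = {}
    for a in tokens_a:
        for b in tokens_b:
            score = token_similarity(a, b)
            if score > threshold:
                matched[(a, b)] = score
    return matched


def document_similarity(doc_a, doc_b, threshold=0.8):
    """Combine the matched pair scores into one document score in [0, 1]."""
    tokens_a, tokens_b = tokenize(doc_a), tokenize(doc_b)
    if not tokens_a or not tokens_b:
        return 0.0
    matched = matched_token_pairs(tokens_a, tokens_b, threshold)
    # Assumed aggregation: take the best matched score for each token on
    # either side, then average over all tokens of both documents.
    best_a, best_b = {}, {}
    for (a, b), score in matched.items():
        best_a[a] = max(best_a.get(a, 0.0), score)
        best_b[b] = max(best_b.get(b, 0.0), score)
    total = sum(best_a.values()) + sum(best_b.values())
    return total / (len(tokens_a) + len(tokens_b))
```

Calling `document_similarity(text_a, text_b)` returns a value between 0 and 1 that could be read as the likelihood described in the abstract. The full pairwise matrix costs time proportional to the product of the two token counts, so a real system would probably need to prune candidate pairs first.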

