The patent badge is an abbreviated version of the USPTO patent document. The patent badge does contain a link to the full patent document.

The badge covers the following: patent number, date the patent was issued, date the patent was filed, title of the patent, applicant, inventor, assignee, attorney firm, primary examiner, assistant examiner, CPC classifications, and abstract. The full patent document it links to is in Adobe Acrobat format (PDF).

Patent No.:

US 9390163 B1

Date of Patent:

Jul. 12, 2016

Filed:

Apr. 24, 2006

Method, system and software arrangement for detecting or determining similarity regions between datasets

Applicants:

Salvatore Paxia, New York, NY (US);

Bhubaneswar Mishra, Great Neck, NY (US);

Yi Zhou, Shanghai, CN;

Inventors:

Salvatore Paxia, New York, NY (US);

Bhubaneswar Mishra, Great Neck, NY (US);

Yi Zhou, Shanghai, CN;

Assignee:

New York University, New York, NY (US);

Attorney:

Andrews Kurth LLP

Primary Examiner:

Russell S Negin

Int. Cl.

G01N 33/48 (2006.01); G01N 33/50 (2006.01); G06F 17/30 (2006.01); G06F 19/12 (2011.01); G06F 19/24 (2011.01); G06F 19/26 (2011.01); G06F 19/22 (2011.01);

U.S. Cl.

CPC

G06F 17/30675 (2013.01); G06F 19/12 (2013.01); G06F 19/24 (2013.01); G06F 19/26 (2013.01); G06F 19/22 (2013.01);

Abstract

Methods, systems, and computer-readable media are provided which can identify and provide local variations in regions of similarity among two or more data sets. These data sets may be represented as sequences such as, e.g., genomic sequences or words in a text. The local variations in similarity levels can be provided by selecting an initial prior distribution relating the data sets, organizing the first data set into windows and the remaining data sets into blocks, using the priors to sample one or more sets of words from the first data set, computing a similarity curve from exact and inexact matches for these words and, if convergence of results is not achieved, computing a new set of priors and repeating the sampling and computation of similarity curves. The computations can be performed using an amount of computational time that is linearly proportional to the size of the data sets.
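The abstract describes the procedure only at a high level. The Python sketch below is one possible reading of it and rests on several assumptions that do not come from the patent: a DNA alphabet, a single second sequence, fixed-length k-mers as the "words", an inexact-match rule of Hamming distance 1 scored at half weight, and a prior-update rule that shifts samples toward under-sampled windows. For simplicity the second data set is hashed as a whole instead of being split into blocks. The function names (`similarity_curve`, `kmer_index`, `one_mismatch_neighbors`) and all parameter defaults are hypothetical.

```python
import random

ALPHABET = "ACGT"  # assumption: DNA alphabet, used to generate inexact (1-mismatch) words


def kmer_index(seq: str, k: int) -> set[str]:
    """Hash every k-mer of `seq` in one pass (linear in len(seq) for fixed k)."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def one_mismatch_neighbors(word: str):
    """Yield every word at Hamming distance exactly 1 from `word`."""
    for i, c in enumerate(word):
        for a in ALPHABET:
            if a != c:
                yield word[:i] + a + word[i + 1:]


def similarity_curve(first: str, other: str, window: int = 50, k: int = 8,
                     samples_per_round: int = 200, max_rounds: int = 20,
                     tol: float = 0.02, seed: int = 0) -> list[float]:
    """Estimate per-window similarity of `first` against `other` by iterative sampling."""
    if len(first) < k:
        raise ValueError("first sequence is shorter than k")
    rng = random.Random(seed)
    index = kmer_index(other, k)
    n_starts = len(first) - k + 1             # valid word start positions in `first`
    n_windows = -(-n_starts // window)        # ceiling division: number of windows
    priors = [1.0] * n_windows                # initial prior: uniform over windows
    score = [0.0] * n_windows
    trials = [0] * n_windows
    curve = [0.0] * n_windows
    prev = None
    for _ in range(max_rounds):
        # Sample windows according to the current priors, then a word inside each.
        picks = rng.choices(range(n_windows), weights=priors, k=samples_per_round)
        for w in picks:
            lo, hi = w * window, min((w + 1) * window, n_starts)
            start = rng.randrange(lo, hi)
            word = first[start:start + k]
            if word in index:
                score[w] += 1.0               # exact match
            elif any(nb in index for nb in one_mismatch_neighbors(word)):
                score[w] += 0.5               # inexact match (hypothetical weight)
            trials[w] += 1
        curve = [s / t if t else 0.0 for s, t in zip(score, trials)]
        if (prev is not None and all(trials)
                and max(abs(a - b) for a, b in zip(curve, prev)) < tol):
            break                             # similarity curve has converged
        prev = curve
        # Hypothetical prior update: favor windows that have been sampled less.
        priors = [1.0 / (1 + t) for t in trials]
    return curve
```

For example, `similarity_curve(a, b, window=50, k=8)` returns one value per 50-position window of `a`, where 1.0 means every sampled word had an exact match somewhere in `b` and 0.0 means none had even a one-mismatch match. Hashing `b` takes a single pass and each round draws a fixed number of samples, so for fixed `k` and a bounded number of rounds the running time grows linearly with the input size, consistent with the abstract's claim.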