The patent badge is an abbreviated version of the USPTO patent document. The patent badge covers the following: Patent number, Date patent was issued, Date patent was filed, Title of the patent, Applicant, Inventor, Assignee, Attorney firm, Primary examiner, Assistant examiner, CPCs, and Abstract. The patent badge also contains a link to the full patent document in Adobe Acrobat (PDF) format.

Date of Patent: Sep. 19, 2023

Filed: Feb. 05, 2019

Applicants:
Bonnie Berger Leighton, Newtonville, MA (US);
Deniz Yorukoglu, Cambridge, MA (US);
Yun William Yu, Cambridge, MA (US);
Jian Peng, Cambridge, MA (US)

Inventors:
Bonnie Berger Leighton, Newtonville, MA (US);
Deniz Yorukoglu, Cambridge, MA (US);
Yun William Yu, Cambridge, MA (US);
Jian Peng, Cambridge, MA (US)

Assignee:
Attorney:
Primary Examiner:
Int. Cl.: CPC G06F 16/174 (2019.01); G16B 30/00 (2019.01); G16B 50/00 (2019.01); G16C 99/00 (2019.01); G16B 50/50 (2019.01); G16C 10/00 (2019.01); G16B 30/10 (2019.01)
U.S. Cl.: CPC G06F 16/1744 (2019.01); G16B 30/00 (2019.02); G16B 30/10 (2019.02); G16B 50/00 (2019.02); G16B 50/50 (2019.02); G16C 10/00 (2019.02); G16C 99/00 (2019.02)
Abstract

This disclosure provides a highly efficient and scalable compression tool that compresses quality scores, preferably by capitalizing on sequence redundancy. In one embodiment, compression is achieved by smoothing a large fraction of quality score values based on the k-mer neighborhood of their corresponding positions in read sequences. The approach exploits the intuition that any divergent base in a k-mer likely corresponds to either a single-nucleotide polymorphism (SNP) or a sequencing error; thus, a preferred approach is to preserve quality scores only for probable variant locations and to compress quality scores of concordant bases, preferably by resetting them to a default value. By viewing individual read datasets through the lens of k-mer frequencies in a corpus of reads, the approach ensures that compression 'lossiness' does not deleteriously affect accuracy.
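The abstract does not spell out a concrete algorithm. The following is a minimal Python sketch of k-mer neighborhood smoothing as the abstract describes it, written for illustration only and not presented as the patented method. Everything specific in it is an assumption: the hypothetical helper names (`build_trusted_kmers`, `hamming1_match`, `smooth_qualities`), the rule that a corpus k-mer is "trusted" once it occurs at least `min_count` times, the restriction to single-substitution (Hamming distance 1) neighbors, integer Phred scores, and the `default_q` reset value. A base counts as concordant when a window covering it matches a trusted k-mer, either exactly or with the single mismatch falling elsewhere in the window. A base that is the lone mismatch against a trusted k-mer is treated as a probable SNP or sequencing error, and it keeps its original score.

```python
from collections import Counter

BASES = "ACGT"


def build_trusted_kmers(corpus, k, min_count):
    """Count every k-mer in the read corpus; keep those seen >= min_count times.

    The count threshold is an illustrative assumption for deciding which
    k-mers are common enough to trust.
    """
    counts = Counter(
        read[i:i + k] for read in corpus for i in range(len(read) - k + 1)
    )
    return {kmer for kmer, c in counts.items() if c >= min_count}


def hamming1_match(kmer, trusted):
    """Return the offset of the single divergent base if `kmer` is exactly one
    substitution away from some trusted k-mer, else None.

    If several substitutions reach a trusted k-mer, the first one found wins.
    """
    for i, base in enumerate(kmer):
        for alt in BASES:
            if alt != base and kmer[:i] + alt + kmer[i + 1:] in trusted:
                return i
    return None


def smooth_qualities(read, quals, trusted, k, default_q):
    """Reset scores of concordant bases to `default_q`; keep the original
    scores of divergent bases (probable SNPs or errors) and of bases that no
    trusted or near-trusted k-mer covers."""
    if len(quals) != len(read):
        raise ValueError("read and quality list must have the same length")
    n = len(read)
    concordant = [False] * n
    divergent = [False] * n
    for start in range(n - k + 1):
        kmer = read[start:start + k]
        if kmer in trusted:
            # Exact match: every base in this window agrees with the corpus.
            for j in range(start, start + k):
                concordant[j] = True
            continue
        offset = hamming1_match(kmer, trusted)
        if offset is not None:
            # One mismatch: flag it, treat the rest of the window as concordant.
            divergent[start + offset] = True
            for j in range(start, start + k):
                if j != start + offset:
                    concordant[j] = True
    return [
        default_q if concordant[j] and not divergent[j] else quals[j]
        for j in range(n)
    ]


if __name__ == "__main__":
    trusted = build_trusted_kmers(["ACGTTGCA", "ACGTTGCA"], k=4, min_count=2)
    print(smooth_qualities("ACGTAGCA", [30] * 8, trusted, k=4, default_q=40))
    # -> [40, 40, 40, 40, 30, 40, 40, 40]
```

In the example, the read `ACGTAGCA` differs from the corpus sequence `ACGTTGCA` at zero-based position 4. Every window covering that position is one substitution away from a trusted k-mer, so only that base keeps its score of 30 and the rest are reset to 40. Bases with no trusted or near-trusted coverage keep their scores, which is a conservative choice in this sketch. Because the smoothed stream is dominated by one repeated value, a general-purpose entropy coder applied afterwards has far less information to store.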
