
The patent badge is an abbreviated version of the USPTO patent document. It covers the patent number, issue date, filing date, title, applicant, inventors, assignee, attorney firm, primary examiner, assistant examiner, CPC classifications, and abstract. The badge also links to the full patent document in PDF (Adobe Acrobat) format, which can be downloaded or printed.

Date of Patent: Oct. 22, 2024

Filed: Apr. 24, 2024

Applicant: Sanas.ai Inc., Palo Alto, CA (US)

Inventors: Shawn Zhang, Palo Alto, CA (US); Lukas Pfeifenberger, Salzburg, AT; Jason Wu, Santa Clara, CA (US); Piotr Dura, Warsaw, PL; David Braude, Edinburgh, GB; Bajibabu Bollepalli, Cottenham, GB; Alvaro Escudero, San Sebastian de los Reyes, ES; Gokce Keskin, Mountain View, CA (US); Ankita Jha, Bangalore, IN; Maxim Serebryakov, Palo Alto, CA (US)

Assignee: SANAS.AI INC., Palo Alto, CA (US)

Attorney:

Primary Examiner:

Int. Cl.: G10L 15/00 (2013.01); G10L 15/02 (2006.01); G10L 15/06 (2013.01); G10L 21/0232 (2013.01); G10L 25/30 (2013.01); G10L 15/16 (2006.01); G10L 15/22 (2006.01)

U.S. Cl. CPC: G10L 21/0232 (2013.01); G10L 15/02 (2013.01); G10L 15/063 (2013.01); G10L 25/30 (2013.01); G10L 15/16 (2013.01); G10L 15/22 (2013.01)
Abstract

The disclosed technology relates to methods, voice enhancement systems, and non-transitory computer readable media for real-time voice enhancement. In some examples, input audio data including foreground speech content, non-content elements, and speech characteristics is fragmented into input speech frames. The input speech frames are converted to low-dimensional representations of the input speech frames. One or more of the fragmentation or the conversion is based on an application of a first trained neural network to the input audio data. The low-dimensional representations of the input speech frames omit one or more of the non-content elements. A second trained neural network is applied to the low-dimensional representations of the input speech frames to generate target speech frames. The target speech frames are combined to generate output audio data. The output audio data further includes one or more portions of the foreground speech content and one or more of the speech characteristics.
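The abstract describes a two-network pipeline: fragment input audio into speech frames, map each frame to a low-dimensional representation (dropping non-content elements) with a first trained network, synthesize target speech frames from those representations with a second trained network, and combine the target frames into output audio. The sketch below illustrates only that data flow; the function names are invented for illustration, and the two trained neural networks are stood in for by trivial placeholder functions, not the patented models.

```python
from typing import List

FRAME_SIZE = 4  # illustrative frame length in samples


def frame_audio(samples: List[float], frame_size: int = FRAME_SIZE) -> List[List[float]]:
    """Fragment the input audio data into fixed-size input speech frames."""
    return [samples[i:i + frame_size]
            for i in range(0, len(samples) - frame_size + 1, frame_size)]


def encode_frame(frame: List[float]) -> List[float]:
    """Placeholder for the first trained network: convert a frame to a
    low-dimensional representation (here just mean and peak amplitude),
    omitting frame detail that would carry non-content elements."""
    return [sum(frame) / len(frame), max(abs(s) for s in frame)]


def decode_frame(rep: List[float], frame_size: int = FRAME_SIZE) -> List[float]:
    """Placeholder for the second trained network: generate a target
    speech frame from the low-dimensional representation."""
    mean, _peak = rep
    return [mean] * frame_size


def enhance(samples: List[float]) -> List[float]:
    """End-to-end pipeline: fragment, encode, decode, then combine the
    target speech frames into the output audio data."""
    frames = frame_audio(samples)
    reps = [encode_frame(f) for f in frames]
    targets = [decode_frame(r) for r in reps]
    out: List[float] = []
    for f in targets:
        out.extend(f)
    return out
```

In the claimed system both the fragmentation/conversion step and the frame-generation step are learned; the placeholders above exist only so the frame-level data flow is concrete and runnable.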

