The patent badge is an abbreviated version of the USPTO patent document. It covers the following: patent number, date the patent was issued, date the patent was filed, title of the patent, applicant, inventor, assignee, attorney firm, primary examiner, assistant examiner, CPCs, and abstract. The patent badge also contains a link to the full patent document (in Adobe Acrobat format, i.e. PDF).

Date of Patent:

Oct. 04, 2022

Filed:

Mar. 27, 2019

Applicant:

Baidu USA LLC, Sunnyvale, CA (US);

Inventors:

Sercan Arik, San Francisco, CA (US);

Hee Woo Jun, Sunnyvale, CA (US);

Eric Undersander, San Francisco, CA (US);

Gregory Diamos, Menlo Park, CA (US);

Assignee:

Baidu USA LLC, Sunnyvale, CA (US);

Attorney:
Primary Examiner:
Int. Cl.
CPC ...
G10L 15/00 (2013.01); G10L 15/06 (2013.01); G10L 25/18 (2013.01); G10L 15/16 (2006.01); G10L 25/30 (2013.01); G10L 19/00 (2013.01);
U.S. Cl.
CPC ...
G10L 15/063 (2013.01); G10L 15/16 (2013.01); G10L 25/18 (2013.01); G10L 19/00 (2013.01); G10L 25/30 (2013.01);
Abstract

For the problem of waveform synthesis from spectrograms, presented herein are embodiments of an efficient neural network architecture, based on transposed convolutions to achieve a high compute intensity and fast inference. In one or more embodiments, for training of the convolutional vocoder architecture, losses are used that are related to perceptual audio quality, as well as a GAN framework to guide with a critic that discerns unrealistic waveforms. While yielding a high-quality audio, embodiments of the model can achieve more than 500 times faster than real-time audio synthesis. Multi-head convolutional neural network (MCNN) embodiments for waveform synthesis from spectrograms are also disclosed. MCNN embodiments enable significantly better utilization of modern multi-core processors than commonly-used iterative algorithms like Griffin-Lim and yield very fast (more than 300× real-time) waveform synthesis. Embodiments herein yield high-quality speech synthesis, without any iterative algorithms or autoregression in computations.
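
As a rough illustration of the building block named in the abstract, the sketch below implements a single-channel 1-D transposed convolution and combines several "heads" by a weighted sum. It is a toy under stated assumptions, not the patented architecture: the function names `transposed_conv1d` and `multi_head_synthesis`, the fixed kernels, the scalar head weights, and the choice to sum head outputs are all hypothetical and chosen only to show how a short frame sequence can be upsampled toward waveform length without any iterative or autoregressive step.

```python
def transposed_conv1d(x, kernel, stride):
    """Single-channel 1-D transposed convolution.

    Each input sample scatters a scaled copy of `kernel` into the output,
    with consecutive copies placed `stride` samples apart, so the output is
    longer than the input (upsampling).
    """
    out_len = (len(x) - 1) * stride + len(kernel)
    out = [0.0] * out_len
    for i, xi in enumerate(x):
        for k, wk in enumerate(kernel):
            out[i * stride + k] += xi * wk
    return out


def multi_head_synthesis(frames, heads, stride):
    """Combine several heads by a weighted sum (an assumption of this sketch).

    `heads` is a list of (kernel, weight) pairs; all kernels must have the
    same length so that every head produces an output of the same length.
    """
    outputs = [transposed_conv1d(frames, kernel, stride) for kernel, _ in heads]
    length = len(outputs[0])
    return [
        sum(weight * out[t] for (_, weight), out in zip(heads, outputs))
        for t in range(length)
    ]


# Hypothetical toy input: two "frames" upsampled by a factor of 2.
frames = [1.0, 2.0]
heads = [([1.0, 1.0], 0.5), ([1.0, -1.0], 0.5)]
waveform = multi_head_synthesis(frames, heads, stride=2)
```

Every output sample here depends only on the input frames and fixed kernels, so all samples can be computed independently, which is the property that lets such models run in parallel on multi-core hardware. In a trained model the kernels and head weights would be learned, and many channels and layers would be stacked.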

