The patent badge is an abbreviated version of the USPTO patent document. It covers the following: patent number, date the patent was issued, date the patent was filed, title of the patent, applicant, inventor, assignee, attorney firm, primary examiner, assistant examiner, CPCs, and abstract. The patent badge also contains a link to the full patent document (in Adobe Acrobat format, i.e., PDF).
Patent No.:
Date of Patent:
Jul. 18, 2023
Filed:
Oct. 01, 2020
Applicant:
Baidu USA LLC, Sunnyvale, CA (US);
Inventors:
Sercan O. Arik, San Francisco, CA (US);
Mike Chrzanowski, Palo Alto, CA (US);
Adam Coates, Mountain View, CA (US);
Gregory Diamos, San Jose, CA (US);
Andrew Gibiansky, Mountain View, CA (US);
John Miller, Palo Alto, CA (US);
Andrew Ng, Mountain View, CA (US);
Jonathan Raiman, Palo Alto, CA (US);
Shubhahrata Sengupta, Menlo Park, CA (US);
Mohammad Shoeybi, Los Altos, CA (US);
Assignee:
Baidu USA LLC, Sunnyvale, CA (US);
Abstract
Embodiments of a production-quality text-to-speech (TTS) system constructed from deep neural networks are described. System embodiments comprise five major building blocks: a segmentation model for locating phoneme boundaries, a grapheme-to-phoneme conversion model, a phoneme duration prediction model, a fundamental frequency prediction model, and an audio synthesis model. For embodiments of the segmentation model, phoneme boundary detection was performed with deep neural networks using Connectionist Temporal Classification (CTC) loss. For embodiments of the audio synthesis model, a variant of WaveNet was created that requires fewer parameters and trains faster than the original. By using a neural network for each component, system embodiments are simpler and more flexible than traditional TTS systems, where each component requires laborious feature engineering and extensive domain expertise. Inference with system embodiments may be performed faster than real time.
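The way the building blocks named in the abstract might fit together at synthesis time can be sketched roughly as follows. This is only an illustration under stated assumptions, not the patented implementation: the class name `TTSPipeline`, the callable signatures, the `SAMPLE_RATE` value, and all `toy_*` stand-ins are hypothetical, and the toy functions replace the neural networks with trivial rules. The segmentation model is left out on the assumption that phoneme boundary detection is used to prepare training labels rather than at synthesis time; the abstract does not state this.

```python
import math
from dataclasses import dataclass
from typing import Callable, List, Sequence

SAMPLE_RATE = 16_000  # hypothetical output sample rate in Hz


@dataclass
class TTSPipeline:
    """Hypothetical wiring of four of the five blocks named in the abstract."""
    g2p: Callable[[str], List[str]]                    # text -> phonemes
    duration: Callable[[Sequence[str]], List[float]]   # phonemes -> seconds each
    f0: Callable[[Sequence[str]], List[float]]         # phonemes -> Hz each (0 = unvoiced)
    vocoder: Callable[
        [Sequence[str], Sequence[float], Sequence[float]], List[float]
    ]                                                  # features -> audio samples

    def synthesize(self, text: str) -> List[float]:
        phonemes = self.g2p(text)
        durations = self.duration(phonemes)
        pitches = self.f0(phonemes)
        if not (len(phonemes) == len(durations) == len(pitches)):
            raise ValueError("each phoneme needs one duration and one F0 value")
        return self.vocoder(phonemes, durations, pitches)


# Toy stand-ins (assumptions for illustration only, not neural models).
def toy_g2p(text: str) -> List[str]:
    # Treat every letter as its own "phoneme".
    return [ch for ch in text.lower() if ch.isalpha()]


def toy_duration(phonemes: Sequence[str]) -> List[float]:
    # Fixed 10 ms per phoneme.
    return [0.01 for _ in phonemes]


def toy_f0(phonemes: Sequence[str]) -> List[float]:
    # Vowels are voiced at 120 Hz; everything else is unvoiced.
    return [120.0 if p in "aeiou" else 0.0 for p in phonemes]


def toy_vocoder(
    phonemes: Sequence[str],
    durations: Sequence[float],
    pitches: Sequence[float],
) -> List[float]:
    # Emit a sine wave for voiced phonemes and silence for unvoiced ones.
    samples: List[float] = []
    for dur, hz in zip(durations, pitches):
        n = int(round(dur * SAMPLE_RATE))
        for i in range(n):
            value = math.sin(2 * math.pi * hz * i / SAMPLE_RATE) if hz > 0 else 0.0
            samples.append(value)
    return samples


pipeline = TTSPipeline(toy_g2p, toy_duration, toy_f0, toy_vocoder)
audio = pipeline.synthesize("Hi")
```

Keeping each stage behind a plain callable mirrors the abstract's point that every component is a separate model, so any one stage could be swapped out without touching the others.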