The patent badge is an abbreviated version of the USPTO patent document. The patent badge does contain a link to the full patent document.

G10L 13/08 (2013.01); G06F 40/126 (2020.01); G06N 3/044 (2023.01); G06N 3/045 (2023.01); G06N 3/048 (2023.01); G06N 3/08 (2023.01); G06N 3/088 (2023.01); G10L 13/047 (2013.01);

U.S. Cl.

CPC ...

G10L 13/08 (2013.01); G06F 40/126 (2020.01); G06N 3/044 (2023.01); G06N 3/045 (2023.01); G06N 3/08 (2013.01); G06N 3/088 (2013.01); G10L 13/047 (2013.01); G06N 3/048 (2023.01);

Abstract

A method for training a non-autoregressive TTS model includes receiving training data that includes a reference audio signal and a corresponding input text sequence. The method also includes encoding the reference audio signal into a variational embedding that disentangles the style/prosody information from the reference audio signal and encoding the input text sequence into an encoded text sequence. The method also includes predicting a phoneme duration for each phoneme in the input text sequence and determining a phoneme duration loss based on the predicted phoneme durations and a reference phoneme duration. The method also includes generating one or more predicted mel-frequency spectrogram sequences for the input text sequence and determining a final spectrogram loss based on the predicted mel-frequency spectrogram sequences and a reference mel-frequency spectrogram sequence. The method also includes training the TTS model based on the final spectrogram loss and the corresponding phoneme duration loss.

Find Patent Forward Citations