The patent badge is an abbreviated version of the USPTO patent document. It covers the following: patent number, date the patent was issued, date the patent was filed, title of the patent, applicant, inventors, assignee, attorney firm, primary examiner, assistant examiner, CPC classifications, and abstract. The patent badge also links to the full patent document in PDF (Adobe Acrobat) format.

Date of Patent:

Mar. 18, 2025

Filed:

Jun. 30, 2022

Applicant:

Amazon Technologies, Inc., Seattle, WA (US);

Inventors:

Mateusz Aleksander Lajszczak, Cambridge, GB;

Adam Marek Gabrys, Sopot, PL;

Arent van Korlaar, London, GB;

Ruizhe Li, London, GB;

Elena Sergeevna Sokolova, London, GB;

Jaime Lorenzo Trueba, Madrid, ES;

Arnaud Vincent Pierre Yves Joly, Cambridge, GB;

Marco Nicolis, London, GB;

Ekaterina Petrova, Oberhaching, DE;

Assignee:

Amazon Technologies, Inc., Seattle, WA (US);

Attorney:
Primary Examiner:
Int. Cl.
CPC ...
G10L 13/047 (2013.01); G10L 15/02 (2006.01); G10L 15/16 (2006.01); G10L 15/18 (2013.01); G10L 15/22 (2006.01); G10L 25/18 (2013.01);
U.S. Cl.
CPC ...
G10L 13/047 (2013.01); G10L 15/02 (2013.01); G10L 15/16 (2013.01); G10L 15/1815 (2013.01); G10L 15/22 (2013.01); G10L 25/18 (2013.01); G10L 2015/025 (2013.01);
Abstract

A target voice dataset may be augmented using speech prediction. Encoder and decoder models may be trained to encode audio data into encoded speech data, and convert it back to audio. The encoded units may include semantic information (e.g., phonemes and/or words) as well as feature data indicating prosody, timbre, speaker identity, speech style, emotion, etc. of speech. An acoustic/semantic language model (ASLM) may be configured to predict encoded speech data in a manner analogous to a language model predicting words; for example, based on preceding encoded speech data. The models may be used to generate synthesized speech samples having voice characteristics (e.g., feature data) similar to those of the target voice dataset. The augmented dataset may be used to train a text-to-speech (TTS) model to reproduce the target voice characteristics, and may improve performance of the TTS model over training with only the original target voice dataset.
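
As a rough illustration of the abstract's analogy between the ASLM and a language model predicting words, the following is a toy sketch only, not the method claimed in the patent. It assumes encoded speech data can be represented as sequences of integer unit IDs, and it swaps in a simple bigram count model for the ASLM. The names `train_bigram`, `sample_continuation`, and `target_voice_units`, and the example data, are hypothetical.

```python
import random
from collections import Counter, defaultdict


def train_bigram(sequences):
    """Count how often each unit follows each preceding unit."""
    counts = defaultdict(Counter)
    for seq in sequences:
        for prev, nxt in zip(seq, seq[1:]):
            counts[prev][nxt] += 1
    return counts


def sample_continuation(counts, prompt, length, rng):
    """Extend a non-empty prompt by up to `length` units.

    Each new unit is sampled in proportion to how often it followed
    the previous unit in the training sequences.
    """
    out = list(prompt)
    for _ in range(length):
        followers = counts.get(out[-1])
        if not followers:
            break  # no observed continuation for this unit
        units, weights = zip(*followers.items())
        out.append(rng.choices(units, weights=weights, k=1)[0])
    return out


# Hypothetical stand-in for encoded speech data from a target voice:
# short sequences of discrete unit IDs (assumption, not from the patent).
target_voice_units = [[3, 7, 7, 2, 9], [3, 7, 2, 2, 9], [5, 3, 7, 2, 9]]

rng = random.Random(0)
model = train_bigram(target_voice_units)

# Generate one synthetic sequence per original, prompted with its first two units.
synthetic = [sample_continuation(model, seq[:2], 4, rng) for seq in target_voice_units]
augmented = target_voice_units + synthetic
```

In the system the abstract describes, a decoder would convert predicted encoded speech data back to audio, and the augmented dataset would then be used to train a TTS model. The sketch stops at the unit sequences and does not model audio, prosody, or speaker features.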

