The patent badge is an abbreviated version of the USPTO patent document. It covers the following: Patent number, Date patent was issued, Date patent was filed, Title of the patent, Applicant, Inventor, Assignee, Attorney firm, Primary examiner, Assistant examiner, CPCs, and Abstract. The patent badge contains a link to the full patent document (in Adobe Acrobat format, aka PDF).

Patent No.:

US 12243511 B1

Date of Patent:

Mar. 04, 2025

Filed:

Mar. 31, 2022

Emphasizing portions of synthesized speech

Applicant:

Amazon Technologies, Inc., Seattle, WA (US);

Inventors:

Arnaud Vincent Pierre Yves Joly, Cambridge, GB;

Marco Nicolis, London, GB;

Elena Sergeevna Sokolova, London, GB;

Jedrzej Sobanski, Gdansk, PL;

Mateusz Aleksander Lajszczak, Cambridge, GB;

Arent van Korlaar, London, GB;

Ruizhe Li, London, GB;

Assignee:

Amazon Technologies, Inc., Seattle, WA (US);

Attorney:

Pierce Atwood LLP

Primary Examiner:

Michael Colucci

Int. Cl.

CPC ...

G10L 13/10 (2013.01); G10L 13/033 (2013.01); G10L 13/04 (2013.01); G10L 13/06 (2013.01); G10L 15/26 (2006.01);

U.S. Cl.

CPC ...

G10L 13/10 (2013.01); G10L 13/033 (2013.01); G10L 13/04 (2013.01); G10L 13/06 (2013.01); G10L 15/26 (2013.01);

Abstract

A neural text-to-speech system may be configured to emphasize words. Applying emphasis where appropriate enables the TTS system to better reproduce prosodic characteristics of human speech. Emphasis may make the resulting synthesized speech more understandable and engaging than synthesized speech lacking emphasis. Emphasis may be manually annotated to, and/or predicted from, a source text (e.g., a book). In some implementations, the system may use a generative model such as a variational autoencoder to generate word acoustic embeddings indicating how emphasis is to be reflected in the synthesized speech. A phoneme encoder of the TTS system may process phonemes to generate phoneme embeddings. A decoder may process the word acoustic embeddings and the phoneme embeddings to generate spectrogram data representing the synthesized speech.
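The abstract describes three cooperating parts: a phoneme encoder that produces phoneme embeddings, a generative model (for example a variational autoencoder) that produces one acoustic embedding per word reflecting whether that word is emphasized, and a decoder that combines the two into spectrogram frames. The toy sketch below only illustrates that data flow and is not the patented implementation. Everything in it is an assumption made for illustration: the hypothetical `ToyEmphasisTTS` class, the random untrained weights, the use of a single emphasis flag to condition the latent mean and log-variance, and the fixed number of frames per phoneme standing in for a real duration model.

```python
import math
import random
from typing import List, Tuple

Vector = List[float]


def linear(x: Vector, weights: List[Vector]) -> Vector:
    """Dense layer without bias: one output value per row of `weights`."""
    return [sum(w * v for w, v in zip(row, x)) for row in weights]


def random_matrix(rows: int, cols: int, rng: random.Random) -> List[Vector]:
    """Small random (untrained) weights, for illustration only."""
    return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]


class ToyEmphasisTTS:
    """Hypothetical toy model; layer shapes and names are illustrative assumptions."""

    def __init__(self, n_phonemes: int, dim: int = 8, n_mels: int = 16,
                 frames_per_phoneme: int = 3, seed: int = 0) -> None:
        rng = random.Random(seed)
        self.frames_per_phoneme = frames_per_phoneme  # assumed fixed duration
        # Phoneme encoder: embedding lookup followed by a tanh projection.
        self.phoneme_table = random_matrix(n_phonemes, dim, rng)
        self.encoder_proj = random_matrix(dim, dim, rng)
        # VAE-style latent: [emphasis flag, constant 1] -> mean and log-variance.
        self.mu_proj = random_matrix(dim, 2, rng)
        self.logvar_proj = random_matrix(dim, 2, rng)
        # Decoder: [phoneme embedding; word acoustic embedding] -> one mel frame.
        self.decoder_proj = random_matrix(n_mels, 2 * dim, rng)

    def encode_phonemes(self, phoneme_ids: List[int]) -> List[Vector]:
        return [[math.tanh(v) for v in linear(self.phoneme_table[p], self.encoder_proj)]
                for p in phoneme_ids]

    def word_acoustic_embedding(self, emphasized: bool, rng: random.Random,
                                sample: bool) -> Vector:
        features = [1.0 if emphasized else 0.0, 1.0]
        mu = linear(features, self.mu_proj)
        if not sample:
            return mu  # use the latent mean for deterministic synthesis
        logvar = linear(features, self.logvar_proj)
        # Reparameterization trick: z = mu + sigma * eps, with eps ~ N(0, 1).
        return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
                for m, lv in zip(mu, logvar)]

    def synthesize(self, words: List[Tuple[List[int], bool]],
                   sample: bool = False, seed: int = 0) -> List[Vector]:
        """Return spectrogram frames for (phoneme_ids, emphasized) word pairs."""
        rng = random.Random(seed)
        frames: List[Vector] = []
        for phoneme_ids, emphasized in words:
            word_emb = self.word_acoustic_embedding(emphasized, rng, sample)
            # Broadcast the word-level embedding to each of the word's phonemes.
            for ph_emb in self.encode_phonemes(phoneme_ids):
                frame = linear(ph_emb + word_emb, self.decoder_proj)
                frames.extend(list(frame) for _ in range(self.frames_per_phoneme))
        return frames
```

Because each word's acoustic embedding only reaches that word's phonemes, flipping the emphasis flag on one word changes only the frames for that word, which mirrors the per-word control the abstract describes. A trained system would instead learn the encoder, latent, and decoder weights from recorded speech and predict durations rather than fixing them.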
