The patent badge is an abbreviated version of the USPTO patent document. It covers the following: Patent number, Date patent was issued, Date patent was filed, Title of the patent, Applicant, Inventor, Assignee, Attorney firm, Primary examiner, Assistant examiner, CPCs, and Abstract. The patent badge contains a link to the full patent document (in Adobe Acrobat format, aka PDF).

Date of Patent:
Jun. 13, 2023

Filed:
Jul. 16, 2020

Applicant:

Google LLC, Mountain View, CA (US);

Inventors:

Daisy Stanton, Mountain View, CA (US);

Eric Dean Battenberg, Sunnyvale, CA (US);

Russell John Wyatt Skerry-Ryan, Mountain View, CA (US);

Soroosh Mariooryad, Redwood City, CA (US);

David Teh-Hwa Kao, San Francisco, CA (US);

Thomas Edward Bagby, San Francisco, CA (US);

Sean Matthew Shannon, Mountain View, CA (US);

Assignee:

Google LLC, Mountain View, CA (US);

Attorneys:
Primary Examiner:
Int. Cl.
CPC ...
G10L 13/00 (2006.01); G10L 13/08 (2013.01); G10L 13/10 (2013.01); G10L 25/30 (2013.01); G10L 13/04 (2013.01); G10L 13/02 (2013.01); G06N 3/044 (2023.01);
U.S. Cl.
CPC ...
G10L 13/10 (2013.01); G10L 13/04 (2013.01); G10L 25/30 (2013.01); G06N 3/044 (2023.01); G10L 13/02 (2013.01); G10L 13/08 (2013.01);
Abstract

A system for generating an output audio signal includes a context encoder, a text-prediction network, and a text-to-speech (TTS) model. The context encoder is configured to receive one or more context features associated with current input text and process the one or more context features to generate a context embedding associated with the current input text. The text-prediction network is configured to process the current input text and the context embedding to predict, as output, a style embedding for the current input text. The style embedding specifies a specific prosody and/or style for synthesizing the current input text into expressive speech. The TTS model is configured to process the current input text and the style embedding to generate an output audio signal of expressive speech of the current input text. The output audio signal has the specific prosody and/or style specified by the style embedding.
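The data flow described in the abstract (context features → context embedding → style embedding → expressive audio) can be illustrated with a toy sketch. This is not the patented implementation: the class names `ContextEncoder`, `TextPredictionNetwork`, and `TTSModel`, the helper `synthesize`, the numeric context-feature vectors, the random linear layers standing in for neural networks, and the sine-tone "synthesizer" are all hypothetical stand-ins chosen only to show how each component consumes the previous one's output.

```python
import math
import random
from typing import List, Sequence

# Hypothetical, toy components mirroring the abstract's data flow only.
# Real systems use trained neural networks; these are random linear layers.

def linear(weights: List[List[float]], x: Sequence[float]) -> List[float]:
    """Dense layer without bias: y = W x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]

def random_matrix(rows: int, cols: int, rng: random.Random) -> List[List[float]]:
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

class ContextEncoder:
    """Maps one or more context feature vectors to a single context embedding."""
    def __init__(self, feature_dim: int, embed_dim: int, rng: random.Random):
        self.w = random_matrix(embed_dim, feature_dim, rng)

    def __call__(self, context_features: List[List[float]]) -> List[float]:
        # Mean-pool the context feature vectors, project, and squash with tanh.
        n = len(context_features)
        pooled = [sum(col) / n for col in zip(*context_features)]
        return [math.tanh(v) for v in linear(self.w, pooled)]

class TextPredictionNetwork:
    """Predicts a style embedding from the input text plus the context embedding."""
    def __init__(self, text_dim: int, context_dim: int, style_dim: int,
                 rng: random.Random):
        self.text_dim = text_dim
        self.w = random_matrix(style_dim, text_dim + context_dim, rng)

    def encode_text(self, text: str) -> List[float]:
        # Toy bag-of-characters encoding: bucket character codes, normalize.
        vec = [0.0] * self.text_dim
        for ch in text:
            vec[ord(ch) % self.text_dim] += 1.0
        total = max(len(text), 1)
        return [v / total for v in vec]

    def __call__(self, text: str, context_embedding: List[float]) -> List[float]:
        features = self.encode_text(text) + list(context_embedding)
        return [math.tanh(v) for v in linear(self.w, features)]

class TTSModel:
    """Toy synthesizer: a sine tone per character, shaped by the style embedding."""
    def __init__(self, sample_rate: int = 8000, samples_per_char: int = 80):
        self.sample_rate = sample_rate
        self.samples_per_char = samples_per_char

    def __call__(self, text: str, style_embedding: List[float]) -> List[float]:
        # Style dims 0 and 1 (each in [-1, 1]) set pitch and loudness, a crude
        # stand-in for prosody; requires style_dim >= 2.
        pitch_hz = 200.0 + 100.0 * style_embedding[0]
        amplitude = 0.5 + 0.4 * style_embedding[1]
        audio = []
        for i in range(len(text)):
            for n in range(self.samples_per_char):
                t = (i * self.samples_per_char + n) / self.sample_rate
                audio.append(amplitude * math.sin(2 * math.pi * pitch_hz * t))
        return audio

def synthesize(text, context_features, encoder, predictor, tts):
    """Context features -> context embedding -> style embedding -> audio."""
    context_embedding = encoder(context_features)
    style_embedding = predictor(text, context_embedding)
    return tts(text, style_embedding), style_embedding

# Usage: two hypothetical 4-dimensional context feature vectors.
rng = random.Random(0)
encoder = ContextEncoder(feature_dim=4, embed_dim=8, rng=rng)
predictor = TextPredictionNetwork(text_dim=16, context_dim=8, style_dim=4, rng=rng)
tts = TTSModel()
audio, style = synthesize("Hello there.",
                          [[0.1, 0.2, 0.3, 0.4], [0.0, 1.0, 0.0, 1.0]],
                          encoder, predictor, tts)
print(len(audio), [round(s, 3) for s in style])
```

The key design point the sketch tries to mirror is that the TTS model never sees the context features directly: it receives only the current text and the predicted style embedding, so the embedding is the single channel through which context influences prosody.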

