The patent badge is an abbreviated version of the USPTO patent document. It covers the patent number, the date the patent was issued, the date the patent was filed, the title of the patent, the applicant, the inventors, the assignee, the attorney firm, the primary examiner, the assistant examiner, the CPC classifications, and the abstract. The patent badge also contains a link to the full patent document (in PDF format).

Date of Patent: Jan. 3, 2023

Filed: Dec. 10, 2019

Applicant: Amazon Technologies, Inc., Seattle, WA (US)

Inventors:
Marcello Federico, Mountain View, CA (US);
Robert Enyedi, Santa Clara, CA (US);
Yaser Al-Onaizan, Cortlandt Manor, NY (US);
Roberto Barra-Chicote, Cambridge, GB;
Andrew Paul Breen, Norwich, GB;
Ritwik Giri, Sunnyvale, CA (US);
Mehmet Umut Isik, Menlo Park, CA (US);
Arvindh Krishnaswamy, Palo Alto, CA (US);
Hassan Sawaf, Los Gatos, CA (US)

Assignee: Amazon Technologies, Inc., Seattle, WA (US)

Attorney:
Primary Examiner:
Int. Cl.:
G10L 13/08 (2013.01); G10L 15/22 (2006.01); G11B 20/10 (2006.01); G06F 3/16 (2006.01); G10L 13/10 (2013.01); G06F 40/47 (2020.01); G10L 25/90 (2013.01); G10L 15/06 (2013.01); G10L 13/00 (2006.01); G10L 15/26 (2006.01); G06V 40/16 (2022.01)
U.S. Cl. (CPC):
G10L 13/10 (2013.01); G06F 40/47 (2020.01); G06V 40/161 (2022.01); G10L 13/00 (2013.01); G10L 15/063 (2013.01); G10L 15/22 (2013.01); G10L 15/26 (2013.01); G10L 25/90 (2013.01)
Abstract

Techniques for the generation of dubbed audio for an audio/visual file are described. An exemplary approach is to receive a request to generate dubbed speech for an audio/visual file; and in response to the request to: extract speech segments from an audio track of the audio/visual file associated with identified speakers; translate the extracted speech segments into a target language; determine a machine learning model per identified speaker, the trained machine learning models to be used to generate a spoken version of the translated, extracted speech segments based on the identified speaker; generate, per translated, extracted speech segment, a spoken version of the translated, extracted speech segments using a trained machine learning model that corresponds to the identified speaker of the translated, extracted speech segment and prosody information for the extracted speech segments; and replace the extracted speech segments from the audio track of the audio/visual file with the spoken versions of the translated, extracted speech segments to generate a modified audio track.
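The abstract describes a per-speaker dubbing pipeline: extract speech segments, translate them, select a voice model for each identified speaker, synthesize the translation with the source segment's prosody, and splice the result back into the track. The following is a minimal Python sketch of that control flow only; the `translate` and `synthesize` placeholders, and all names here, are hypothetical stand-ins for the machine learning models the patent actually contemplates.

```python
from dataclasses import dataclass

@dataclass
class SpeechSegment:
    speaker: str   # identified speaker of this segment
    start: float   # start time within the audio track (seconds)
    end: float     # end time within the audio track (seconds)
    text: str      # transcribed source-language speech

def translate(text: str, target_lang: str) -> str:
    # Placeholder machine-translation step; a real system would
    # invoke a trained translation model here.
    return f"[{target_lang}] {text}"

def synthesize(text: str, voice_model: str, prosody: dict) -> str:
    # Placeholder text-to-speech step; a real system would run the
    # per-speaker model conditioned on prosody information (e.g. the
    # duration of the original segment) and return audio samples.
    return f"<audio voice={voice_model} prosody={prosody}>{text}</audio>"

def dub_track(segments, target_lang, voice_models):
    """Translate each extracted segment and synthesize it with the
    model that corresponds to the segment's identified speaker,
    producing (start, end, audio) spans that replace the originals."""
    dubbed = []
    for seg in segments:
        translated = translate(seg.text, target_lang)
        model = voice_models[seg.speaker]  # one trained model per speaker
        prosody = {"duration": seg.end - seg.start}
        audio = synthesize(translated, model, prosody)
        dubbed.append((seg.start, seg.end, audio))
    return dubbed
```

Keyed lookup of `voice_models` by speaker mirrors the claim's "machine learning model per identified speaker"; the returned spans carry the original timing so the caller can overwrite exactly those regions of the audio track.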

