
G10L 13/08 (2013.01); G10L 13/02 (2013.01); G10L 15/187 (2013.01); G06F 16/783 (2019.01); G06F 16/78 (2019.01); G06F 40/242 (2020.01); G10L 13/027 (2013.01); G06N 3/04 (2023.01); G06N 3/08 (2023.01);

U.S. Cl.

CPC ...

G10L 13/08 (2013.01); G06F 16/7834 (2019.01); G06F 16/7867 (2019.01); G06F 40/242 (2020.01); G06N 3/04 (2013.01); G06N 3/08 (2013.01); G10L 13/027 (2013.01); G10L 15/187 (2013.01);

Abstract

Presented herein are novel approaches to synthesizing video of speech from text. In a training phase, embodiments build a phoneme-pose dictionary and train a generative neural network model using a generative adversarial network (GAN) to generate video from interpolated phoneme poses. In deployment, the trained generative neural network, in conjunction with the phoneme-pose dictionary, converts an input text into a video of a person speaking the words of the input text. Compared to audio-driven video generation approaches, the embodiments herein have a number of advantages: 1) they need only a fraction of the training data used by an audio-driven approach; 2) they are more flexible and are not vulnerable to speaker variation; and 3) they significantly reduce preprocessing, training, and inference times.
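The deployment path described above (look up a key pose for each phoneme, then interpolate between key poses before handing them to the generator) can be pictured with a minimal sketch. It assumes the phonemes have already been extracted from the input text, that each dictionary entry is a flat list of keypoint coordinates, and that consecutive key poses are blended linearly. The dictionary contents, `interpolate_poses`, and `frames_per_transition` are hypothetical; the patent's own interpolation scheme may differ, and the GAN generator that renders frames from these poses is not shown.

```python
from typing import Dict, List, Sequence

# Hypothetical pose representation: a flat list of keypoint coordinates.
Pose = List[float]


def interpolate_poses(
    phonemes: Sequence[str],
    pose_dict: Dict[str, Pose],
    frames_per_transition: int,
) -> List[Pose]:
    """Look up a key pose per phoneme and linearly blend consecutive key poses.

    Assumption: linear interpolation with a fixed number of frames per
    phoneme-to-phoneme transition. Returns (n - 1) * frames_per_transition + 1
    frames for n phonemes, or an empty list when no phonemes are given.
    """
    if frames_per_transition < 1:
        raise ValueError("frames_per_transition must be at least 1")

    # Phoneme-pose dictionary lookup (raises KeyError for unknown phonemes).
    key_poses = [pose_dict[p] for p in phonemes]
    if not key_poses:
        return []

    frames: List[Pose] = []
    for start, end in zip(key_poses, key_poses[1:]):
        for i in range(frames_per_transition):
            t = i / frames_per_transition  # blend weight in [0, 1)
            frames.append([a + t * (b - a) for a, b in zip(start, end)])
    # Close the sequence with the final key pose itself.
    frames.append(list(key_poses[-1]))
    return frames


if __name__ == "__main__":
    # Toy two-coordinate "poses" for two ARPAbet-style phonemes.
    toy_dict = {"HH": [0.0, 0.0], "AY": [1.0, 2.0]}
    print(interpolate_poses(["HH", "AY"], toy_dict, 2))
```

With two phonemes and two frames per transition, the toy call yields the start pose, one midpoint pose, and the end pose. A fixed frame count keeps the sketch simple; a real system would more likely derive transition lengths from predicted phoneme durations.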
