The patent badge is an abbreviated version of the USPTO patent document. It covers the following: patent number, date the patent was issued, date the patent was filed, title of the patent, applicant, inventor, assignee, attorney firm, primary examiner, assistant examiner, CPCs, and abstract. The patent badge contains a link to the full patent document in Adobe Acrobat (PDF) format.

Date of Patent:
Nov. 29, 2022

Filed:
Jun. 12, 2020

Applicants:

Baidu USA, LLC, Sunnyvale, CA (US);

Baidu.com Times Technology (Beijing) Co., Ltd., Beijing, CN;

Inventors:

Miao Liao, San Jose, CA (US);

Sibo Zhang, San Jose, CA (US);

Peng Wang, Sunnyvale, CA (US);

Ruigang Yang, Beijing, CN;

Assignees:
Attorney:
Primary Examiner:
Int. Cl.
CPC ...
G06T 13/00 (2011.01); G06T 13/40 (2011.01); G06K 9/62 (2022.01); G06N 3/08 (2006.01); G10L 15/26 (2006.01); G06N 3/04 (2006.01); G10L 25/57 (2013.01); G06V 40/20 (2022.01);
U.S. Cl.
CPC ...
G06T 13/40 (2013.01); G06K 9/6256 (2013.01); G06N 3/0454 (2013.01); G06N 3/08 (2013.01); G06V 40/28 (2022.01); G10L 15/26 (2013.01); G10L 25/57 (2013.01);
Abstract

Presented herein are novel embodiments for converting given speech audio or text into a photo-realistic speaking video of a person with synchronized, realistic, and expressive body dynamics. In one or more embodiments, 3D skeleton movements are generated from the audio sequence using a recurrent neural network, and an output video is synthesized via a conditional generative adversarial network. To make movements realistic and expressive, knowledge of an articulated 3D human skeleton and a learned dictionary of personal speech iconic gestures may be embedded into the generation process in both the learning and testing pipelines. The former prevents the generation of unreasonable body distortion, while the latter helps the model quickly learn meaningful body movement from only a few videos. To produce photo-realistic, high-resolution video with motion details, a part-attention mechanism is inserted in the conditional GAN, where each detailed part is automatically zoomed in on and given its own discriminator.
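The abstract states that knowledge of an articulated 3D skeleton prevents unreasonable body distortion, but it does not say how that knowledge is encoded. One common way to impose such a constraint, assumed here rather than taken from the patent, is to have the network output per-joint rotations and recover joint positions by forward kinematics over fixed bone offsets, so limbs cannot stretch or shrink. The minimal stdlib-only Python sketch below illustrates that idea. The four-joint arm hierarchy, its bone offsets, and the function names (`axis_angle_to_matrix`, `forward_kinematics`, `bone_lengths`) are hypothetical and do not come from the patent; the random rotations merely stand in for one frame of recurrent-network output.

```python
import math
import random
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]
Mat3 = Tuple[Vec3, Vec3, Vec3]

IDENTITY: Mat3 = ((1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0))


def axis_angle_to_matrix(axis: Vec3, angle: float) -> Mat3:
    """Rodrigues' formula: rotation by `angle` radians about `axis`."""
    x, y, z = axis
    norm = math.sqrt(x * x + y * y + z * z)
    if norm == 0.0:
        return IDENTITY  # a zero axis is treated as "no rotation"
    x, y, z = x / norm, y / norm, z / norm
    c, s = math.cos(angle), math.sin(angle)
    t = 1.0 - c
    return (
        (c + x * x * t, x * y * t - z * s, x * z * t + y * s),
        (y * x * t + z * s, c + y * y * t, y * z * t - x * s),
        (z * x * t - y * s, z * y * t + x * s, c + z * z * t),
    )


def mat_mul(a: Mat3, b: Mat3) -> Mat3:
    return tuple(
        tuple(sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3))
        for i in range(3)
    )


def mat_vec(m: Mat3, v: Vec3) -> Vec3:
    return tuple(sum(m[i][k] * v[k] for k in range(3)) for i in range(3))


def forward_kinematics(
    parents: Sequence[int],
    offsets: Sequence[Vec3],
    rotations: Sequence[Tuple[Vec3, float]],
    root_position: Vec3 = (0.0, 0.0, 0.0),
) -> List[Vec3]:
    """Joint positions from local axis-angle rotations and fixed rest-pose offsets.

    Joints must be ordered so that every parent precedes its children.
    """
    n = len(parents)
    global_rot: List[Mat3] = [IDENTITY] * n
    positions: List[Vec3] = [root_position] * n
    for j in range(n):
        local = axis_angle_to_matrix(*rotations[j])
        p = parents[j]
        if p < 0:
            global_rot[j] = local
            positions[j] = root_position
        else:
            # The bone is the fixed rest-pose offset, expressed in the parent's frame,
            # so only its direction can change, never its length.
            bone = mat_vec(global_rot[p], offsets[j])
            positions[j] = tuple(positions[p][i] + bone[i] for i in range(3))
            global_rot[j] = mat_mul(global_rot[p], local)
    return positions


def bone_lengths(parents: Sequence[int], positions: Sequence[Vec3]) -> List[float]:
    """Distance from each non-root joint to its parent."""
    return [math.dist(positions[j], positions[p]) for j, p in enumerate(parents) if p >= 0]


# Hypothetical four-joint arm chain: spine -> shoulder -> elbow -> wrist (metres).
PARENTS = [-1, 0, 1, 2]
OFFSETS: List[Vec3] = [(0.0, 0.0, 0.0), (0.0, 0.5, 0.0), (0.3, 0.0, 0.0), (0.25, 0.0, 0.0)]

if __name__ == "__main__":
    rng = random.Random(0)
    # Stand-in for one frame of network output: arbitrary axis-angle rotations per joint.
    frame = [
        ((rng.uniform(-1, 1), rng.uniform(-1, 1), rng.uniform(-1, 1)),
         rng.uniform(-math.pi, math.pi))
        for _ in PARENTS
    ]
    pose = forward_kinematics(PARENTS, OFFSETS, frame)
    print(bone_lengths(PARENTS, pose))  # stays close to [0.5, 0.3, 0.25] for any angles
```

Because the model in this sketch predicts only rotations, bone lengths are fixed by construction. A real system might still need joint-angle limits to rule out anatomically impossible bends, which this sketch does not model.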

