The patent badge is an abbreviated version of the USPTO patent document. It covers the following fields: patent number, date the patent was issued, date the patent was filed, title of the patent, applicant, inventor, assignee, attorney firm, primary examiner, assistant examiner, CPCs, and abstract. The patent badge also links to the full patent document (in Adobe Acrobat format, i.e., PDF).

Patent No.:

US 11238843 B1

Date of Patent:

Feb. 01, 2022

Filed:

Sep. 26, 2018

Systems and methods for neural voice cloning with a few samples

Applicant:

Baidu USA LLC, Sunnyvale, CA (US);

Inventors:

Sercan O. Arik, San Francisco, CA (US);

Jitong Chen, Sunnyvale, CA (US);

Kainan Peng, Sunnyvale, CA (US);

Wei Ping, Sunnyvale, CA (US);

Yanqi Zhou, San Jose, CA (US);

Assignee:

Baidu USA LLC, Sunnyvale, CA (US);

Attorney:

North Weber & Baugh LLP

Primary Examiner:

Michael Ortiz-Sanchez

Int. Cl.

G10L 13/00 (2006.01); G10L 13/047 (2013.01); G10L 13/027 (2013.01); G10L 13/08 (2013.01);

U.S. Cl.

CPC ...

G10L 13/047 (2013.01); G10L 13/027 (2013.01); G10L 13/08 (2013.01);

Abstract

Voice cloning is a highly desired capability for personalized speech interfaces. Neural network-based speech synthesis has been shown to generate high quality speech for a large number of speakers. Neural voice cloning systems that take a few audio samples as input are presented herein. Two approaches, speaker adaptation and speaker encoding, are disclosed. Speaker adaptation embodiments are based on fine-tuning a multi-speaker generative model with a few cloning samples. Speaker encoding embodiments are based on training a separate model to directly infer a new speaker embedding from cloning audios, which is used in or with a multi-speaker generative model. Both approaches achieve good performance in terms of naturalness of the speech and its similarity to original speaker—even with very few cloning audios.
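The two approaches named in the abstract can be contrasted with a deliberately tiny sketch. It is not the patented method: the "generative model" here is a hypothetical toy that simply adds a speaker embedding to text features, adaptation updates only the embedding (a simplifying assumption; the abstract speaks of fine-tuning the model more generally), and the encoder is an assumed diagonal linear map. All function and parameter names are invented for illustration.

```python
# Toy contrast of speaker adaptation vs. speaker encoding.
# Hypothetical model and names; not the implementation described in the patent.

def synthesize(text_feats, speaker_emb):
    """Frozen toy multi-speaker model: output = text features + speaker embedding."""
    return [t + s for t, s in zip(text_feats, speaker_emb)]

def adapt_embedding(samples, dim, lr=0.1, steps=200):
    """Speaker adaptation (simplified): fit a new speaker embedding to a few
    (text_feats, audio_feats) cloning pairs by gradient descent on squared error.
    Assumption: only the embedding is trained; the model itself stays fixed."""
    emb = [0.0] * dim
    for _ in range(steps):
        grad = [0.0] * dim
        for text_feats, audio_feats in samples:
            pred = synthesize(text_feats, emb)
            for i in range(dim):
                grad[i] += 2.0 * (pred[i] - audio_feats[i]) / len(samples)
        emb = [e - lr * g for e, g in zip(emb, grad)]
    return emb

def encode_speaker(audio_clips, encoder_weights):
    """Speaker encoding (simplified): a separate, already-trained encoder maps the
    cloning audio directly to an embedding, with no fine-tuning at cloning time.
    Assumption: the encoder is a diagonal linear map over mean-pooled features."""
    dim = len(encoder_weights)
    mean = [sum(clip[i] for clip in audio_clips) / len(audio_clips) for i in range(dim)]
    return [w * m for w, m in zip(encoder_weights, mean)]

if __name__ == "__main__":
    # Two cloning samples generated by an unknown speaker embedding [0.5, -1.0].
    samples = [([1.0, 2.0], [1.5, 1.0]), ([0.0, 1.0], [0.5, 0.0])]
    print(adapt_embedding(samples, dim=2))
    print(encode_speaker([[1.0, 2.0], [3.0, 4.0]], encoder_weights=[0.5, 0.25]))
```

In this sketch, adaptation pays an optimization cost for every new speaker, while encoding is a single forward pass through a model trained beforehand. In both cases the resulting embedding is then passed to `synthesize` together with new text features.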
