Years Active: 2020-2023
Title: Kainan Peng: Innovator in Neural Speech Synthesis
Introduction
Kainan Peng is an inventor based in Sunnyvale, CA, known for his contributions to neural speech synthesis. His 8 patents cover technologies that extend the capabilities of text-to-speech (TTS) systems.
Latest Patents
One of Kainan's latest patents covers multi-speaker neural text-to-speech. It describes systems and methods that augment neural speech synthesis networks with low-dimensional trainable speaker embeddings, so that a single model can generate speech in many different voices. As a foundation for the multi-speaker experiments, the patent presents improved single-speaker model embodiments, referred to as Deep Voice 2. It also introduces a post-processing neural vocoder for Tacotron, a neural character-to-spectrogram model. Tested on two multi-speaker TTS datasets, the new techniques show that a neural text-to-speech system can learn hundreds of unique voices from just twenty-five minutes of audio per speaker.
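The core idea of a trainable speaker embedding can be sketched in a few lines. This is a toy illustration, not the patented architecture: the class name `SpeakerConditionedLayer`, the embedding size, the shared linear projection, and the choice to add the projected embedding to a hidden vector are all hypothetical. It shows how one set of shared weights can produce a different output per voice, with each speaker contributing only a small vector of its own.

```python
import random
from typing import List


class SpeakerConditionedLayer:
    """Hypothetical toy layer: a shared hidden vector is shifted by a
    projection of a per-speaker embedding, so one model serves many voices."""

    def __init__(self, num_speakers: int, embed_dim: int, hidden_dim: int, seed: int = 0):
        rng = random.Random(seed)
        # Lookup table: one low-dimensional trainable vector per speaker
        # (randomly initialized here; training would update these values).
        self.embeddings: List[List[float]] = [
            [rng.gauss(0.0, 0.1) for _ in range(embed_dim)] for _ in range(num_speakers)
        ]
        # Projection shared by all speakers, mapping embed_dim -> hidden_dim.
        self.proj: List[List[float]] = [
            [rng.gauss(0.0, 0.1) for _ in range(embed_dim)] for _ in range(hidden_dim)
        ]

    def speaker_bias(self, speaker_id: int) -> List[float]:
        """Project the chosen speaker's embedding to the hidden size."""
        e = self.embeddings[speaker_id]
        return [sum(w * v for w, v in zip(row, e)) for row in self.proj]

    def forward(self, hidden: List[float], speaker_id: int) -> List[float]:
        """Condition a hidden vector on the speaker by adding the projected embedding."""
        bias = self.speaker_bias(speaker_id)
        return [h + b for h, b in zip(hidden, bias)]

    def per_speaker_params(self) -> int:
        """Number of parameters owned by a single speaker (its embedding row)."""
        return len(self.embeddings[0])


layer = SpeakerConditionedLayer(num_speakers=4, embed_dim=16, hidden_dim=8)
hidden = [0.0] * 8
voice_a = layer.forward(hidden, speaker_id=0)
voice_b = layer.forward(hidden, speaker_id=1)
print(voice_a != voice_b)          # same input, different speaker, different output
print(layer.per_speaker_params())  # 16
```

In this sketch a new voice adds only `embed_dim` numbers while the projection (and, in a real system, the rest of the network) is shared, which helps suggest why this style of conditioning can scale to many speakers without the model growing much.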
Another notable patent covers WaveFlow, a small-footprint flow-based model for raw audio. WaveFlow is a generative flow that can be trained directly with maximum likelihood. It handles the long-range structure of waveforms with a dilated two-dimensional convolutional architecture and models local variations with expressive autoregressive functions. It also offers a unified view of likelihood-based models for raw audio, including WaveNet and WaveGlow. WaveFlow generates high-fidelity speech and synthesizes audio much faster than existing systems, needing only a few sequential steps to produce relatively long waveforms. With just 5.91 million parameters, it is 15 times smaller than some existing models, and it generates 22.05 kHz high-fidelity audio 42.6 times faster than real time on a V100 GPU.
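The mechanism described above, squeezing a 1-D waveform into a 2-D matrix and running an autoregressive affine flow over its rows, can be sketched as follows. This is a simplified illustration under stated assumptions, not the patented implementation: the function names and `toy_conditioner`, which stands in for WaveFlow's dilated 2-D convolutional network, are hypothetical, and the shift and log-scale it produces are fixed toy functions rather than learned ones. What the sketch does show is the structure: each row's transform depends only on the rows above it, the exact log-likelihood is a standard-normal term plus a log-determinant, and sampling takes h sequential steps regardless of waveform length.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]


def squeeze(x: List[float], h: int) -> Matrix:
    """Reshape a length-n waveform into an h x (n // h) matrix.
    Column j holds the h consecutive samples x[j*h : (j+1)*h]."""
    if len(x) % h != 0:
        raise ValueError("waveform length must be a multiple of h")
    w = len(x) // h
    return [[x[j * h + i] for j in range(w)] for i in range(h)]


def unsqueeze(X: Matrix) -> List[float]:
    """Inverse of squeeze: read the matrix column by column."""
    return [X[i][j] for j in range(len(X[0])) for i in range(len(X))]


def toy_conditioner(rows_above: Matrix, w: int) -> Tuple[List[float], List[float]]:
    """Hypothetical stand-in for the dilated 2-D convolutions: returns a shift
    mu and a log-scale for every column, computed only from the rows above
    the current one (the first row gets mu = 0 and log-scale = 0)."""
    if not rows_above:
        return [0.0] * w, [0.0] * w
    last = rows_above[-1]
    # Mix each column with its left neighbour to mimic a receptive field
    # that spans both height and width.
    ctx = [last[j] + (0.5 * last[j - 1] if j > 0 else 0.0) for j in range(w)]
    mu = [0.5 * c for c in ctx]
    log_scale = [0.1 * math.tanh(c) for c in ctx]
    return mu, log_scale


def forward(X: Matrix) -> Tuple[Matrix, float]:
    """x -> z with z[i][j] = exp(log_scale) * x[i][j] + mu, row by row.
    Every conditioner input is already known here, so a real model can
    evaluate all rows in parallel during training."""
    w = len(X[0])
    Z: Matrix = []
    log_det = 0.0
    for i, row in enumerate(X):
        mu, log_scale = toy_conditioner(X[:i], w)
        Z.append([math.exp(s) * x + m for x, m, s in zip(row, mu, log_scale)])
        log_det += sum(log_scale)  # log |det dz/dx| contributed by this row
    return Z, log_det


def inverse(Z: Matrix) -> Matrix:
    """z -> x for sampling: one sequential step per row (h steps in total),
    each step producing a whole row of w samples at once."""
    w = len(Z[0])
    X: Matrix = []
    for z_row in Z:
        mu, log_scale = toy_conditioner(X, w)
        X.append([(z - m) * math.exp(-s) for z, m, s in zip(z_row, mu, log_scale)])
    return X


def log_likelihood(x: List[float], h: int) -> float:
    """Exact log p(x): standard-normal log-density of z plus the log-determinant."""
    Z, log_det = forward(squeeze(x, h))
    log_pz = sum(-0.5 * z * z - 0.5 * math.log(2 * math.pi) for row in Z for z in row)
    return log_pz + log_det


wave = [math.sin(0.3 * t) for t in range(16)]
Z, _ = forward(squeeze(wave, h=4))
restored = unsqueeze(inverse(Z))
print(max(abs(a - b) for a, b in zip(wave, restored)) < 1e-9)  # invertibility check
print(log_likelihood(wave, h=4))
```

In this sketch the height h sets a trade-off: sampling needs h sequential steps however long the waveform is, so a small h means fewer sequential steps, while a larger h lets each sample condition on more of the signal above it. For scale, generating 22.05 kHz audio 42.6 times faster than real time means producing about 22,050 × 42.6 ≈ 939,000 samples per second.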
Career Highlights
Kainan Peng is currently employed at Baidu USA LLC, where he continues his work on speech synthesis technology. His work stands out for pairing new model designs with practical gains, such as synthesizing many voices from a single model and generating audio faster than real time.
Collaborations
Kainan has collaborated with colleagues including Wei Ping and Sercan Omer Arik on their shared research in speech synthesis.
Conclusion