The patent badge is an abbreviated version of the USPTO patent document. The patent badge covers the following: Patent number, Date the patent was issued, Date the patent was filed, Title of the patent, Applicant, Inventor, Assignee, Attorney firm, Primary examiner, Assistant examiner, CPCs, and Abstract. The patent badge contains a link to the full patent document (in Adobe Acrobat format, i.e., PDF).

Date of Patent:
Jul. 30, 2024

Filed:
Feb. 07, 2024

Applicant:

Nanjing Silicon Intelligence Technology Co., Ltd., Jiangsu, CN;

Inventors:

Huapeng Sima, Jiangsu, CN;

Haie Wu, Jiangsu, CN;

Ao Yao, Jiangsu, CN;

Da Jiang, Jiangsu, CN;

Yiping Tang, Jiangsu, CN;

Attorneys:
Primary Examiner:
Int. Cl.
G10L 13/08 (2013.01); G10L 15/02 (2006.01); G10L 17/04 (2013.01)
U.S. Cl.
CPC: G10L 13/08 (2013.01); G10L 15/02 (2013.01); G10L 17/04 (2013.01); G10L 2015/025 (2013.01)
Abstract

This application provides a synthetic audio output method and apparatus, a storage medium, and an electronic device. The method includes: inputting input text and a specified target identity identifier into an audio output model; extracting an identity feature sequence of the target identity by an identity recognition model; extracting a phoneme feature sequence corresponding to the input text by an encoding layer of a speech synthesis model; superimposing the identity feature sequence of the target identity on the phoneme feature sequence and inputting the result into a variable adapter of the speech synthesis model; after the variable adapter performs duration prediction and alignment, energy prediction, and pitch prediction on the phoneme feature sequence, outputting a target Mel-frequency spectrum feature corresponding to the input text through a decoding layer of the speech synthesis model; and inputting the target Mel-frequency spectrum feature into a vocoder to output synthetic audio.
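The middle of the method (superimposing identity and phoneme features, then duration prediction and alignment, pitch prediction, and energy prediction in the variable adapter) can be illustrated with a toy sketch. The abstract does not say how the features are superimposed or how the predictors work, so this sketch assumes element-wise addition (broadcasting a single identity vector to every phoneme when only one is given), a FastSpeech 2-style length regulator that repeats each phoneme vector by its predicted integer duration, and caller-supplied stand-in predictors. The names `superimpose`, `length_regulate`, and `variable_adapter` are hypothetical and are not taken from the patent.

```python
from typing import Callable, List, Sequence, Tuple

Vector = List[float]
# Hypothetical predictor signature: maps a sequence of vectors to one float per vector.
Predictor = Callable[[Sequence[Vector]], Sequence[float]]


def superimpose(identity: Sequence[Vector], phonemes: Sequence[Vector]) -> List[Vector]:
    """Add identity features to phoneme features element-wise.

    Assumption: a single identity vector is broadcast to every phoneme;
    otherwise both sequences must have the same length and width.
    """
    if len(identity) == 1:
        identity = [identity[0]] * len(phonemes)
    if len(identity) != len(phonemes):
        raise ValueError("identity and phoneme sequences differ in length")
    out: List[Vector] = []
    for i_vec, p_vec in zip(identity, phonemes):
        if len(i_vec) != len(p_vec):
            raise ValueError("feature widths differ")
        out.append([a + b for a, b in zip(i_vec, p_vec)])
    return out


def length_regulate(hidden: Sequence[Vector], durations: Sequence[int]) -> List[Vector]:
    """Duration-based alignment: repeat each phoneme vector durations[k] times."""
    if len(hidden) != len(durations):
        raise ValueError("one duration is needed per phoneme")
    frames: List[Vector] = []
    for vec, d in zip(hidden, durations):
        if d < 0:
            raise ValueError("durations must be non-negative")
        frames.extend(list(vec) for _ in range(d))
    return frames


def variable_adapter(
    hidden: Sequence[Vector],
    predict_duration: Predictor,
    predict_pitch: Predictor,
    predict_energy: Predictor,
) -> Tuple[List[Vector], List[float], List[float]]:
    """Toy adapter: duration prediction and alignment, then per-frame pitch and energy."""
    # Round predicted durations to whole frame counts, clamping negatives to 0.
    durations = [max(0, round(d)) for d in predict_duration(hidden)]
    frames = length_regulate(hidden, durations)
    pitch = list(predict_pitch(frames))
    energy = list(predict_energy(frames))
    if len(pitch) != len(frames) or len(energy) != len(frames):
        raise ValueError("pitch and energy need one value per frame")
    # The abstract does not say how pitch and energy condition the decoder,
    # so they are returned alongside the frames rather than merged into them.
    return frames, pitch, energy


if __name__ == "__main__":
    hidden = superimpose([[0.5, 0.5]], [[1.0, 0.0], [0.0, 1.0], [2.0, 2.0]])
    frames, pitch, energy = variable_adapter(
        hidden,
        predict_duration=lambda h: [2.0, 1.0, 3.0],
        predict_pitch=lambda f: [sum(v) for v in f],
        predict_energy=lambda f: [max(v) for v in f],
    )
    print(len(frames), pitch)  # 6 [2.0, 2.0, 2.0, 5.0, 5.0, 5.0]
```

Repeating vectors by integer durations is the length-regulator idea used in FastSpeech 2-style models; the alignment procedure actually claimed in the patent may differ, and the resulting frames would still need a decoder and a vocoder to become audio.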

