The patent badge is an abbreviated version of the USPTO patent document. It covers the following: Patent number, Date patent was issued, Date patent was filed, Title of the patent, Applicant, Inventor, Assignee, Attorney firm, Primary examiner, Assistant examiner, CPCs, and Abstract. The patent badge also contains a link to the full patent document in Adobe Acrobat (PDF) format.

Patent No.:

US 12198674 B1

Date of Patent:

Jan. 14, 2025

Filed:

Mar. 18, 2020

Speech synthesis method and apparatus, and storage medium

Applicants:

Beijing Jingdong Shangke Information Technology Co., Ltd., Beijing, CN;

Beijing Jingdong Century Trading Co., Ltd., Beijing, CN;

Inventors:

Zhizheng Wu, Beijing, CN;

Wei Song, Beijing, CN;

Assignees:

BEIJING JINGDONG SHANGKE INFORMATION TECHNOLOGY CO., LTD., Beijing, CN;

BEIJING JINGDONG CENTURY TRADING CO., LTD., Beijing, CN;

Attorney:

Cooper Legal Group, LLC

Primary Examiner:

Nicole A K Schmieder

Int. Cl.

CPC:

G10L 13/047 (2013.01); G10L 13/06 (2013.01); G10L 25/18 (2013.01)

U.S. Cl.

CPC:

G10L 13/047 (2013.01); G10L 13/06 (2013.01); G10L 25/18 (2013.01)

Abstract

Disclosed are a speech synthesis method and apparatus, and a storage medium. The method comprises: acquiring a symbol sequence of a statement to be synthesized, wherein the statement to be synthesized comprises a recorded statement characterizing a target object and a query result statement for the target object; encoding the symbol sequence using a pre-set encoding model to obtain a feature vector set; acquiring recording acoustic features corresponding to the recorded statement; predicting acoustic features corresponding to the statement to be synthesized according to a pre-set decoding model, the feature vector set, a pre-set attention model, and the recording acoustic features, to obtain predicted acoustic features corresponding to the statement to be synthesized, wherein the pre-set attention model is a model that uses the feature vector set to generate a context vector used for decoding, and the predicted acoustic features are composed of at least one associated acoustic feature; and performing feature conversion and synthesis on the predicted acoustic features to obtain speech corresponding to the statement to be synthesized.
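The abstract states that the attention model "uses the feature vector set to generate a context vector used for decoding" but does not say which attention mechanism is used. The following is a minimal sketch, assuming plain dot-product attention (a swapped-in, commonly used technique, not necessarily the model claimed in the patent), of how one decoding step could turn the encoder's feature vector set into a context vector. The function names `softmax` and `context_vector` and the use of a single decoder-state vector as the query are hypothetical illustrations.

```python
import math
from typing import List, Sequence

Vector = List[float]


def softmax(scores: Sequence[float]) -> Vector:
    """Numerically stable softmax: subtract the max score before exponentiating."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def context_vector(decoder_state: Sequence[float],
                   feature_vectors: Sequence[Sequence[float]]) -> Vector:
    """Hypothetical dot-product attention step (assumed, not taken from the patent).

    Scores every encoder feature vector against the current decoder state,
    normalizes the scores with softmax into attention weights, and returns
    the weighted sum of the feature vectors as the context vector.
    """
    if not feature_vectors:
        raise ValueError("feature vector set is empty")
    dim = len(decoder_state)
    if any(len(h) != dim for h in feature_vectors):
        raise ValueError("decoder state and feature vectors must share a dimension")
    # One alignment score per encoded symbol.
    scores = [sum(q * k for q, k in zip(decoder_state, h)) for h in feature_vectors]
    weights = softmax(scores)
    # Weighted sum over the feature vector set, one component at a time.
    return [sum(w * h[i] for w, h in zip(weights, feature_vectors)) for i in range(dim)]


if __name__ == "__main__":
    features = [[1.0, 0.0], [0.0, 1.0]]  # toy feature vector set for two symbols
    print(context_vector([0.0, 0.0], features))  # equal weights: [0.5, 0.5]
```

A zero decoder state weights both toy feature vectors equally, so the context vector is their average; a state aligned with the first vector shifts nearly all weight onto it. In a real system the decoder would recompute this context vector at every output frame.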
