SPEECH SYNTHESIS METHOD AND APPARATUS, AND DEVICE AND COMPUTER-READABLE STORAGE MEDIUM

Document Type and Number:

WIPO Patent Application WO/2020/232997

Kind Code:

A1

Abstract:

Disclosed are a speech synthesis method and apparatus, a device, and a computer-readable storage medium. The method comprises: determining a reference speech sequence, and acquiring a speech synthesis model and a target text vector corresponding to a target text sequence to be synthesized (S101); encoding the reference speech sequence by means of a reference encoder to obtain a target reference embedded vector corresponding to the reference speech sequence (S102); performing style marking on the target reference embedded vector by means of a style marking layer to obtain a target style embedded vector corresponding to the reference speech sequence (S103); and executing a speech synthesis operation by means of a speech synthesis layer, on the basis of the target text vector and the target style embedded vector, to obtain target speech (S104). The method yields speech synthesized with the speech rhythm expressed by the target style embedded vector, thereby improving the expressive power and accuracy of the synthesized speech.
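
The data flow of steps S102 to S104 can be illustrated with a minimal sketch. It is not the implementation claimed in the application: the abstract does not specify the internals of the reference encoder, the style marking layer, or the speech synthesis layer. The sketch assumes a mean-pooling stand-in for the reference encoder and single-head dot-product attention over a small bank of style tokens, in the spirit of the cited Wang et al. "Style Tokens" reference. All function names, dimensions, and random inputs are hypothetical.

```python
import math
import random


def reference_encoder(ref_frames):
    """S102 stand-in (assumption): mean-pool acoustic frames into one
    fixed-size reference embedded vector."""
    n, dim = len(ref_frames), len(ref_frames[0])
    return [sum(frame[d] for frame in ref_frames) / n for d in range(dim)]


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def style_marking_layer(ref_embedding, style_tokens):
    """S103 stand-in (assumption): scaled dot-product attention of the
    reference embedding over a bank of style tokens. Returns the weighted
    sum of tokens (the style embedded vector) and the attention weights."""
    scale = math.sqrt(len(ref_embedding))
    scores = [
        sum(q * k for q, k in zip(ref_embedding, token)) / scale
        for token in style_tokens
    ]
    weights = softmax(scores)
    dim = len(style_tokens[0])
    style = [
        sum(w * token[d] for w, token in zip(weights, style_tokens))
        for d in range(dim)
    ]
    return style, weights


def condition_text(text_vectors, style_embedding):
    """S104 input preparation (assumption): append the style embedded vector
    to every text vector before it would be passed to a synthesis decoder."""
    return [vec + style_embedding for vec in text_vectors]


# Hypothetical toy inputs: 10 reference frames and 3 style tokens of
# dimension 4, plus 6 text vectors of dimension 5.
rng = random.Random(0)
ref_frames = [[rng.gauss(0, 1) for _ in range(4)] for _ in range(10)]
style_tokens = [[rng.gauss(0, 1) for _ in range(4)] for _ in range(3)]
text_vectors = [[rng.gauss(0, 1) for _ in range(5)] for _ in range(6)]

ref_embedding = reference_encoder(ref_frames)
style_embedding, weights = style_marking_layer(ref_embedding, style_tokens)
decoder_input = condition_text(text_vectors, style_embedding)
```

In this sketch the attention weights sum to one, so the style embedded vector is a convex combination of the style tokens. Each conditioned text vector has dimension 5 + 4 = 9. A real system would replace both stand-ins with trained networks and feed `decoder_input` to a trained decoder and vocoder.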

Inventors:

WANG JIANZONG (CN)
SUN AOLAN (CN)
PENG HUAYI (CN)
CHENG NING (CN)

Application Number:

PCT/CN2019/117254

Publication Date:

November 26, 2020

Filing Date:

November 11, 2019

Assignee:

PING AN TECH SHENZHEN CO LTD (CN)

International Classes:

G10L13/08

Foreign References:

CN110288973A	2019-09-27
CN109616127A	2019-04-12
CN109754779A	2019-05-14
US20180336880A1	2018-11-22

Other References:

Y. WANG ET AL.: "Style Tokens: Unsupervised Style Modeling, Control and Transfer in End-to-End Speech Synthesis", arXiv preprint arXiv:1803.09017, 23 March 2018 (2018-03-23), XP080862481
YE JIA ET AL.: "Transfer Learning from Speaker Verification to Multispeaker Text-To-Speech Synthesis", arXiv preprint arXiv:1806.04558v4, 12 June 2018 (2018-06-12), XP081425976

Attorney, Agent or Firm:

SHENZHEN LIDAO INTELLECTUAL PROPERTY AGENCY (GENERAL PARTNERSHIP) (CN)
