Title:
SPEECH SYNTHESIS METHOD AND APPARATUS, AND DEVICE AND COMPUTER-READABLE STORAGE MEDIUM
Document Type and Number:
WIPO Patent Application WO/2020/232997
Kind Code:
A1
Abstract:
Disclosed are a speech synthesis method and apparatus, a device, and a computer-readable storage medium. The method comprises: determining a reference speech sequence, and acquiring a speech synthesis model and a target text vector corresponding to a target text sequence to be synthesized (S101); encoding the reference speech sequence by means of a reference encoder to obtain a target reference embedded vector corresponding to the reference speech sequence (S102); performing, by means of a style marking layer, style marking on the target reference embedded vector to obtain a target style embedded vector corresponding to the reference speech sequence (S103); and, by means of a speech synthesis layer and on the basis of the target text vector and the target style embedded vector, executing a speech synthesis operation to obtain target speech (S104). The method yields speech synthesized with the speech rhythm expressed by the target style embedded vector, thereby improving the expressive power and accuracy of the synthetic speech.
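The data flow of steps S102 to S104 can be illustrated with a minimal sketch. It is not the claimed implementation: the abstract does not specify the internals of any layer, so everything below is an assumption made for illustration. The reference encoder is reduced to a mean over frames, the style marking layer is modeled as softmax attention over a small bank of style vectors (in the spirit of the style-token approach in the Wang et al. reference listed under Other References), and the speech synthesis layer is represented only by its conditioning input, formed by appending the style embedding to each text vector. All function names and numeric values are hypothetical.

```python
import math

def reference_encoder(frames):
    """S102 (assumed form): collapse a variable-length sequence of acoustic
    frames into one fixed-size reference embedding by averaging over time."""
    dim = len(frames[0])
    return [sum(frame[i] for frame in frames) / len(frames) for i in range(dim)]

def style_marking_layer(ref_embedding, style_bank):
    """S103 (assumed form): score each style vector against the reference
    embedding, normalize the scores with softmax, and return the weighted
    sum of the style vectors together with the weights."""
    scale = math.sqrt(len(ref_embedding))
    scores = [sum(q * k for q, k in zip(ref_embedding, vec)) / scale
              for vec in style_bank]
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(style_bank[0])
    style = [sum(w * vec[i] for w, vec in zip(weights, style_bank))
             for i in range(dim)]
    return style, weights

def condition_text(text_vectors, style_embedding):
    """S104, input side only (assumed form): append the style embedding to
    every text vector. A real synthesis layer would decode these conditioned
    vectors into acoustic features; that part is not modeled here."""
    return [list(vec) + list(style_embedding) for vec in text_vectors]

# Hypothetical toy inputs: two reference frames, three style vectors,
# and a three-step target text vector sequence.
reference_frames = [[1.0, 0.0], [3.0, 0.0]]
style_bank = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
text_vectors = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]

ref_embedding = reference_encoder(reference_frames)
style_embedding, weights = style_marking_layer(ref_embedding, style_bank)
conditioned = condition_text(text_vectors, style_embedding)
```

In this toy setup the style vector most aligned with the reference embedding receives the largest weight, so the style embedding that conditions the text vectors is dominated by it. The same style embedding is attached to every text step, which is one simple way a single reference utterance could influence the whole synthesized sentence.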
Inventors:
WANG JIANZONG (CN)
SUN AOLAN (CN)
PENG HUAYI (CN)
CHENG NING (CN)
Application Number:
PCT/CN2019/117254
Publication Date:
November 26, 2020
Filing Date:
November 11, 2019
Assignee:
PING AN TECH SHENZHEN CO LTD (CN)
International Classes:
G10L13/08
Foreign References:
CN110288973A | 2019-09-27
CN109616127A | 2019-04-12
CN109754779A | 2019-05-14
US20180336880A1 | 2018-11-22
Other References:
Y. WANG ET AL.: "Style Tokens: Unsupervised Style Modeling, Control and Transfer in End-to-End Speech Synthesis", ARXIV PREPRINT ARXIV:1803.09017, 23 March 2018 (2018-03-23), XP080862481
YE JIA ET AL.: "Transfer Learning from Speaker Verification to Multispeaker Text-To-Speech Synthesis", ARXIV PREPRINT ARXIV:1806.04558V4, 12 June 2018 (2018-06-12), XP081425976
Attorney, Agent or Firm:
SHENZHEN LIDAO INTELLECTUAL PROPERTY AGENCY (GENERAL PARTNERSHIP) (CN)