Title:
SPEECH SYNTHESIS METHOD AND APPARATUS, ELECTRONIC DEVICE AND STORAGE MEDIUM
Document Type and Number:
WIPO Patent Application WO/2022/121157
Kind Code:
A1
Abstract:
A speech synthesis method relating to the field of artificial intelligence, comprising: acquiring training text, and performing position vector encoding and sound spectrum encoding on the training text by using a pre-built speech synthesis model to generate a sound spectrum (S1, S2); performing pitch projection and duration projection on the sound spectrum to obtain a pitch spectrum and a duration spectrum, and performing residual connection between the pitch spectrum and the sound spectrum to generate a residual sound spectrum (S3); summarizing and smoothing the duration spectrum and the residual sound spectrum to obtain a target sound spectrum (S4, S5); calculating a loss value between the target sound spectrum and a corresponding label value, and adjusting parameters of the speech synthesis model according to the loss value to obtain a trained speech synthesis model (S6); and performing speech synthesis on speech text to be synthesized by using the trained speech synthesis model to obtain a synthesis result (S7). The method also relates to blockchain technology: the target sound spectrum can be stored in a blockchain. The method may improve the accuracy of speech synthesis.
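The data flow of steps S3 to S6 can be illustrated with a minimal sketch. It is not the applicant's implementation: the abstract does not define the operations, so the sketch assumes that the residual connection is an element-wise sum, that "summarizing" repeats each frame according to an integer duration, that "smoothing" is a moving average over time, and that the loss is a mean squared error. All function names are hypothetical, and the pitch and duration values are passed in directly rather than predicted by a model.

```python
from typing import List

Frame = List[float]  # one time step of a spectrum (hypothetical representation)


def residual_connect(sound: List[Frame], pitch: List[Frame]) -> List[Frame]:
    """S3 (assumed): element-wise sum of the sound spectrum and the pitch spectrum."""
    return [[s + p for s, p in zip(sf, pf)] for sf, pf in zip(sound, pitch)]


def expand_by_duration(frames: List[Frame], durations: List[int]) -> List[Frame]:
    """S4 (assumed): repeat frame i durations[i] times along the time axis."""
    out: List[Frame] = []
    for frame, d in zip(frames, durations):
        out.extend(list(frame) for _ in range(d))
    return out


def smooth(frames: List[Frame], window: int = 3) -> List[Frame]:
    """S5 (assumed): moving average over time; the window is truncated at the edges."""
    half = window // 2
    out: List[Frame] = []
    for t in range(len(frames)):
        lo, hi = max(0, t - half), min(len(frames), t + half + 1)
        n = hi - lo
        out.append([sum(frames[k][j] for k in range(lo, hi)) / n
                    for j in range(len(frames[t]))])
    return out


def mse_loss(pred: List[Frame], target: List[Frame]) -> float:
    """S6 (assumed): mean squared error between the target spectrum and its label."""
    total, count = 0.0, 0
    for pf, tf in zip(pred, target):
        for p, t in zip(pf, tf):
            total += (p - t) ** 2
            count += 1
    return total / count


# Toy input: two frames with two bins each, and durations of 2 and 1 frames.
sound = [[1.0, 2.0], [3.0, 4.0]]
pitch = [[0.5, 0.5], [1.0, 1.0]]
residual = residual_connect(sound, pitch)
expanded = expand_by_duration(residual, [2, 1])
target = smooth(expanded)
```

In a training loop, the value returned by `mse_loss` for the target spectrum and its label would drive the parameter update described in S6; the sketch stops short of that step because the abstract does not specify the optimizer or the model structure.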
Inventors:
SUN AOLAN (CN)
WANG JIANZONG (CN)
CHENG NING (CN)
Application Number:
PCT/CN2021/083185
Publication Date:
June 16, 2022
Filing Date:
March 26, 2021
Assignee:
PING AN TECH SHENZHEN CO LTD (CN)
International Classes:
G06N3/08; G10L13/02; G10L13/08; G10L19/00; G10L19/02
Foreign References:
CN112466273A | 2021-03-09
CN111862934A | 2020-10-30
CN111179905A | 2020-05-19
CN111816156A | 2020-10-23
US10699695B1 | 2020-06-30
CN111161702A | 2020-05-15
Attorney, Agent or Firm:
SHENZHEN WORLD INTELLECTUAL PROPERTY AGENCY (GENERAL PARTNERSHIP) (CN)