Title:
MULTI-STYLE AUDIO SYNTHESIS METHOD, APPARATUS AND DEVICE, AND STORAGE MEDIUM
Document Type and Number:
WIPO Patent Application WO/2022/116432
Kind Code:
A1
Abstract:
The present application relates to the field of artificial intelligence and discloses a multi-style audio synthesis method, apparatus and device, and a storage medium. The method comprises: acquiring text data to be processed and a first Mel spectrum of a single style; inputting the first Mel spectrum into a preset style extraction network for feature extraction to obtain a first style feature; inputting the text data into an encoder of a preset Mel spectrum generation network for feature extraction, and splicing the extracted first text feature with the first style feature to obtain a first fusion feature; inputting the first fusion feature into a decoder of the preset Mel spectrum generation network for feature conversion to obtain a second Mel spectrum; and inputting the second Mel spectrum into a preset vocoder for audio generation to obtain multi-style audio. The present application thereby generates multi-style audio by taking a style feature as a conditional feature of a vocoder.
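The feature-extraction and splicing steps of the abstract can be illustrated with a minimal sketch. Everything here is a hypothetical stand-in rather than the networks of the application: `extract_style` replaces the style extraction network with a simple mean over Mel frames, `encode_text` replaces the encoder with a toy per-character feature, and `splice` assumes that "splicing" means appending the style vector to every text feature vector. The decoder and vocoder stages are omitted.

```python
from typing import List

Vector = List[float]
Matrix = List[Vector]  # one row per Mel frame or per text token


def extract_style(mel: Matrix) -> Vector:
    """Hypothetical stand-in for the style extraction network.

    Averages the first Mel spectrum over its frames to get one
    fixed-length style vector (an assumption, not the patented network).
    """
    n_frames = len(mel)
    n_bins = len(mel[0])
    return [sum(frame[b] for frame in mel) / n_frames for b in range(n_bins)]


def encode_text(text: str, dim: int = 4) -> Matrix:
    """Hypothetical stand-in for the encoder: one toy vector per character."""
    return [[(ord(ch) % 7) / 7.0] * dim for ch in text]


def splice(text_feats: Matrix, style: Vector) -> Matrix:
    """Append the style vector to every text feature vector.

    The result plays the role of the first fusion feature that would be
    passed to the decoder.
    """
    return [row + style for row in text_feats]


if __name__ == "__main__":
    first_mel = [[1.0, 3.0], [3.0, 5.0]]      # 2 frames x 2 Mel bins (toy data)
    style = extract_style(first_mel)           # first style feature
    text_feats = encode_text("hi", dim=3)      # first text feature
    fused = splice(text_feats, style)          # first fusion feature
    print(len(fused), len(fused[0]))
```

In this sketch the fused feature keeps one row per text token and grows only in width, so the same style vector conditions every position of the decoder input.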
Inventors:
LIANG SHUANG (CN)
CHEN MINCHUAN (CN)
MA JUN (CN)
WANG SHAOJUN (CN)
Application Number:
PCT/CN2021/083546
Publication Date:
June 09, 2022
Filing Date:
March 29, 2021
Assignee:
PING AN TECH SHENZHEN CO LTD (CN)
International Classes:
G10L13/02; G10L13/04; G10L25/03; G10L25/18
Foreign References:
CN112562634A | 2021-03-26
CN110136690A | 2019-08-16
CN111627418A | 2020-09-04
US20200258496A1 | 2020-08-13
US20200051583A1 | 2020-02-13
Attorney, Agent or Firm:
BEIJING JINGDA LAW FIRM (CN)