Title:
AN END-TO-END NEURAL SYSTEM FOR MULTI-SPEAKER AND MULTI-LINGUAL SPEECH SYNTHESIS
Document Type and Number:
WIPO Patent Application WO/2023/035261
Kind Code:
A1
Abstract:
Systems are configured for generating, training, and utilizing TTS (text-to-speech) models configured with variance adaptor components. The variance adaptor components generate and apply implicit and explicit data that refine and improve the processing of the encoded phoneme data by the acoustic model portion of the TTS models, so that the predicted spectrograms generated by the TTS models are efficiently and accurately created for rendering by vocoders in a desired target language and a target speaker prosody style corresponding to the textual data being processed. The efficiencies and accuracies realized by the variance adaptor components can be further improved by altering the encoding and decoding conformers used by the TTS models, for example by applying the convolution processing before the self-attention processing in the encoding/decoding conformer stack(s).
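The abstract's final point, reordering the conformer so that convolution runs before self-attention, can be illustrated with a minimal pure-Python sketch. It is not the claimed implementation: the names `conformer_block` and `conv_first`, and all module stand-ins (an untrained ReLU feed-forward, a fixed three-tap depthwise kernel, single-head attention without learned projections), are hypothetical simplifications chosen only to show where the convolution module sits relative to self-attention inside one block.

```python
import math
from typing import List

Frames = List[List[float]]  # sequence of frames, each a list of channel values


def layer_norm(x: Frames, eps: float = 1e-5) -> Frames:
    # Normalize each frame to zero mean and (approximately) unit variance.
    out = []
    for frame in x:
        mean = sum(frame) / len(frame)
        var = sum((v - mean) ** 2 for v in frame) / len(frame)
        out.append([(v - mean) / math.sqrt(var + eps) for v in frame])
    return out


def feed_forward(x: Frames) -> Frames:
    # Hypothetical stand-in for the position-wise FFN: an untrained ReLU.
    return [[max(0.0, v) for v in frame] for frame in x]


def depthwise_conv(x: Frames, kernel=(0.25, 0.5, 0.25)) -> Frames:
    # Hypothetical stand-in for the convolution module: a per-channel 1-D
    # convolution over time with zero padding, so the length is preserved.
    half = len(kernel) // 2
    num_frames, num_channels = len(x), len(x[0])
    out = []
    for t in range(num_frames):
        row = []
        for c in range(num_channels):
            acc = 0.0
            for k, w in enumerate(kernel):
                src = t + k - half
                if 0 <= src < num_frames:
                    acc += w * x[src][c]
            row.append(acc)
        out.append(row)
    return out


def self_attention(x: Frames) -> Frames:
    # Single-head scaled dot-product attention with identity Q/K/V projections.
    d = len(x[0])
    out = []
    for q in x:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in x]
        m = max(scores)  # subtract the max for a numerically stable softmax
        weights = [math.exp(s - m) for s in scores]
        total = sum(weights)
        weights = [w / total for w in weights]
        out.append([sum(w * v[c] for w, v in zip(weights, x)) for c in range(d)])
    return out


def residual(x: Frames, y: Frames, scale: float = 1.0) -> Frames:
    # Residual connection: x + scale * y, frame by frame.
    return [[a + scale * b for a, b in zip(fx, fy)] for fx, fy in zip(x, y)]


def conformer_block(x: Frames, conv_first: bool = True) -> Frames:
    # Macaron-style half-step feed-forward before the core modules.
    x = residual(x, feed_forward(layer_norm(x)), 0.5)
    # conv_first=True: convolution, then self-attention (the reordering the
    # abstract describes); False: the standard Conformer ordering.
    if conv_first:
        order = [depthwise_conv, self_attention]
    else:
        order = [self_attention, depthwise_conv]
    for module in order:
        x = residual(x, module(layer_norm(x)))
    # Second half-step feed-forward, then a final layer norm.
    x = residual(x, feed_forward(layer_norm(x)), 0.5)
    return layer_norm(x)
```

Calling `conformer_block(x, conv_first=False)` on the same input gives the standard ordering (self-attention, then convolution), so the two variants can be compared directly. Applying convolution first lets the local, per-channel smoothing shape the frames before attention mixes information across the whole sequence.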
Inventors:
LIU, Yanqing (US)
XU, Zhihang (US)
ZHAO, Sheng (US)
LI, Bohan (US)
TAN, Xu (US)
LI, Runnan (US)
Application Number:
PCT/CN2021/117919
Publication Date:
March 16, 2023
Filing Date:
September 13, 2021
Assignee:
MICROSOFT TECHNOLOGY LICENSING, LLC (US)
LIU, Yanqing (US)
XU, Zhihang (US)
ZHAO, Sheng (US)
LI, Bohan (US)
TAN, Xu (US)
LI, Runnan (US)
International Classes:
G10L13/047; G10L25/30
Attorney, Agent or Firm:
SHANGHAI PATENT & TRADEMARK LAW OFFICE, LLC (CN)