Title:
SPEECH SYNTHESIS DEVICE, SPEECH SYNTHESIS METHOD, AND PROGRAM
Document Type and Number:
WIPO Patent Application WO/2023/182291
Kind Code:
A1
Abstract:
The present invention improves the response time of waveform generation and makes it possible to process a rhythm feature quantity in detail, based on the entire input, before waveform generation begins. According to the embodiments, a speech synthesis device comprises an analysis unit, a first processing unit, and a second processing unit. The analysis unit analyzes input text and generates a language feature quantity series that includes at least one vector representing a language feature quantity. The first processing unit comprises an encoder, which uses a first neural network to convert the language feature quantity series into an intermediate expression series that includes at least one vector representing a latent variable, and a rhythm feature quantity decoder, which uses a second neural network to generate a rhythm feature quantity from the intermediate expression series. The second processing unit comprises a voice waveform decoder that uses a third neural network to sequentially generate a voice waveform from the intermediate expression series and the rhythm feature quantity.
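
The data flow described in the abstract can be illustrated with a toy sketch. This is not the patented implementation: the three "neural networks" are stand-ins built from fixed random weights, and every name here (`analyze`, `make_linear`, `synthesize`, `pitch_scale`, `FRAME_LEN`, `SAMPLE_RATE`) and every feature layout is a hypothetical choice made for illustration. It only shows the split the abstract describes: the encoder and rhythm feature quantity decoder run over the whole input first, so the rhythm features can be adjusted globally, and the waveform decoder then emits audio chunk by chunk.

```python
import math
import random
from typing import Callable, Iterator, List

Vector = List[float]
FRAME_LEN = 80        # samples per frame (hypothetical value)
SAMPLE_RATE = 16000   # Hz (hypothetical value)


def make_linear(n_in: int, n_out: int, seed: int) -> Callable[[Vector], Vector]:
    """Fixed random linear layer + tanh, standing in for a trained network."""
    rng = random.Random(seed)
    w = [[rng.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_out)]

    def layer(x: Vector) -> Vector:
        return [math.tanh(sum(wi * xi for wi, xi in zip(row, x))) for row in w]

    return layer


def analyze(text: str) -> List[Vector]:
    """Analysis unit: one toy language feature vector per alphabetic character."""
    feats = []
    for ch in text.lower():
        if ch.isalpha():
            code = (ord(ch) - ord("a")) / 25.0
            feats.append([code, 1.0 if ch in "aeiou" else 0.0, 1.0])
    return feats


def synthesize(text: str, pitch_scale: float = 1.0) -> Iterator[List[float]]:
    """Yield waveform chunks one input unit at a time."""
    encoder = make_linear(3, 4, seed=1)      # "first neural network"
    rhythm_net = make_linear(4, 2, seed=2)   # "second neural network"
    wave_net = make_linear(6, 1, seed=3)     # "third neural network"

    # First processing unit: runs over the whole input before any audio.
    hidden = [encoder(f) for f in analyze(text)]
    rhythm = []
    for h in hidden:
        d, p = rhythm_net(h)
        frames = 1 + int((d + 1.0) * 2)          # duration: 1 to 4 frames
        f0 = (120.0 + 40.0 * p) * pitch_scale    # pitch in Hz, globally scaled
        rhythm.append((frames, f0))

    # Second processing unit: sequential generation, one chunk per unit.
    phase = 0.0
    for h, (frames, f0) in zip(hidden, rhythm):
        gain = 0.5 + 0.5 * abs(wave_net(h + [frames / 4.0, f0 / 200.0])[0])
        chunk = []
        for _ in range(frames * FRAME_LEN):
            phase += 2 * math.pi * f0 / SAMPLE_RATE
            chunk.append(gain * math.sin(phase))
        yield chunk


if __name__ == "__main__":
    stream = synthesize("hello", pitch_scale=1.1)
    first = next(stream)  # available before the remaining chunks are computed
    print(len(first), "samples in the first chunk")
```

Because `synthesize` is a generator, a caller could start playback on the first chunk while later chunks are still being produced, which is the response-time benefit the abstract claims for sequential waveform generation.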

Inventors:
HIRUTA YOSHIKI (JP)
TAMURA MASATSUNE (JP)
Application Number:
PCT/JP2023/010951
Publication Date:
September 28, 2023
Filing Date:
March 20, 2023
Assignee:
TOSHIBA KK (JP)
TOSHIBA DIGITAL SOLUTIONS CORP (JP)
International Classes:
G10L13/10; G10L13/08; G10L25/30
Other References:
REN YI, HU CHENXU, XU TAN, QIN TAO, ZHAO SHENG, ZHOU ZHAO, TIE-YAN LIU: "FastSpeech 2: Fast and High-Quality End-to-End Text to Speech", ARXIV:2006.04558V1, CORNELL UNIVERSITY LIBRARY, ITHACA, 8 June 2020 (2020-06-08), XP093095173, Retrieved from the Internet [retrieved on 20231025], DOI: 10.48550/arxiv.2006.04558
BROOKE STEPHENSON; THOMAS HUEBER; LAURENT GIRIN; LAURENT BESACIER: "Alternate Endings: Improving Prosody for Incremental Neural TTS with Predicted Future Text Input", ARXIV.ORG, CORNELL UNIVERSITY LIBRARY, 201 OLIN LIBRARY CORNELL UNIVERSITY ITHACA, NY 14853, 15 June 2021 (2021-06-15), XP081979275
NAKATA, WATARU ET AL.: "Multi-speaker Audiobook Speech Synthesis using Discrete Character Acting Styles Acquired", IEICE TECHNICAL REPORT, IEICE, JP, vol. 121, no. 282 (SP2021-47), 30 November 2021 (2021-11-30), pages 42-47, XP009549661, ISSN: 2432-6380
HIRUTA, YOSHIKI; TAMURA, MASATSUNE: "An investigation on applying pitch-synchronous analysis to Encoder-Decoder speech synthesis", SPRING AND AUTUMN MEETING OF THE ACOUSTICAL SOCIETY OF JAPAN, ACOUSTICAL SOCIETY OF JAPAN, JP, vol. 2022, 31 August 2022 (2022-08-31), pages 1367-1368, XP009549498, ISSN: 1880-7658
Attorney, Agent or Firm:
SAKAI INTERNATIONAL PATENT OFFICE (JP)