SPEECH SYNTHESIS DEVICE, SPEECH SYNTHESIS METHOD, AND PROGRAM

Document Type and Number:

WIPO Patent Application WO/2023/182291

Kind Code:

A1

Abstract:

The present invention improves response time for waveform generation and makes it possible to perform detailed processing of a rhythm feature quantity based on overall input before the waveform generation. According to the embodiments, a speech synthesis device comprises an analysis unit, a first processing unit, and a second processing unit. The analysis unit analyzes input text and generates a language feature quantity series that includes at least one vector that represents a language feature quantity. The first processing unit comprises: an encoder that uses a first neural network to convert the language feature quantity series to an intermediate expression series that includes at least one vector that represents a latent variable; and a rhythm feature quantity decoder that uses a second neural network to generate a rhythm feature quantity from the intermediate expression series. The second processing unit comprises a voice waveform decoder that uses a third neural network to sequentially generate a voice waveform from the intermediate expression series and the rhythm feature quantity.
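
The data flow described in the abstract can be illustrated with a toy sketch. It is not the applicants' implementation. The random dense layers stand in for the three trained neural networks, and the per-character features, the two-element rhythm vector (treated here as pitch and duration), the `pitch_shift` adjustment, and every function and class name are assumptions made for illustration only. The sketch shows that the first processing unit yields the rhythm feature quantity for the whole input, so it can be adjusted before any waveform exists, while the second processing unit produces the waveform one chunk at a time.

```python
import math
import random
from typing import Iterator, List, Tuple

Vector = List[float]


def make_layer(n_in: int, n_out: int, rng: random.Random) -> List[Vector]:
    """Random weight matrix standing in for a trained neural network (assumption)."""
    return [[rng.uniform(-1.0, 1.0) for _ in range(n_in)] for _ in range(n_out)]


def apply_layer(weights: List[Vector], x: Vector) -> Vector:
    """One dense layer with tanh activation."""
    return [math.tanh(sum(w * v for w, v in zip(row, x))) for row in weights]


def analyze(text: str) -> List[Vector]:
    """Analysis unit: one toy language feature vector per character (hypothetical features)."""
    n = max(len(text), 1)
    return [[ord(ch) / 128.0, float(ch.isspace()), i / n] for i, ch in enumerate(text)]


class FirstProcessingUnit:
    """Encoder (first network) plus rhythm feature quantity decoder (second network)."""

    def __init__(self, rng: random.Random, latent_dim: int = 4) -> None:
        self.encoder = make_layer(3, latent_dim, rng)
        self.rhythm_decoder = make_layer(latent_dim, 2, rng)  # assumed: [pitch, duration]

    def run(self, features: List[Vector]) -> Tuple[List[Vector], List[Vector]]:
        latents = [apply_layer(self.encoder, f) for f in features]
        rhythm = [apply_layer(self.rhythm_decoder, z) for z in latents]
        return latents, rhythm


class SecondProcessingUnit:
    """Voice waveform decoder (third network) that emits samples chunk by chunk."""

    def __init__(self, rng: random.Random, latent_dim: int = 4, chunk: int = 8) -> None:
        self.decoder = make_layer(latent_dim + 2, chunk, rng)

    def stream(self, latents: List[Vector], rhythm: List[Vector]) -> Iterator[Vector]:
        # A generator: each chunk is computed only when the caller asks for it.
        for z, r in zip(latents, rhythm):
            yield apply_layer(self.decoder, z + r)


def synthesize(text: str, pitch_shift: float = 0.0, seed: int = 0) -> Iterator[Vector]:
    rng = random.Random(seed)
    first = FirstProcessingUnit(rng)
    second = SecondProcessingUnit(rng)
    latents, rhythm = first.run(analyze(text))
    # The rhythm features for the entire input are available here,
    # so they can be edited before waveform generation starts.
    rhythm = [[r[0] + pitch_shift, r[1]] for r in rhythm]
    return second.stream(latents, rhythm)
```

A caller can start consuming audio with `next(synthesize("hello"))` without waiting for the remaining chunks, which is the response-time benefit the abstract refers to.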

Inventors:

HIRUTA YOSHIKI (JP)
TAMURA MASATSUNE (JP)

Application Number:

PCT/JP2023/010951

Publication Date:

September 28, 2023

Filing Date:

March 20, 2023

Assignee:

TOSHIBA KK (JP)
TOSHIBA DIGITAL SOLUTIONS CORP (JP)

International Classes:

G10L13/10; G10L13/08; G10L25/30

Other References:

REN YI; HU CHENXU; TAN XU; QIN TAO; ZHAO SHENG; ZHAO ZHOU; LIU TIE-YAN: "FastSpeech 2: Fast and High-Quality End-to-End Text to Speech", arXiv:2006.04558v1, Cornell University Library, Ithaca, 8 June 2020 (2020-06-08), XP093095173, retrieved from the Internet on 25 October 2023, DOI: 10.48550/arxiv.2006.04558
STEPHENSON BROOKE; HUEBER THOMAS; GIRIN LAURENT; BESACIER LAURENT: "Alternate Endings: Improving Prosody for Incremental Neural TTS with Predicted Future Text Input", arXiv.org, Cornell University Library, Ithaca, 15 June 2021 (2021-06-15), XP081979275
NAKATA WATARU ET AL.: "Multi-speaker Audiobook Speech Synthesis using Discrete Character Acting Styles Acquired", IEICE Technical Report, IEICE, JP, vol. 121, no. 282 (SP2021-47), 30 November 2021 (2021-11-30), pages 42-47, XP009549661, ISSN: 2432-6380
HIRUTA YOSHIKI; TAMURA MASATSUNE: "An investigation on applying pitch-synchronous analysis to Encoder-Decoder speech synthesis", Spring and Autumn Meeting of the Acoustical Society of Japan, Acoustical Society of Japan, JP, vol. 2022, 31 August 2022 (2022-08-31), pages 1367-1368, XP009549498, ISSN: 1880-7658

Attorney, Agent or Firm:

SAKAI INTERNATIONAL PATENT OFFICE (JP)
