SPEECH SYNTHESIS METHOD AND SYSTEM

Document Type and Number:

WIPO Patent Application WO/2024/058573

Kind Code:

A1

Abstract:

The present disclosure relates to a speech synthesis method performed by means of at least one processor. The speech synthesis method comprises the steps of: receiving an input text; generating a text representation from the input text by using a text encoder; generating a self-supervised representation including linguistic information from the text representation by using a self-supervised representation generator; generating an acoustic feature on the basis of the self-supervised representation by using an acoustic feature generator; and generating synthetic speech on the basis of the acoustic feature by using a speech generator.
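The abstract describes a four-stage pipeline: text encoder, self-supervised representation generator, acoustic feature generator, and speech generator. The following is a minimal, runnable sketch of that data flow only. It assumes toy stand-ins for every stage. All class names, dimensions, the fixed frames-per-character upsampling (in place of a learned duration or alignment model), and the sine-based "vocoder" are hypothetical illustrations. They are not the disclosed implementation, and the abstract does not specify the architecture of any stage. In a real system, each stage would be a trained neural network.

```python
import math
import random
from typing import List

Vector = List[float]


def random_matrix(rows: int, cols: int, seed: int) -> List[Vector]:
    # Fixed pseudo-random weights stand in for trained parameters (assumption).
    rng = random.Random(seed)
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


def dense(x: Vector, weights: List[Vector]) -> Vector:
    # One tanh layer, y = tanh(W x); a placeholder for a neural network.
    return [math.tanh(sum(w * xi for w, xi in zip(row, x))) for row in weights]


class TextEncoder:
    """Hypothetical stage: input text -> text representation (one vector per character)."""

    def __init__(self, dim: int = 8):
        self.dim = dim

    def embed(self, ch: str) -> Vector:
        # Deterministic embedding seeded by the character's code point.
        rng = random.Random(ord(ch))
        return [rng.uniform(-1.0, 1.0) for _ in range(self.dim)]

    def __call__(self, text: str) -> List[Vector]:
        return [self.embed(ch) for ch in text]


class SelfSupervisedRepresentationGenerator:
    """Hypothetical stage: text representation -> frame-level self-supervised representation.

    Assumption: each character is repeated for a fixed number of frames,
    replacing whatever alignment mechanism the actual system uses."""

    def __init__(self, in_dim: int = 8, ssr_dim: int = 16, frames_per_token: int = 3):
        self.w = random_matrix(ssr_dim, in_dim, seed=1)
        self.frames_per_token = frames_per_token

    def __call__(self, text_repr: List[Vector]) -> List[Vector]:
        frames: List[Vector] = []
        for token in text_repr:
            h = dense(token, self.w)
            frames.extend(list(h) for _ in range(self.frames_per_token))
        return frames


class AcousticFeatureGenerator:
    """Hypothetical stage: self-supervised representation -> acoustic features per frame."""

    def __init__(self, ssr_dim: int = 16, n_features: int = 10):
        self.w = random_matrix(n_features, ssr_dim, seed=2)

    def __call__(self, ssr: List[Vector]) -> List[Vector]:
        return [dense(frame, self.w) for frame in ssr]


class SpeechGenerator:
    """Hypothetical toy vocoder: acoustic features -> waveform samples.

    Each frame becomes hop_length samples of a sine tone whose amplitude is
    the mean absolute feature value of that frame (values lie in [-1, 1])."""

    def __init__(self, hop_length: int = 160, sample_rate: int = 16000, f0: float = 220.0):
        self.hop_length = hop_length
        self.sample_rate = sample_rate
        self.f0 = f0

    def __call__(self, features: List[Vector]) -> Vector:
        wave: Vector = []
        for i, frame in enumerate(features):
            amp = sum(abs(v) for v in frame) / len(frame)
            for j in range(self.hop_length):
                n = i * self.hop_length + j
                wave.append(amp * math.sin(2 * math.pi * self.f0 * n / self.sample_rate))
        return wave


class SpeechSynthesizer:
    """Chains the four stages in the order given in the abstract."""

    def __init__(self):
        self.text_encoder = TextEncoder(dim=8)
        self.ssr_generator = SelfSupervisedRepresentationGenerator(8, 16, 3)
        self.acoustic_generator = AcousticFeatureGenerator(16, 10)
        self.speech_generator = SpeechGenerator(hop_length=160)

    def synthesize(self, text: str) -> Vector:
        text_repr = self.text_encoder(text)       # text representation
        ssr = self.ssr_generator(text_repr)       # self-supervised representation
        acoustic = self.acoustic_generator(ssr)   # acoustic features
        return self.speech_generator(acoustic)    # synthetic speech samples
```

For example, `SpeechSynthesizer().synthesize("hello")` passes 5 character vectors through the text encoder. These expand to 15 representation frames under the assumed 3 frames per character, which become 15 acoustic-feature frames and finally 15 × 160 = 2400 waveform samples. The point of the sketch is the ordering: acoustic features are derived from the intermediate self-supervised representation rather than directly from the text representation.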

Inventors:

SONG EUNWOO (KR)
OH SUHYEON (KR)
LEE SANG-HOON (KR)
LEE SEONG-WHAN (KR)

Application Number:

PCT/KR2023/013832

Publication Date:

March 21, 2024

Filing Date:

September 14, 2023

Assignee:

NAVER CORP (KR)
UNIV KOREA RES & BUS FOUND (KR)

International Classes:

G10L13/08; G06N20/00; G10L13/027; G10L13/06

Foreign References:

KR20220083987A (2022-06-21)

Other References:

CHENPENG DU; YIWEI GUO; XIE CHEN; KAI YU: "VQTTS: High-Fidelity Text-to-Speech Synthesis with Self-Supervised VQ Acoustic Feature", ARXIV.ORG, CORNELL UNIVERSITY LIBRARY, 201 OLIN LIBRARY CORNELL UNIVERSITY ITHACA, NY 14853, 11 May 2022 (2022-05-11), XP091218415
HYEONG-SEOK CHOI; JUHEON LEE; WANSOO KIM; JIE HWAN LEE; HOON HEO; KYOGU LEE: "Neural Analysis and Synthesis: Reconstructing Speech from Self-Supervised Representations", ARXIV.ORG, CORNELL UNIVERSITY LIBRARY, 201 OLIN LIBRARY CORNELL UNIVERSITY ITHACA, NY 14853, 27 October 2021 (2021-10-27), XP091081743
ZHEHUAI CHEN; YU ZHANG; ANDREW ROSENBERG; BHUVANA RAMABHADRAN; GARY WANG; PEDRO MORENO: "Injecting Text in Self-Supervised Speech Pretraining", ARXIV.ORG, CORNELL UNIVERSITY LIBRARY, 201 OLIN LIBRARY CORNELL UNIVERSITY ITHACA, NY 14853, 27 August 2021 (2021-08-27), XP091039368
YIHAN WU; XI WANG; SHAOFEI ZHANG; LEI HE; RUIHUA SONG; JIAN-YUN NIE: "Self-supervised Context-aware Style Representation for Expressive Speech Synthesis", ARXIV.ORG, CORNELL UNIVERSITY LIBRARY, 201 OLIN LIBRARY CORNELL UNIVERSITY ITHACA, NY 14853, 25 June 2022 (2022-06-25), XP091257437

Attorney, Agent or Firm:

KIM, Han Sol et al. (KR)
