SPECTRAL ENVELOPE AND GROUP DELAY INFERENCE SYSTEM AND VOICE SIGNAL SYNTHESIS SYSTEM FOR VOICE ANALYSIS/SYNTHESIS

Document Type and Number:

WIPO Patent Application WO/2014/021318

Kind Code:

A1

Abstract:

Provided are a spectral envelope and group delay inference system and method for voice analysis/synthesis that infer, with high precision and high time resolution, the spectral envelope and group delay from a voice signal, in order to achieve high-performance analysis and high-quality synthesis of voice (singing voice and speaking voice). The spectral envelope and group delay inference system comprises: a fundamental frequency inference unit (3); an amplitude spectrum acquisition unit (5); a group delay extraction unit (7); a spectral envelope combination unit (9); and a group delay combination unit (11). The spectral envelope combination unit (9) successively finds spectral envelopes for voice synthesis by averaging superimposed spectra. The group delay combination unit (11) selects, from a plurality of group delays, the group delay corresponding to the maximum envelope of each frequency component of a spectral envelope, and successively finds a group delay for voice synthesis by combining the group delays thus selected.
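
The two combination steps named in the abstract can be illustrated with a minimal sketch. It is not the claimed implementation: the function name `combine_frames`, the list-of-lists layout, and the reading of "averaging superimposed spectra" as a plain per-bin mean over frames are all assumptions made for illustration.

```python
from typing import List, Sequence, Tuple


def combine_frames(
    amplitudes: Sequence[Sequence[float]],
    group_delays: Sequence[Sequence[float]],
) -> Tuple[List[float], List[float]]:
    """Combine superimposed per-frame spectra (hypothetical helper).

    amplitudes[i][k]   : amplitude of frame i at frequency bin k
    group_delays[i][k] : group delay of frame i at frequency bin k
    Returns (envelope, delay), one value per frequency bin.
    """
    if not amplitudes or len(amplitudes) != len(group_delays):
        raise ValueError("both inputs need the same, non-zero number of frames")
    n_frames = len(amplitudes)
    n_bins = len(amplitudes[0])

    envelope: List[float] = []
    delay: List[float] = []
    for k in range(n_bins):
        column = [amplitudes[i][k] for i in range(n_frames)]
        # Assumption: the averaging step is a plain mean over the frames.
        envelope.append(sum(column) / n_frames)
        # Frame that defines the maximum envelope at this frequency bin.
        best = max(range(n_frames), key=lambda i: column[i])
        # Keep the group delay that belongs to that frame.
        delay.append(group_delays[best][k])
    return envelope, delay


if __name__ == "__main__":
    amps = [[1.0, 4.0], [3.0, 2.0]]
    delays = [[0.1, 0.2], [0.3, 0.4]]
    print(combine_frames(amps, delays))
```

In this sketch the group delay at each bin is taken from whichever frame has the largest amplitude there, which is one way to read "a group delay corresponding to the maximum envelope of each frequency component".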

Inventors:

NAKANO TOMOYASU
GOTO MASATAKA

Application Number:

PCT/JP2013/070609

Publication Date:

February 06, 2014

Filing Date:

July 30, 2013

Assignee:

NAT INST OF ADVANCED IND SCIEN (JP)

International Classes:

G10L13/06; G10L25/78; G10L25/03; G10L25/15

Foreign References:

JPH1097287A	1998-04-14
JP2001249674A	2001-09-14
JPH11219200A	1999-08-10
JPH09179586A	1997-07-11

Other References:

HIDEKI KAWAHARA ET AL.: "Source Information Representations for Synthetic Speech : Group Delay, Event and Harmonic Structures", IEICE TECHNICAL REPORT, vol. 101, no. 87, 18 May 2001 (2001-05-18), pages 9 - 16, XP008176526
HIDEKI KAWAHARA ET AL.: "Vocal fold closure and speech event detection using group delay", IEICE TECHNICAL REPORT, vol. 99, no. 679, 10 March 2000 (2000-03-10), pages 33 - 40, XP055192707
HIDEKI BANNO ET AL.: "Efficient Representation of Short-Time Phase Based on Time-Domain Smoothed Group Delay", THE TRANSACTIONS OF THE INSTITUTE OF ELECTRONICS, INFORMATION AND COMMUNICATION ENGINEERS, vol. J84-D-II, no. 4, 1 April 2001 (2001-04-01), pages 621 - 628, XP055192710
TOMOYASU NAKANO ET AL.: "Kasei Onsei Bunseki Gosei no Tameno F0 Tekio Taju Frame Togo Bunseki ni Motozuku Spectrum Horaku to Gunchien no Suiteiho", IPSJ SIG NOTES, vol. 2012-MUS, no. 7, 9 August 2012 (2012-08-09), pages 1 - 9, XP055193196
ZOLZER, U.; AMATRIAIN, X.: "DAFX - Digital Audio Effects", 2002, WILEY
ITO, M.; YANO, M.: "Perceptual Naturalness of Time-Scale Modified Speech", IEICE (THE INSTITUTE OF ELECTRONICS, INFORMATION AND COMMUNICATION ENGINEERS) TECHNICAL REPORT EA, 2008, pages 13 - 18
MATSUBARA, T.; MORISE, M.; NISHIURA, T: "Perceptual Effect of Phase Characteristics of the Voiced Sound in High-Quality Speech Synthesis", ACOUSTICAL SOCIETY OF JAPAN, TECHNICAL COMMITTEE OF PSYCHOLOGICAL AND PHYSIOLOGICAL ACOUSTICS PAPERS, vol. 40, no. 8, 2010, pages 653 - 658
HAMAGAMI, T.: "Speech Synthesis Using Source Wave Shape Modification Technique by Harmonic Phase Control", ACOUSTICAL SOCIETY OF JAPAN, JOURNAL, vol. 54, no. 9, 1998, pages 623 - 631
FLANAGAN, J.; GOLDEN, R.: "Phase Vocoder", BELL SYSTEM TECHNICAL JOURNAL, vol. 45, 1966, pages 1493 - 1509
GRIFFIN, D. W.: "Multi-Band Excitation Vocoder", TECHNICAL REPORT (MASSACHUSETTS INSTITUTE OF TECHNOLOGY, RESEARCH LABORATORY OF ELECTRONICS), 1987
ITAKURA, F.; SAITO, S.: "Analysis Synthesis Telephony based on the Maximum Likelihood Method", REPORTS OF THE 6TH INT. CONG. ON ACOUST., vol. 2, no. C-5-5, 1968, pages C17 - 20, XP000646433
ATAL, B. S.; HANAUER, S.: "Speech Analysis and Synthesis by Linear Prediction of the Speech Wave", J. ACOUST. SOC. AM., vol. 50, no. 4, 1971, pages 637 - 655, XP002019898, DOI: doi:10.1121/1.1912679
TOKUDA, K.; KOBAYASHI, T.; MASUKO, T.; IMAI, S.: "Mel-generalized Cepstral Analysis - A Unified Approach to Speech Spectral Estimation", PROC. ICSLP1994, 1994, pages 1043 - 1045
IMAI, S.; ABE, Y.: "Spectral Envelope Extraction by Improved Cepstral Method", IEICE, JOURNAL, vol. J62-A, no. 4, 1979, pages 217 - 223
ROBEL, A.; RODET, X.: "Efficient Spectral Envelope Estimation and Its Application to Pitch Shifting and Envelope Preservation", PROC. DAFX2005, 2005, pages 30 - 35
VILLAVICENCIO, F.; ROBEL, A.; RODET, X.: "Extending Efficient Spectral Envelope Modeling to Mel-frequency Based Representation", PROC. ICASSP2008, 2008, pages 1625 - 1628, XP031250879
VILLAVICENCIO, F.; ROBEL, A.; RODET, X.: "Improving LPC Spectral Envelope Extraction of Voiced Speech by True-Envelope Estimation", PROC. ICASSP2006, 2006, pages 869 - 872
MOULINES, E.; CHARPENTIER, F.: "Pitch-synchronous Waveform Processing Techniques for Text-to-speech Synthesis Using Diphones", SPEECH COMMUNICATION, vol. 9, no. 5-6, 1990, pages 453 - 467, XP024228778, DOI: doi:10.1016/0167-6393(90)90021-Z
MCAULAY, R.; QUATIERI, T.: "Speech Analysis/Synthesis Based on A Sinusoidal Representation", IEEE TRANS. ASSP, vol. 34, no. 4, 1986, pages 744 - 755
SMITH, J.; SERRA, X.: "PARSHL: An Analysis/Synthesis Program for Non-harmonic Sounds Based on A Sinusoidal Representation", PROC. ICMC 1987, 1987, pages 290 - 297, XP009130237
SERRA, X.; SMITH, J.: "Spectral Modeling Synthesis: A Sound Analysis/Synthesis Based on A Deterministic Plus Stochastic Decomposition", COMPUTER MUSIC JOURNAL, vol. 14, no. 4, 1990, pages 12 - 24, XP009122116, DOI: doi:10.2307/3680788
STYLIANOU, Y.: "Harmonic plus Noise Models for Speech, Combined with Statistical Methods, for Speech and Speaker Modification"
DEPALLE, P.; HÉLIE, T.: "Extraction of Spectral Peak Parameters Using a Short-time Fourier Transform Modeling and No Sidelobe Windows", PROC. WASPAA1997, 1997
GEORGE, E.; SMITH, M.: "Analysis-by-Synthesis/Overlap-Add Sinusoidal Modeling Applied to The Analysis and Synthesis of Musical Tones", JOURNAL OF THE AUDIO ENGINEERING SOCIETY, vol. 40, no. 6, 1992, pages 497 - 515, XP001161167
PANTAZIS, Y.; ROSEC, O.; STYLIANOU, Y.: "Iterative Estimation of Sinusoidal Signal Parameters", IEEE SIGNAL PROCESSING LETTERS, vol. 17, no. 5, 2010, pages 461 - 464, XP011302693
ABE, M.; SMITH III, J. O.: "Design Criteria for Simple Sinusoidal Parameter Estimation based on Quadratic Interpolation of FFT Magnitude Peaks", PROC. AES 117TH CONVENTION, 2004
BONADA, J.: "Wide-Band Harmonic Sinusoidal Modeling", PROC. DAFX-08, 2008, pages 265 - 272, XP002503758
ITO, M.; YANO, M.: "Sinusoidal Modeling for Nonstationary Voiced Speech based on a Local Vector Transform", J. ACOUST. SOC. AM., vol. 121, no. 3, 2007, pages 1717 - 1727, XP012096491, DOI: doi:10.1121/1.2431581
PAVLOVETS, A.; PETROVSKY, A.: "Robust HNR-based Closed-loop Pitch and Harmonic Parameters Estimation", PROC. INTERSPEECH2011, 2011, pages 1981 - 1984
KAMEOKA, H.; ONO, N.; SAGAYAMA, S.: "Auxiliary Function Approach to Parameter Estimation of Constrained Sinusoidal Model for Monaural Speech Separation", PROC. ICASSP 2008, 2008, pages 29 - 32, XP031250480
KAWAHARA, H.; MASUDA-KATSUSE, I.; DE CHEVEIGNE, A.: "Restructuring Speech Representations Using a Pitch Adaptive Time-frequency Smoothing and an Instantaneous Frequency Based on F0 Extraction: Possible Role of a Repetitive Structure in Sounds", SPEECH COMMUNICATION, vol. 27, 1999, pages 187 - 207, XP004163250, DOI: doi:10.1016/S0167-6393(98)00085-5
KAWAHARA, H.; MORISE, M.; TAKAHASHI, T.; NISHIMURA, R.; IRINO, T.; BANNO, H.: "Tandem-STRAIGHT: A Temporally Stable Power Spectral Representation for Periodic Signals and Applications to Interference-free Spectrum, F0, and Aperiodicity Estimation", PROC. OF ICASSP 2008, 2008, pages 3933 - 3936, XP031251456
AKAGIRI, H.; MORISE, M.; IRINO, T.; KAWAHARA, H.: "Evaluation and Optimization of F0-Adaptive Spectral Envelope Extraction Based on Spectral Smoothing with Peak Emphasis", IEICE, JOURNAL, vol. J94-A, no. 8, 2011, pages 557 - 567
MORISE, M.; MATSUBARA, T.; NAKANO, K.; NISHIURA, T.: "A Rapid Spectrum Envelope Estimation Technique of Vowel for High-Quality Speech Synthesis", IEICE, JOURNAL, vol. J94-D, no. 7, 2011, pages 1079 - 1087
MORISE, M.: "PLATINUM: A Method to Extract Excitation Signals for Voice Synthesis System", ACOUST. SCI. & TECH., vol. 33, no. 2, 2012, pages 123 - 125
BANNO, H.; JINLIN, L.; NAKAMURA, S.; SHIKANO, K.; KAWAHARA, H.: "Efficient Representation of Short-Time Phase Based on Time-Domain Smoothed Group Delay", IEICE, JOURNAL, vol. J84-D-II, no. 4, 2001, pages 621 - 628
BANNO, H.; JINLIN, L.; NAKAMURA, S.; SHIKANO, K.; KAWAHARA, H.: "Speech Manipulation Method Using Phase Manipulation Based on Time-Domain Smoothed Group Delay", IEICE, JOURNAL, vol. J83-D-II, no. 11, 2000, pages 2276 - 2282
ZOLFAGHARI, P.; WATANABE, S.; NAKAMURA, A.; KATAGIRI, S.: "Modelling of the Speech Spectrum Using Mixture of Gaussians", PROC. ICASSP 2004, 2004, pages 553 - 556, XP010717688, DOI: doi:10.1109/ICASSP.2004.1326045
KAMEOKA, H.; ONO, N.; SAGAYAMA, S.: "Speech Spectrum Modeling for Joint Estimation of Spectral Envelope and Fundamental Frequency", vol. 18, no. 6, 2006, pages 2502 - 2505
AKAMINE, M.; KAGOSHIMA, T.: "Analytic Generation of Synthesis Units by Closed Loop Training for Totally Speaker Driven Text to Speech System (TOS Drive TTS)", PROC. ICSLP1998, 1998, pages 1927 - 1930
SHIGA, Y.; KING, S.: "Estimating the Spectral Envelope of Voiced Speech Using Multi-frame Analysis", PROC. EUROSPEECH2003, 2003, pages 1737 - 1740
TODA, T.; TOKUDA, K.: "Statistical Approach to Vocal Tract Transfer Function Estimation Based on Factor Analyzed Trajectory HMM", PROC. ICASSP2008, 2008, pages 3925 - 3928, XP031251454
FUJIHARA, H.; GOTO, M.; OKUNO, H. G.: "A Novel Framework for Recognizing Phonemes of Singing Voice in Polyphonic Music", PROC. WASPAA2009, 2009, pages 17 - 20, XP031575122
GOTO, M.; HASHIGUCHI, H.; NISHIMURA, T.; OKA, R.: "RWC Music Database for Experiments: Music and Instrument Sound Database", INFORMATION PROCESSING SOCIETY OF JAPAN (IPS) JOURNAL, vol. 45, no. 3, 2004, pages 728 - 738
GOTO, M.; NISHIMURA, T.: "AIST Humming Database, Music Database for Singing Research", IPS REPORT, 2005-MUS-61, 2005, pages 7 - 12
KLATT, D.H.: "Software for A Cascade/parallel Formant Synthesizer", J. ACOUST. SOC. AM., vol. 67, 1980, pages 971 - 995

Attorney, Agent or Firm:

NISHIURA Tsuguharu (JP)
Tsugiharu Nishiura (JP)
