


Title:
STATISTICAL-ACOUSTIC-MODEL ADAPTATION METHOD, ACOUSTIC-MODEL LEARNING METHOD SUITABLE FOR STATISTICAL-ACOUSTIC-MODEL ADAPTATION, STORAGE MEDIUM IN WHICH PARAMETERS FOR BUILDING DEEP NEURAL NETWORK ARE STORED, AND COMPUTER PROGRAM FOR ADAPTING STATISTICAL ACOUSTIC MODEL
Document Type and Number:
WIPO Patent Application WO/2015/079885
Kind Code:
A1
Abstract:
[Problem] To provide a statistical-acoustic-model adaptation method that can efficiently adapt acoustic models based on deep neural networks (DNNs) using training data obtained under specific conditions, and that also improves accuracy.

[Solution] A speaker adaptation method for DNN-based acoustic models includes: a step of storing utterance data (90-98) for different speakers separately in first storage devices; a step of preparing a separate hidden-layer module (112-120) for each speaker; a step of performing preliminary training of all layers (42, 44, 110, 48, 50, 52, 54) of a DNN (80) while switching among the utterance data (90-98) and dynamically replacing a specific layer (110) with the hidden-layer module (112-120) corresponding to the selected utterance data; a step of replacing the specific layer (110) of the DNN, once preliminary training is complete, with an initial hidden layer; and a step of fixing the parameters of all layers other than the initial hidden layer and training the DNN on speech data of a specific speaker.
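The two-stage procedure in the abstract can be illustrated with a small NumPy sketch. It is not the applicants' implementation: the layer sizes, the choice of the second weight layer as the swappable "specific layer" (`SPEC`), sigmoid hidden units with a softmax output, plain full-batch gradient descent, random initialization of the "initial hidden layer", and the synthetic speaker data (`make_speaker`, `TEACHER`) are all assumptions made for illustration.

```python
import numpy as np

# Toy sketch of speaker-aware preliminary training plus single-layer adaptation.
# All sizes, the swappable layer index, sigmoid/softmax units, plain
# full-batch gradient descent and the synthetic data are assumptions.
rng = np.random.default_rng(0)
SIZES = [10, 16, 16, 16, 4]  # input dim, three hidden layers, output classes
SPEC = 1                     # index of the swappable "specific layer" (hypothetical)


def init_layer(n_in, n_out):
    return [rng.normal(0.0, 0.1, (n_in, n_out)), np.zeros(n_out)]


def build():
    return [init_layer(a, b) for a, b in zip(SIZES[:-1], SIZES[1:])]


def forward(layers, x):
    """Return the activations of every layer (input included)."""
    acts = [x]
    for i, (W, b) in enumerate(layers):
        z = acts[-1] @ W + b
        if i < len(layers) - 1:
            acts.append(1.0 / (1.0 + np.exp(-z)))        # sigmoid hidden units
        else:
            e = np.exp(z - z.max(axis=1, keepdims=True))  # numerically stable softmax
            acts.append(e / e.sum(axis=1, keepdims=True))
    return acts


def cross_entropy(layers, x, y):
    p = forward(layers, x)[-1]
    return float(-np.mean(np.sum(y * np.log(p + 1e-12), axis=1)))


def sgd_step(layers, x, y, lr, trainable):
    """One gradient step; only layers whose index is in `trainable` change."""
    acts = forward(layers, x)
    delta = (acts[-1] - y) / len(x)        # d(loss)/d(logits) for softmax + CE
    for i in reversed(range(len(layers))):
        W, b = layers[i]
        gW, gb = acts[i].T @ delta, delta.sum(axis=0)
        if i > 0:                          # back-propagate before W is updated
            delta = (delta @ W.T) * acts[i] * (1.0 - acts[i])
        if i in trainable:
            W -= lr * gW                   # in place: also updates a shared module
            b -= lr * gb


def pretrain(speakers, epochs, lr):
    """Train all layers while swapping in the selected speaker's module."""
    net = build()
    modules = [init_layer(SIZES[SPEC], SIZES[SPEC + 1]) for _ in speakers]
    for _ in range(epochs):
        for k, (x, y) in enumerate(speakers):  # switch the utterance data
            net[SPEC] = modules[k]             # dynamic layer substitution
            sgd_step(net, x, y, lr, trainable=range(len(net)))
    return net


def adapt(net, x, y, epochs, lr):
    """Replace the specific layer with an initial layer and train only it."""
    net[SPEC] = init_layer(SIZES[SPEC], SIZES[SPEC + 1])  # assumed: random init
    before = cross_entropy(net, x, y)
    for _ in range(epochs):
        sgd_step(net, x, y, lr, trainable={SPEC})
    return before, cross_entropy(net, x, y)


# Synthetic "speakers": a shared labelling rule plus a per-speaker feature offset.
TEACHER = rng.normal(size=(SIZES[0], SIZES[-1]))


def make_speaker(n):
    clean = rng.normal(size=(n, SIZES[0]))
    y = np.eye(SIZES[-1])[np.argmax(clean @ TEACHER, axis=1)]
    return clean + rng.normal(0.0, 1.0, SIZES[0]), y


speakers = [make_speaker(200) for _ in range(5)]
net = pretrain(speakers, epochs=200, lr=0.5)
frozen_before = {i: W.copy() for i, (W, _) in enumerate(net) if i != SPEC}
target_x, target_y = make_speaker(100)
loss_before, loss_after = adapt(net, target_x, target_y, epochs=300, lr=0.5)
print(f"target-speaker loss: {loss_before:.3f} -> {loss_after:.3f}")
```

Freezing every layer except `SPEC` means only one weight matrix and one bias vector are estimated from the target speaker's data, which keeps the adaptation step small compared with retraining the whole network.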

Inventors:
MATSUDA SHIGEKI (JP)
LU XUGANG (JP)
Application Number:
PCT/JP2014/079490
Publication Date:
June 04, 2015
Filing Date:
November 06, 2014
Assignee:
NAT INST INF & COMM TECH (JP)
International Classes:
G10L15/07; G06N3/00; G06N3/08; G10L15/16
Foreign References:
JP2008216488A (2008-09-18)
Other References:
FRANK SEIDE ET AL.: "Feature Engineering in Context-Dependent Deep Neural Networks for Conversational Speech Transcription", 2011 IEEE WORKSHOP ON AUTOMATIC SPEECH RECOGNITION AND UNDERSTANDING (ASRU), pages 24 - 29, XP032126095
GEORGE E. DAHL ET AL.: "Context-Dependent Pre-Trained Deep Neural Networks for Large-Vocabulary Speech Recognition", IEEE TRANSACTIONS ON AUDIO, SPEECH, AND LANGUAGE PROCESSING, vol. 20, no. 1, January 2012 (2012-01-01), pages 30 - 42, XP011476706
GEOFFREY HINTON ET AL.: "Deep Neural Networks for Acoustic Modeling in Speech Recognition", IEEE SIGNAL PROCESSING MAGAZINE, November 2012 (2012-11-01), pages 82 - 97, XP011469727
Y. BENGIO: "Learning deep architectures for AI", FOUNDATIONS AND TRENDS IN MACHINE LEARNING, vol. 2, no. 1, 2009, pages 1 - 127, XP055013582, DOI: doi:10.1561/2200000006
G. HINTON; L. DENG; D. YU; G. DAHL; A. MOHAMED; N. JAITLY; A. SENIOR; V. VANHOUCKE; P. NGUYEN; T. SAINATH: "Deep Neural Networks for Acoustic Modeling in Speech Recognition: The Shared Views of Four Research Groups", IEEE SIGNAL PROCESSING MAGAZINE, vol. 29, no. 6, 2012, pages 82 - 97, XP011469727, DOI: doi:10.1109/MSP.2012.2205597
A. MOHAMED; G. DAHL; G. HINTON: "Acoustic Modeling using Deep Belief Networks", IEEE TRANSACTIONS ON AUDIO, SPEECH, AND LANGUAGE PROCESSING, vol. 20, no. 1, 2012, pages 14 - 22, XP011390317, DOI: doi:10.1109/TASL.2011.2109382
QUOC V. LE; MARC'AURELIO RANZATO; RAJAT MONGA; MATTHIEU DEVIN; KAI CHEN; GREG S. CORRADO; JEFF DEAN; ANDREW Y. NG: "Building High-level Features Using Large Scale Unsupervised Learning", PROC. ICML, 2012
H. LIAO: "Speaker adaptation of context dependent deep neural networks", PROC. ICASSP, 2013, pages 7947 - 7951, XP032508263, DOI: doi:10.1109/ICASSP.2013.6639212
See also references of EP 3076389A4
Attorney, Agent or Firm:
SHIMIZU, SATOSHI (JP)