Title:

STATISTICAL-ACOUSTIC-MODEL ADAPTATION METHOD, ACOUSTIC-MODEL LEARNING METHOD SUITABLE FOR STATISTICAL-ACOUSTIC-MODEL ADAPTATION, STORAGE MEDIUM IN WHICH PARAMETERS FOR BUILDING DEEP NEURAL NETWORK ARE STORED, AND COMPUTER PROGRAM FOR ADAPTING STATISTICAL ACOUSTIC MODEL

Document Type and Number:

WIPO Patent Application WO/2015/079885

Kind Code:

A1

Abstract:

[Problem] To provide a statistical-acoustic-model adaptation method that can efficiently adapt an acoustic model based on a deep neural network (DNN) using learning data recorded under specific conditions, while also improving accuracy. [Solution] A speaker adaptation method for a DNN-based acoustic model includes: a step of storing utterance data (90-98) of different speakers separately in first storage devices; a step of preparing speaker-specific hidden layer modules (112-120); a step of performing preliminary learning of all layers (42, 44, 110, 48, 50, 52, 54) of a DNN (80) while switching among the utterance data (90-98) and dynamically replacing a specific layer (110) with the hidden layer module (112-120) corresponding to the selected utterance data; a step of replacing the specific layer (110) of the DNN with an initial hidden layer once the preliminary learning is complete; and a step of fixing the parameters of all layers other than the initial hidden layer and training the DNN on speech data of a specific speaker.
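
The steps in the abstract can be sketched in a few lines of Python. This is a toy illustration under strong assumptions, not the implementation described in the application: each layer is reduced to a single scalar weight with a tanh activation, the training criterion is squared error, and all names (`forward`, `sgd_step`, `pretrain`, `adapt`, the speaker labels and the data values) are hypothetical.

```python
import math
import random

def forward(x, w_in, w_mid, w_out):
    """Toy three-layer network with scalar weights; returns both activations and the output."""
    h1 = math.tanh(w_in * x)
    h2 = math.tanh(w_mid * h1)
    return h1, h2, w_out * h2

def sgd_step(x, t, shared, module, lr, freeze_shared=False):
    """One squared-error SGD step. `module` holds the swappable middle-layer weight."""
    h1, h2, y = forward(x, shared["in"], module["w"], shared["out"])
    err = y - t
    g_out = err * h2                              # gradient for the output weight
    d2 = err * shared["out"] * (1.0 - h2 * h2)    # gradient at the middle pre-activation
    g_mid = d2 * h1                               # gradient for the swappable weight
    d1 = d2 * module["w"] * (1.0 - h1 * h1)       # gradient at the input pre-activation
    g_in = d1 * x                                 # gradient for the input weight
    module["w"] -= lr * g_mid
    if not freeze_shared:
        shared["in"] -= lr * g_in
        shared["out"] -= lr * g_out
    return 0.5 * err * err

def pretrain(data_by_speaker, epochs=200, lr=0.05, seed=0):
    """Preliminary learning: all layers are trained while the middle layer
    is swapped for the module of whichever speaker's data is selected."""
    rng = random.Random(seed)
    shared = {"in": rng.uniform(-0.5, 0.5), "out": rng.uniform(-0.5, 0.5)}
    modules = {s: {"w": rng.uniform(-0.5, 0.5)} for s in data_by_speaker}
    for _ in range(epochs):
        for speaker in sorted(data_by_speaker):   # switch the utterance data
            for x, t in data_by_speaker[speaker]:
                sgd_step(x, t, shared, modules[speaker], lr)
    return shared, modules

def adapt(shared, target_data, epochs=200, lr=0.05, init_w=0.1):
    """Adaptation: put a fresh initial layer in the swappable slot, keep the
    shared layers fixed, and train only the new layer on the target speaker."""
    module = {"w": init_w}
    for _ in range(epochs):
        for x, t in target_data:
            sgd_step(x, t, shared, module, lr, freeze_shared=True)
    return module

# Hypothetical per-speaker (input, target) pairs standing in for utterance data.
data_by_speaker = {
    "spk_a": [(0.5, 0.3), (-0.5, -0.3)],
    "spk_b": [(0.5, 0.1), (-0.5, -0.1)],
}
shared, modules = pretrain(data_by_speaker)
new_speaker_layer = adapt(shared, [(0.5, 0.2), (-0.5, -0.2)])
```

In this sketch the speaker-independent layers only ever see gradients that have passed through a speaker-specific module, and during adaptation only the single replaced layer is updated, so the amount of target-speaker data needed is tied to one layer rather than the whole network.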

Inventors:

MATSUDA SHIGEKI (JP)
LU XUGANG (JP)

Application Number:

PCT/JP2014/079490

Publication Date:

June 04, 2015

Filing Date:

November 06, 2014

Assignee:

NAT INST INF & COMM TECH (JP)

International Classes:

G10L15/07; G06N3/00; G06N3/08; G10L15/16

Foreign References:

JP2008216488A

2008-09-18

Other References:

FRANK SEIDE ET AL.: "Feature Engineering in Context-Dependent Deep Neural Networks for Conversational Speech Transcription", 2011 IEEE WORKSHOP ON AUTOMATIC SPEECH RECOGNITION AND UNDERSTANDING (ASRU), pages 24 - 29, XP032126095
GEORGE E. DAHL ET AL.: "Context-Dependent Pre-Trained Deep Neural Networks for Large-Vocabulary Speech Recognition", IEEE TRANSACTIONS ON AUDIO, SPEECH, AND LANGUAGE PROCESSING, vol. 20, no. 1, January 2012 (2012-01-01), pages 30 - 42, XP011476706
GEOFFREY HINTON ET AL.: "Deep Neural Networks for Acoustic Modeling in Speech Recognition", IEEE SIGNAL PROCESSING MAGAZINE, November 2012 (2012-11-01), pages 82 - 97, XP011469727
Y. BENGIO: "Learning deep architectures for AI", FOUNDATIONS AND TRENDS IN MACHINE LEARNING, vol. 2, no. 1, 2009, pages 1 - 127, XP055013582, DOI: 10.1561/2200000006
G. HINTON; L. DENG; D. YU; G. DAHL; A. MOHAMED; N. JAITLY; A. SENIOR; V. VANHOUCKE; P. NGUYEN; T. SAINATH: "Deep Neural Networks for Acoustic Modeling in Speech Recognition: The Shared Views of Four Research Groups", IEEE SIGNAL PROCESSING MAGAZINE, vol. 29, no. 6, 2012, pages 82 - 97, XP011469727, DOI: 10.1109/MSP.2012.2205597
A. MOHAMED; G. DAHL; G. HINTON: "Acoustic Modeling using Deep Belief Networks", IEEE TRANSACTIONS ON AUDIO, SPEECH, AND LANGUAGE PROCESSING, vol. 20, no. 1, 2012, pages 14 - 22, XP011390317, DOI: 10.1109/TASL.2011.2109382
QUOC V. LE; MARC'AURELIO RANZATO; RAJAT MONGA; MATTHIEU DEVIN; KAI CHEN; GREG S. CORRADO; JEFF DEAN; ANDREW Y. NG: "Building High-level Features Using Large Scale Unsupervised Learning", PROC. ICML, 2012
H. LIAO: "Speaker adaptation of context dependent deep neural networks", PROC. ICASSP, 2013, pages 7947 - 7951, XP032508263, DOI: 10.1109/ICASSP.2013.6639212
See also references of EP 3076389A4

Attorney, Agent or Firm:

SHIMIZU, SATOSHI (JP)
