VOICE DATA ENHANCING METHOD AND DEVICE IN VOICE RECOGNITION BASED ON RECURRENT NEURAL NETWORK

Document Type and Number:

WIPO Patent Application WO/2019/024008

Kind Code:

A1

Abstract:

A voice data enhancing method based on a recurrent neural network, in the field of voice recognition processing, aims to solve the problem of excessive word-dependence modeling caused by irregular grammar phenomena when a recurrent neural network is used for voice recognition. The method comprises: extracting, from input voice data, acoustic features that identify the energy values of the voice at various frequencies, and generating acoustic feature vectors (201); obtaining a statement label sequence of the voice data according to a preset labeling file and the acoustic feature vectors (202); obtaining an alignment file, after a decision cluster operation, from the preset labeling file and the statement label sequence (203); generating a first random number γ in the interval [0, 1] and comparing it with a preset adjusting proportion α (204); and, if γ is greater than α, performing enhancement processing on the voice data at the position indicated by a boundary file (205). The method allows irregular spoken-language phenomena to be added to training data quickly and conveniently.
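
The gating in steps 204 and 205 can be illustrated with a minimal Python sketch. It is not the applicants' implementation: the abstract does not say what the enhancement operation is or whether γ is drawn once per utterance or once per boundary, so the sketch assumes a single draw per utterance and takes the enhancement as a caller-supplied function. The names `maybe_enhance`, `enhance_at`, and `repeat_previous_frame` are hypothetical, and the frame-duplication enhancement is only a placeholder.

```python
import random
from typing import Callable, List, Optional, Sequence


def maybe_enhance(
    frames: Sequence[float],
    boundaries: Sequence[int],
    alpha: float,
    enhance_at: Callable[[List[float], int], List[float]],
    rng: Optional[random.Random] = None,
) -> List[float]:
    """Apply `enhance_at` at each boundary only when gamma > alpha."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    if rng is None:
        rng = random.Random()
    out = list(frames)
    # Step 204: draw gamma. Note random() returns a value in [0, 1),
    # a close stand-in for the closed interval named in the abstract.
    gamma = rng.random()
    # Step 205: enhance only when gamma exceeds the adjusting proportion.
    if gamma > alpha:
        # Assumption: one draw gates every boundary of the utterance.
        # Walk boundaries from the end so earlier indices stay valid.
        for pos in sorted(boundaries, reverse=True):
            out = enhance_at(out, pos)
    return out


def repeat_previous_frame(frames: List[float], pos: int) -> List[float]:
    """Hypothetical placeholder enhancement: duplicate the frame before `pos`."""
    if pos <= 0 or pos > len(frames):
        return frames
    return frames[:pos] + [frames[pos - 1]] + frames[pos:]
```

Under this reading, a larger α leaves more utterances untouched, since enhancement is applied only when γ exceeds α.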

Inventors:

ZHAO YUANYUAN (CN)
XU SHUANG (CN)
XU BO (CN)

Application Number:

PCT/CN2017/095668

Publication Date:

February 07, 2019

Filing Date:

August 02, 2017

Assignee:

INST AUTOMATION CAS (CN)

International Classes:

G10L15/16; G10L15/02; G10L15/06; G10L15/20

Foreign References:

CN107437417A	2017-12-05
CN101582264A	2009-11-18
CN103021420A	2013-04-03
US9208794B1	2015-12-08

Other References:

KUMAR, A. ET AL.: "Speech Enhancement in Multiple-Noise Conditions Using Deep Neural Networks", INTERSPEECH 2016, 12 September 2016 (2016-09-12), pages 3738 - 3742, XP055567333
XU, YONG ET AL.: "A Regression Approach to Speech Enhancement Based on Deep Neural Networks", ACM TRANSACTIONS ON AUDIO, SPEECH, vol. 23, no. 1, 31 January 2015 (2015-01-31), pages 7 - 19, XP055567335
ZHAO, YUANYUAN ET AL.: "Multidimensional Residual Learning Based on Recurrent Neural Networks for Acoustic Modeling", INTERSPEECH 2016, 12 September 2016 (2016-09-12), pages 3419 - 3423

Attorney, Agent or Firm:

HANRAY LAW FIRM (CN)
