SINGLE-SOUND CHANNEL ROBUSTNESS SPEECH KEYWORD REAL-TIME DETECTION METHOD

Document Type and Number:

WIPO Patent Application WO/2021/062705

Kind Code:

A1

Abstract:

A single-channel robust method for real-time speech keyword detection, comprising the following steps: receiving noisy speech in electronic format; converting the time-domain speech signal into a frequency-domain signal frame by frame by means of a short-time Fourier transform; applying a Mel filter to the frequency-domain signal to obtain Mel features as the acoustic features; passing the Mel features through a neural network frame by frame and applying a normalized exponential (softmax) function to obtain confidence information for each keyword; when the confidence of a keyword exceeds a predefined threshold, splicing the current frame with several preceding frames, the result serving as the output of the neural network; and passing this output sequentially through an attention mechanism and a feed-forward deep neural network, then applying the normalized exponential function to obtain sentence-level confidence information for each keyword, the keyword being considered detected when the confidence value exceeds the predefined threshold and not detected otherwise. The method maintains a high wake-up rate in noisy environments, is widely applicable, and can greatly reduce the false alarm rate of the neural network and improve keyword detection performance.
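
The steps listed in the abstract can be sketched in a few lines of NumPy. This is an illustrative outline only, not the applicant's implementation. The abstract does not specify the network architecture, the frame sizes, the number of spliced frames, or the thresholds, so everything below is an assumption: the function names, the single tanh layer standing in for the frame-level network, the dot-product attention with a learned vector, the use of class 0 as a non-keyword ("filler") class, the weight dictionary `params`, and the choice to check the sentence-level confidence of the same keyword that triggered the first stage. For brevity the features are computed for the whole signal at once; a real-time system would process each frame as it arrives.

```python
import numpy as np

def stft_power(signal, frame_len=400, hop=160, n_fft=512):
    """Frame-by-frame short-time Fourier transform; returns power spectra (frames x bins)."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack([signal[i * hop:i * hop + frame_len] * window
                       for i in range(n_frames)])
    return np.abs(np.fft.rfft(frames, n=n_fft, axis=1)) ** 2

def mel_filterbank(n_mels=40, n_fft=512, sr=16000):
    """Triangular Mel filters, shape (n_mels, n_fft // 2 + 1)."""
    hz_to_mel = lambda f: 2595.0 * np.log10(1.0 + f / 700.0)
    mel_to_hz = lambda m: 700.0 * (10.0 ** (m / 2595.0) - 1.0)
    edges = mel_to_hz(np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2), n_mels + 2))
    bin_freqs = np.linspace(0.0, sr / 2, n_fft // 2 + 1)
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for m in range(n_mels):
        lo, mid, hi = edges[m], edges[m + 1], edges[m + 2]
        rising = (bin_freqs - lo) / (mid - lo)
        falling = (hi - bin_freqs) / (hi - mid)
        fb[m] = np.clip(np.minimum(rising, falling), 0.0, None)
    return fb

def softmax(x):
    """Normalized exponential function over the last axis."""
    e = np.exp(x - np.max(x, axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def detect(signal, params, frame_thr=0.5, sent_thr=0.5, context=20):
    """Two-stage detection; returns (frame index, keyword id) or None."""
    # Log-Mel acoustic features, one row per frame.
    feats = np.log(stft_power(signal) @ mel_filterbank().T + 1e-8)
    # Stage 1: frame-level network (a single tanh layer here) plus softmax.
    hidden = np.tanh(feats @ params["w_h"])
    frame_conf = softmax(hidden @ params["w_f"])   # class 0 = filler (assumed)
    for t in range(len(feats)):
        k = int(np.argmax(frame_conf[t, 1:])) + 1  # best keyword class at frame t
        if frame_conf[t, k] <= frame_thr:
            continue
        # Stage 2: splice the current and preceding frames' network outputs.
        ctx = hidden[max(0, t - context):t + 1]
        attn = softmax(ctx @ params["v_a"])        # attention weight per frame
        pooled = attn @ ctx                        # attention-weighted summary
        sent_conf = softmax(np.tanh(pooled @ params["w_1"]) @ params["w_2"])
        if sent_conf[k] > sent_thr:
            return t, k                            # keyword confirmed
    return None                                    # no keyword detected
```

In this sketch `params` holds `w_h` (40 x H), `w_f` (H x C), `v_a` (H), `w_1` (H x H) and `w_2` (H x C), where H is the hidden size and C the number of classes; in practice these would come from training. The second stage runs only on frames that pass the first threshold, which is how the two-stage design described in the abstract can cut false alarms without evaluating the sentence-level network on every frame.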

Inventors:

HU PENG (CN)
YAN YONGJIE (CN)

Application Number:

PCT/CN2019/109603

Publication Date:

April 08, 2021

Filing Date:

September 30, 2019

Assignee:

ELEVOC TECH CO LTD (CN)

International Classes:

G10L15/02; G10L15/14

Foreign References:

CN110097870A	2019-08-06
CN108615526A	2018-10-02
CN103559881A	2014-02-05
US20060190259A1	2006-08-24

Other References:

KUMAR RAJATH, YERUVA VAISHNAVI, GANAPATHY SRIRAM: "On Convolutional LSTM Modeling for Joint Wake-Word Detection and Text Dependent Speaker Verification", INTERSPEECH 2018, ISCA, 1 January 2018 (2018-01-01), pages 1121-1125, XP055797254, DOI: 10.21437/Interspeech.2018-1759

Attorney, Agent or Firm:

SHENZHEN KUAIMA PATENT & TRADEMARK OFFICE et al. (CN)
