Title:
VOICE RECOGNITION METHOD AND APPARATUS, AND COMPUTER-READABLE STORAGE MEDIUM
Document Type and Number:
WIPO Patent Application WO/2021/057029
Kind Code:
A1
Abstract:
A voice recognition method, comprising: acquiring first linear frequency spectra corresponding to training audio having different sampling rates; determining the maximum sampling rate and the other sampling rates among the different sampling rates; determining the maximum frequency domain sequence numbers of the first linear frequency spectra corresponding to the other sampling rates and to the maximum sampling rate as a first frequency domain sequence number and a second frequency domain sequence number, respectively; in the first linear frequency spectra corresponding to the other sampling rates, setting to zero the amplitude value corresponding to each frequency domain sequence number that is greater than the first frequency domain sequence number and less than or equal to the second frequency domain sequence number, so as to obtain second linear frequency spectra corresponding to the other sampling rates; determining a first voice feature and a second voice feature according to a first Mel spectrum feature of the first linear frequency spectrum corresponding to the maximum sampling rate and a second Mel spectrum feature of the second linear frequency spectrum corresponding to the other sampling rates, respectively; and using the first voice feature and the second voice feature to train a machine learning model. Further disclosed are a voice recognition apparatus and a computer-readable storage medium.
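
The zero-setting step in the abstract can be illustrated with a minimal NumPy sketch. It is not the applicant's implementation. It assumes each spectrum is a magnitude array of shape (bins, frames) whose row index is the frequency domain sequence number, and that all spectra share the same bin spacing in Hz (for example, a 256-point transform at 8 kHz and a 512-point transform at 16 kHz). The function names `zero_extend_spectrum` and `align_to_max_rate` are hypothetical, the sample data is random, and the Mel feature and training steps are left out.

```python
import numpy as np


def zero_extend_spectrum(spectrum, second_index):
    """Extend a (bins, frames) spectrum with zero rows up to bin `second_index`.

    The last row of `spectrum` plays the role of the first frequency domain
    sequence number; every bin above it, up to and including `second_index`,
    gets amplitude zero.
    """
    first_index = spectrum.shape[0] - 1
    if first_index > second_index:
        raise ValueError("spectrum has more bins than the maximum-rate spectrum")
    extended = np.zeros((second_index + 1, spectrum.shape[1]), dtype=spectrum.dtype)
    extended[: first_index + 1] = spectrum  # original amplitudes are kept as-is
    return extended


def align_to_max_rate(spectra_by_rate):
    """Map {sampling_rate: spectrum} to spectra that all have the bin count
    of the spectrum at the maximum sampling rate."""
    max_rate = max(spectra_by_rate)
    second_index = spectra_by_rate[max_rate].shape[0] - 1
    return {
        rate: zero_extend_spectrum(spec, second_index)
        for rate, spec in spectra_by_rate.items()
    }


# Hypothetical data: 5 frames at 8 kHz (129 bins) and at 16 kHz (257 bins).
rng = np.random.default_rng(0)
spectra = {8000: rng.random((129, 5)), 16000: rng.random((257, 5))}
aligned = align_to_max_rate(spectra)
```

After alignment, the 8 kHz and 16 kHz spectra have the same number of bins, so one Mel filter bank could be applied to both before training.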

Inventors:
FU LI (CN)
Application Number:
PCT/CN2020/088229
Publication Date:
April 01, 2021
Filing Date:
April 30, 2020
Assignee:
JINGDONG DIGITS TECH HOLDING CO LTD (CN)
International Classes:
G10L15/06; G10L15/02
Foreign References:
CN110459205A	2019-11-15
CN101014997A	2007-08-08
CN105513590A	2016-04-20
US5475792A	1995-12-12
US20080201139A1	2008-08-21
CN201910904271A	2019-09-24
Other References:
JIANQING GAO ET AL.: "Mixed-Bandwidth Cross-Channel Speech Recognition via Joint Optimization of DNN-Based Bandwidth Expansion and Acoustic Modeling", ACM TRANSACTIONS ON AUDIO, SPEECH, AND LANGUAGE PROCESSING, vol. 27, no. 3, 31 March 2019 (2019-03-31), XP011695456, ISSN: 2329-9290, DOI: 20200726155044A
MICHAEL L. SELTZER ET AL.: "Training Wideband Acoustic Models Using Mixed-Bandwidth Training Data for Speech Recognition", TRANSACTIONS ON AUDIO, SPEECH, AND LANGUAGE PROCESSING, vol. 15, no. 1, 31 January 2007 (2007-01-31), XP011151934, ISSN: 1558-7916, DOI: 20200726170342A
See also references of EP 4044175A4
Attorney, Agent or Firm:
CCPIT PATENT AND TRADEMARK LAW OFFICE (CN)