VOICE RECOGNITION METHOD AND APPARATUS, AND COMPUTER-READABLE STORAGE MEDIUM

Title:

VOICE RECOGNITION METHOD AND APPARATUS, AND COMPUTER-READABLE STORAGE MEDIUM

Document Type and Number:

WIPO Patent Application WO/2021/057029

Kind Code:

A1

Abstract:

A voice recognition method, comprising: acquiring first linear frequency spectra corresponding to training audio having different sampling rates; determining, among the different sampling rates, a maximum sampling rate and the other sampling rates; determining the maximum frequency domain sequence number of the first linear frequency spectrum corresponding to each of the other sampling rates as a first frequency domain sequence number, and the maximum frequency domain sequence number of the first linear frequency spectrum corresponding to the maximum sampling rate as a second frequency domain sequence number; in the first linear frequency spectrum corresponding to each of the other sampling rates, setting to zero the amplitude value for every frequency domain sequence number that is greater than the first frequency domain sequence number and less than or equal to the second frequency domain sequence number, so as to obtain a second linear frequency spectrum corresponding to the other sampling rates; determining a first voice feature according to a first Mel spectrum feature of the first linear frequency spectrum corresponding to the maximum sampling rate, and a second voice feature according to a second Mel spectrum feature of the second linear frequency spectrum corresponding to the other sampling rates; and training a machine learning model using the first voice feature and the second voice feature. Also disclosed are a voice recognition apparatus and a computer-readable storage medium.
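
The zero-filling step of the abstract can be illustrated with a minimal NumPy sketch. It is not the applicant's implementation. It assumes two sampling rates (8 kHz and 16 kHz), a 32 ms Hann-windowed frame with a 16 ms hop, and random noise standing in for training audio; the function names `linear_spectrum` and `zero_extend` are hypothetical. Using the same frame duration at both rates is an assumption made here so that every frequency bin has the same width (31.25 Hz) and the bin indices of the two spectra line up.

```python
import numpy as np


def linear_spectrum(signal, n_fft, hop):
    """Magnitude STFT of a 1-D signal, shape (n_fft // 2 + 1, n_frames)."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(signal) - n_fft) // hop
    frames = np.stack(
        [signal[i * hop : i * hop + n_fft] * window for i in range(n_frames)],
        axis=1,
    )
    return np.abs(np.fft.rfft(frames, axis=0))


def zero_extend(spectrum, second_index):
    """Add zero-amplitude bins above the spectrum's own highest bin index
    (the "first" index) up to and including second_index."""
    first_index = spectrum.shape[0] - 1
    if first_index > second_index:
        raise ValueError("spectrum already has more bins than the target")
    pad_rows = second_index - first_index
    return np.pad(spectrum, ((0, pad_rows), (0, 0)), mode="constant")


# Assumed setup: one second of noise per sampling rate, 32 ms frames, 16 ms hop.
rng = np.random.default_rng(0)
rates = [8000, 16000]
max_rate = max(rates)

first_spectra = {}
for rate in rates:
    audio = rng.standard_normal(rate)  # stand-in for real training audio
    n_fft = int(0.032 * rate)          # 256 samples at 8 kHz, 512 at 16 kHz
    hop = int(0.016 * rate)
    first_spectra[rate] = linear_spectrum(audio, n_fft, hop)

# Highest bin index of the spectrum at the maximum sampling rate.
second_index = first_spectra[max_rate].shape[0] - 1

# Second linear spectra: lower-rate spectra zero-filled up to second_index.
second_spectra = {
    rate: zero_extend(spec, second_index)
    for rate, spec in first_spectra.items()
    if rate != max_rate
}
```

After this step every spectrum has the same number of frequency bins, so a single Mel filter bank designed for the maximum sampling rate could be applied to all of them before feature extraction and model training.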

Inventors:

FU LI (CN)

Application Number:

PCT/CN2020/088229

Publication Date:

April 01, 2021

Filing Date:

April 30, 2020

Assignee:

JINGDONG DIGITS TECH HOLDING CO LTD (CN)

International Classes:

G10L15/06; G10L15/02

Foreign References:

CN110459205A	2019-11-15
CN101014997A	2007-08-08
CN105513590A	2016-04-20
US5475792A	1995-12-12
US20080201139A1	2008-08-21
CN201910904271A	2019-09-24

Other References:

JIANQING GAO ET AL.: "Mixed-Bandwidth Cross-Channel Speech Recognition via Joint Optimization of DNN-Based Bandwidth Expansion and Acoustic Modeling", ACM TRANSACTIONS ON AUDIO, SPEECH, AND LANGUAGE PROCESSING, vol. 27, no. 3, 31 March 2019 (2019-03-31), XP011695456, ISSN: 2329-9290, DOI: 20200726155044A
MICHAEL L. SELTZER ET AL.: "Training Wideband Acoustic Models Using Mixed-Bandwidth Training Data for Speech Recognition", TRANSACTIONS ON AUDIO, SPEECH, AND LANGUAGE PROCESSING, vol. 15, no. 1, 31 January 2007 (2007-01-31), XP011151934, ISSN: 1558-7916, DOI: 20200726170342A
See also references of EP 4044175A4

Attorney, Agent or Firm:

CCPIT PATENT AND TRADEMARK LAW OFFICE (CN)
