

Title:
GENERATION METHOD AND APPARATUS FOR MIXED LANGUAGE SPEECH RECOGNITION MODEL
Document Type and Number:
WIPO Patent Application WO/2023/231576
Kind Code:
A1
Abstract:
A generation method and apparatus for a mixed language speech recognition model, and a method and apparatus for performing speech recognition with said model. The generation method comprises: acquiring a training data set comprising an audio sample and corresponding annotation text (101); performing, by means of a self-supervised learning model, feature extraction on each audio data frame in the audio sample to acquire a feature vector for each audio data frame (102); separately inputting the feature vectors into a language recognition network and a speech recognition network in an initial mixed language speech recognition model, so as to acquire a language probability distribution and a word probability distribution for each audio data frame (103); determining a loss value for each audio data frame according to the language probability distribution, the word probability distribution, and the annotation text (104); and separately correcting the language recognition network and the speech recognition network on the basis of the loss value of each audio data frame, to acquire the mixed language speech recognition model (105).
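The abstract does not say how the per-frame loss in step 104 is computed. The following is a minimal Python sketch under stated assumptions, not the patented method: the annotation text is assumed to be already aligned to frames (one language label and one word/token label per frame), and the loss is assumed to be a weighted sum of two frame-level cross-entropy terms, one per network output. The function names, the `lang_weight` parameter, and the label encoding are hypothetical. When no frame alignment is available, a CTC-style loss over the word distribution would be a common alternative.

```python
import math
from typing import List, Sequence, Tuple


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax: turns one frame's logits into probabilities."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def frame_losses(
    lang_logits: Sequence[Sequence[float]],
    word_logits: Sequence[Sequence[float]],
    lang_labels: Sequence[int],
    word_labels: Sequence[int],
    lang_weight: float = 0.5,  # hypothetical weighting between the two terms
) -> List[Tuple[float, float, float]]:
    """Return (language_loss, word_loss, combined_loss) for every audio frame.

    Hypothetical assumption: the annotation text has already been aligned to
    frames, so each frame has one language label and one word/token label.
    """
    n = len(lang_logits)
    if not (len(word_logits) == len(lang_labels) == len(word_labels) == n):
        raise ValueError("every input needs exactly one entry per audio frame")

    losses = []
    for lz, wz, ly, wy in zip(lang_logits, word_logits, lang_labels, word_labels):
        p_lang = softmax(lz)  # language probability distribution for this frame
        p_word = softmax(wz)  # word probability distribution for this frame
        lang_loss = -math.log(p_lang[ly])  # cross-entropy vs. the language label
        word_loss = -math.log(p_word[wy])  # cross-entropy vs. the word label
        combined = lang_weight * lang_loss + (1.0 - lang_weight) * word_loss
        losses.append((lang_loss, word_loss, combined))
    return losses


if __name__ == "__main__":
    # Two frames, two languages (0 = Mandarin, 1 = English), four tokens.
    per_frame = frame_losses(
        lang_logits=[[2.0, 0.0], [0.0, 1.5]],
        word_logits=[[1.0, 0.2, 0.0, -1.0], [0.0, 0.0, 2.5, 0.1]],
        lang_labels=[0, 1],
        word_labels=[0, 2],
    )
    for t, (lang_loss, word_loss, combined) in enumerate(per_frame):
        print(f"frame {t}: lang={lang_loss:.4f} word={word_loss:.4f} total={combined:.4f}")
```

Keeping the two terms separate is one plausible way to support step 105: the language term could drive updates to the language recognition network and the word term updates to the speech recognition network. The abstract does not state how the correction is actually split between the two networks.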

Inventors:
LI QINGTAO (CN)
Application Number:
PCT/CN2023/087376
Publication Date:
December 07, 2023
Filing Date:
April 10, 2023
Assignee:
JINGDONG TECH INFORMATION TECH CO LTD (CN)
International Classes:
G10L15/06
Foreign References:
CN115064154A (2022-09-16)
CN111816159A (2020-10-23)
CN111833844A (2020-10-27)
CN113345418A (2021-09-03)
Other References:
LIANG-HSUAN TSENG; YU-KUAN FU; HENG-JUI CHANG; HUNG-YI LEE: "Mandarin-English Code-switching Speech Recognition with Self-supervised Speech Representation Models", ARXIV.ORG, 7 October 2021 (2021-10-07), XP091074166
ANDROS TJANDRA; DIPTANU GON CHOUDHURY; FRANK ZHANG; KRITIKA SINGH; ALEXIS CONNEAU; ALEXEI BAEVSKI; ASSAF SELA; YATHARTH SARAF; MIC: "Improved Language Identification Through Cross-Lingual Self-Supervised Learning", ARXIV.ORG, 18 October 2021 (2021-10-18), XP091069058
Attorney, Agent or Firm:
TSINGYIHUA INTELLECTUAL PROPERTY LLC (CN)