GENERATION METHOD AND APPARATUS FOR MIXED LANGUAGE SPEECH RECOGNITION MODEL

Document Type and Number:

WIPO Patent Application WO/2023/231576

Kind Code:

A1

Abstract:

A generation method and apparatus for a mixed language speech recognition model, and a method and apparatus for speech recognition by means of said speech recognition model. The generation method for a mixed language speech recognition model comprises: acquiring a training data set comprising an audio sample and a corresponding annotation text (101); performing, by means of a self-supervised learning model, feature extraction on each audio data frame in the audio sample to acquire a feature vector corresponding to each audio data frame (102); separately inputting the feature vectors into a language recognition network and a speech recognition network in an initial mixed language speech recognition model, so as to acquire a language probability distribution and a word probability distribution corresponding to each audio data frame (103); determining, according to the language probability distribution, the word probability distribution and the annotation text, a loss value corresponding to each audio data frame (104); and separately correcting, on the basis of the loss value corresponding to each audio data frame, the language recognition network and the speech recognition network to acquire a mixed language speech recognition model (105).
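
Steps (103) and (104) can be illustrated with a minimal sketch. The abstract does not state how the per-frame loss is computed, so the following assumes, purely for illustration, that each frame has an aligned language label and word label derived from the annotation text, and that the loss is a weighted sum of two cross-entropy terms. The function names and the `lang_weight` parameter are hypothetical and do not come from the application.

```python
import math


def softmax(logits):
    """Turn raw network outputs into a probability distribution."""
    peak = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def frame_loss(lang_probs, word_probs, lang_label, word_label, lang_weight=0.5):
    """Loss for one audio data frame.

    Assumption: a weighted sum of the language cross-entropy and the
    word cross-entropy. The source only says the loss depends on both
    distributions and the annotation text.
    """
    lang_ce = -math.log(lang_probs[lang_label])
    word_ce = -math.log(word_probs[word_label])
    return lang_weight * lang_ce + (1.0 - lang_weight) * word_ce


def per_frame_losses(lang_logits, word_logits, lang_labels, word_labels,
                     lang_weight=0.5):
    """Compute one loss value per frame from the two networks' outputs."""
    losses = []
    for l_logits, w_logits, l_label, w_label in zip(
            lang_logits, word_logits, lang_labels, word_labels):
        losses.append(frame_loss(softmax(l_logits), softmax(w_logits),
                                 l_label, w_label, lang_weight))
    return losses


# Two frames, two languages, a three-word vocabulary (toy values).
example_losses = per_frame_losses(
    lang_logits=[[0.0, 0.0], [2.0, -1.0]],
    word_logits=[[0.0, 0.0, 0.0], [0.5, 3.0, -2.0]],
    lang_labels=[0, 0],
    word_labels=[1, 1],
)
```

In a full training loop these per-frame values would drive the separate correction of the two networks in step (105). A production system might instead use an alignment-free objective, since annotation text does not normally carry frame-level alignment.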

Inventors:

LI QINGTAO (CN)

Application Number:

PCT/CN2023/087376

Publication Date:

December 07, 2023

Filing Date:

April 10, 2023

Assignee:

JINGDONG TECH INFORMATION TECH CO LTD (CN)

International Classes:

G10L15/06

Foreign References:

CN115064154A	2022-09-16
CN111816159A	2020-10-23
CN111833844A	2020-10-27
CN113345418A	2021-09-03

Other References:

LIANG-HSUAN TSENG; YU-KUAN FU; HENG-JUI CHANG; HUNG-YI LEE: "Mandarin-English Code-switching Speech Recognition with Self-supervised Speech Representation Models", ARXIV.ORG, 7 October 2021 (2021-10-07), XP091074166
ANDROS TJANDRA; DIPTANU GON CHOUDHURY; FRANK ZHANG; KRITIKA SINGH; ALEXIS CONNEAU; ALEXEI BAEVSKI; ASSAF SELA; YATHARTH SARAF; MIC: "Improved Language Identification Through Cross-Lingual Self-Supervised Learning", ARXIV.ORG, 18 October 2021 (2021-10-18), XP091069058

Attorney, Agent or Firm:

TSINGYIHUA INTELLECTUAL PROPERTY LLC (CN)
