ADVANCED CLUSTERING FOR SELF-SUPERVISED LEARNING IN SPEECH RECOGNITION

Document Type and Number:

WIPO Patent Application WO/2023/178583

Kind Code:

A1

Abstract:

Systems and methods are provided for generating a pseudo-labeled training dataset from a set of unlabeled speech data by at least one of: (1) extracting a set of intermediate outputs from an automatic speech recognition model by applying the model to the set of unlabeled speech data, clustering the set of intermediate outputs into different clusters, and generating a first set of pseudo-labels that comprise the cluster assignments associated with the different clusters and that correspond to the unlabeled speech data; or (2) generating a set of decoded word sequences for the unlabeled speech data by applying the automatic speech recognition model to the set of unlabeled speech data, and generating a second set of pseudo-labels associated with the unlabeled speech data by applying the automatic speech recognition model to both (i) the set of decoded word sequences and (ii) the set of unlabeled speech data.
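
Option (1) of the abstract can be illustrated with a minimal sketch. It is not the method claimed in the application: the abstract names no clustering algorithm, so plain k-means is assumed here, and intermediate_outputs is a hypothetical stand-in for an automatic speech recognition model's intermediate layer (it passes frames through unchanged so the example runs without a trained model). All function names and the toy data are illustrative assumptions.

```python
import random


def intermediate_outputs(utterance):
    """Hypothetical stand-in for an ASR model's intermediate layer.

    A real system would return hidden-layer activations for each frame;
    here each frame (a list of floats) is copied through unchanged.
    """
    return [list(frame) for frame in utterance]


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(centroids, point):
    """Index of the centroid closest to the given point."""
    return min(range(len(centroids)),
               key=lambda k: squared_distance(centroids[k], point))


def kmeans(points, num_clusters, iterations=20, seed=0):
    """Plain k-means (an assumed choice of clustering algorithm)."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, num_clusters)]
    for _ in range(iterations):
        assignments = [nearest(centroids, p) for p in points]
        for k in range(num_clusters):
            members = [p for p, a in zip(points, assignments) if a == k]
            if members:  # an empty cluster keeps its previous centroid
                centroids[k] = [sum(dim) / len(members)
                                for dim in zip(*members)]
    return centroids


def pseudo_label(unlabeled_speech, num_clusters):
    """Cluster intermediate outputs; cluster ids serve as pseudo-labels."""
    per_utterance = [intermediate_outputs(u) for u in unlabeled_speech]
    all_frames = [f for feats in per_utterance for f in feats]
    centroids = kmeans(all_frames, num_clusters)
    return [[nearest(centroids, f) for f in feats]
            for feats in per_utterance]


# Toy "unlabeled speech": two utterances of 2-dimensional frames.
unlabeled_speech = [
    [[0.0, 0.1], [0.1, 0.0], [5.0, 5.1]],
    [[5.1, 4.9], [0.05, 0.05]],
]
labels = pseudo_label(unlabeled_speech, num_clusters=2)
print(labels)  # one cluster id per frame, grouped by utterance
```

The returned structure pairs every frame of every utterance with a cluster id, which is the form a pseudo-labeled training set would need. The numeric ids are arbitrary; only the grouping of frames carries information.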

Inventors:

WANG YIMING (US)
WANG CHENGYI (US)
LI JINYU (US)
WU YU (US)
LIU SHUJIE (US)

Application Number:

PCT/CN2022/082664

Publication Date:

September 28, 2023

Filing Date:

March 24, 2022

Assignee:

MICROSOFT TECHNOLOGY LICENSING LLC (US)
WANG YIMING (US)
WANG CHENGYI (CN)
LI JINYU (US)
WU YU (CN)
LIU SHUJIE (CN)

International Classes:

G10L15/06; G10L15/14; G10L15/16

Other References:

Chengyi Wang et al., "Self-Supervised Learning for Speech Recognition with Intermediate Layer Supervision", arXiv.org, Cornell University Library, 16 December 2021, XP091120670
Sanyuan Chen et al., "WavLM: Large-Scale Self-Supervised Pre-Training for Full Stack Speech Processing", arXiv.org, Cornell University Library, 24 January 2022, XP091128250
Cai Danwei et al., "An Iterative Framework for Self-Supervised Deep Speaker Representation Learning", ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), IEEE, 6 June 2021, pages 6728-6732, XP033955556, DOI: 10.1109/ICASSP39728.2021.9414713
Xiao Alex et al., "Contrastive Semi-Supervised Learning for ASR", ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), IEEE, 6 June 2021, pages 3870-3874, XP033954501, DOI: 10.1109/ICASSP39728.2021.9414079

Attorney, Agent or Firm:

SHANGHAI PATENT & TRADEMARK LAW OFFICE, LLC (CN)
