Title:
AUTOMATIC COMPRESSION METHOD AND PLATFORM FOR MULTILEVEL KNOWLEDGE DISTILLATION-BASED PRE-TRAINED LANGUAGE MODEL
Document Type and Number:
WIPO Patent Application WO/2022/126797
Kind Code:
A1
Abstract:
An automatic compression method and platform for a pre-trained language model based on multilevel knowledge distillation. The method comprises the following steps: step I, constructing multilevel knowledge distillation and distilling the knowledge structure of a large model at three different levels: the self-attention units, the hidden-layer states, and the embedding layer; step II, training a meta-learning knowledge distillation network to generate a universal compression architecture for multiple pre-trained language models; and step III, searching for the best compression structure on the basis of an evolutionary algorithm. First, meta-learning-based knowledge distillation is studied to generate a universal compression architecture for multiple pre-trained language models; then, on the basis of the trained meta-learning network, the evolutionary algorithm is used to search for the best compression structure, yielding an optimal, task-agnostic universal compression architecture for pre-trained language models.
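For a concrete sense of step I, the sketch below shows one way the three-level distillation objective described above could be written in PyTorch. It is an illustrative reconstruction rather than the patented implementation: the dictionary layout of the model outputs, the single linear projection proj that lifts the student width to the teacher width, and the choice of KL divergence for attention maps with MSE for hidden states and embeddings are all assumptions made for this example.

import torch
import torch.nn.functional as F

def multilevel_distillation_loss(student_out, teacher_out, proj):
    """Hypothetical three-level distillation loss.

    student_out / teacher_out are dicts with:
      "attentions":    list of attention maps, each [batch, heads, seq, seq]
      "hidden_states": list of hidden states, each [batch, seq, width]
      "embeddings":    embedding-layer output, [batch, seq, width]
    proj is a linear layer mapping the student width to the teacher width.
    """
    # Level 1: self-attention units -- match attention distributions with KL divergence.
    attn_loss = 0.0
    for a_s, a_t in zip(student_out["attentions"], teacher_out["attentions"]):
        attn_loss = attn_loss + F.kl_div(
            torch.log(a_s.clamp_min(1e-9)), a_t, reduction="batchmean")

    # Level 2: hidden-layer states -- MSE after projecting the student states
    # up to the teacher's hidden size.
    hidden_loss = 0.0
    for h_s, h_t in zip(student_out["hidden_states"], teacher_out["hidden_states"]):
        hidden_loss = hidden_loss + F.mse_loss(proj(h_s), h_t)

    # Level 3: embedding layer -- MSE between projected student embeddings
    # and teacher embeddings.
    emb_loss = F.mse_loss(proj(student_out["embeddings"]), teacher_out["embeddings"])

    return attn_loss + hidden_loss + emb_loss

if __name__ == "__main__":
    # Toy shapes (all illustrative): 2 layers, batch 2, 4 heads, sequence
    # length 8, student width 128, teacher width 256.
    B, H, L, d_s, d_t = 2, 4, 8, 128, 256
    proj = torch.nn.Linear(d_s, d_t)

    def fake_outputs(width):
        return {
            "attentions": [torch.softmax(torch.randn(B, H, L, L), dim=-1) for _ in range(2)],
            "hidden_states": [torch.randn(B, L, width) for _ in range(2)],
            "embeddings": torch.randn(B, L, width),
        }

    loss = multilevel_distillation_loss(fake_outputs(d_s), fake_outputs(d_t), proj)
    loss.backward()  # gradients reach the projection; in practice they would also reach the student

In a full pipeline, a loss of this kind would train the candidate student architectures produced by the meta-learning network (step II), and the evolutionary search of step III would then rank those candidates; both of those stages lie outside the scope of this sketch.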

Inventors:
WANG HONGSHENG (CN)
WANG ENPING (CN)
YU ZAILIANG (CN)
Application Number:
PCT/CN2020/142577
Publication Date:
June 23, 2022
Filing Date:
December 31, 2020
Assignee:
ZHEJIANG LAB (CN)
International Classes:
G06F16/35
Foreign References:
CN111506702A, 2020-08-07
CN111767711A, 2020-10-13
CN111062489A, 2020-04-24
US20200302295A1, 2020-09-24
Other References:
NATURAL LANGUAGE PROCESSING GROUP: "Smaller, Faster and Better Models! Microsoft General-Purpose Compression Method for Pre-Trained Language Models, MiniLM, Helps You Get Twice as Much Done with Half as Much Work", Microsoft Research Asia - News & Features, 12 May 2020, CN, XP009538033. Retrieved from the Internet.
WENHUI WANG; FURU WEI; LI DONG; HANGBO BAO; NAN YANG; MING ZHOU: "MiniLM: Deep Self-Attention Distillation for Task-Agnostic Compression of Pre-Trained Transformers", arXiv.org, Cornell University Library, Ithaca, NY 14853, 25 February 2020, XP081607651.
Attorney, Agent or Firm:
HANGZHOU QIUSHI PATENT OFFICE CO., LTD (CN)