Title:
AUTOMATIC COMPRESSION METHOD AND PLATFORM FOR MULTILEVEL KNOWLEDGE DISTILLATION-BASED PRE-TRAINED LANGUAGE MODEL
Document Type and Number:
WIPO Patent Application WO/2022/126797
Kind Code:
A1
Abstract:
An automatic compression method and platform for a pre-trained language model based on multilevel knowledge distillation. The method comprises the following steps: step I, constructing multilevel knowledge distillation and distilling the knowledge structure of a large model at three different levels: the self-attention units, the hidden-layer states, and the embedding layer; step II, training a meta-learning knowledge distillation network to generate a universal compression architecture for multiple pre-trained language models; and step III, searching for the best compression structure on the basis of an evolutionary algorithm. First, meta-learning-based knowledge distillation is studied to generate a universal compression architecture for multiple pre-trained language models; then, on the basis of the trained meta-learning network, the evolutionary algorithm is used to search for the best compression structure, yielding an optimal, task-agnostic universal compression architecture for pre-trained language models.
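For a concrete sense of step I, the sketch below shows one way the three-level distillation objective described above could be written in PyTorch. It is an illustrative reconstruction rather than the patented implementation: the dictionary layout of the model outputs, the single linear projection proj that lifts the student width to the teacher width, and the choice of KL divergence for attention maps with MSE for hidden states and embeddings are all assumptions made for this example.

import torch
import torch.nn.functional as F

def multilevel_distillation_loss(student_out, teacher_out, proj):
    """Hypothetical three-level distillation loss.

    student_out / teacher_out are dicts with:
      "attentions":    list of attention maps, each [batch, heads, seq, seq]
      "hidden_states": list of hidden states, each [batch, seq, width]
      "embeddings":    embedding-layer output, [batch, seq, width]
    proj is a linear layer mapping the student width to the teacher width.
    """
    # Level 1: self-attention units -- match attention distributions with KL divergence.
    attn_loss = 0.0
    for a_s, a_t in zip(student_out["attentions"], teacher_out["attentions"]):
        attn_loss = attn_loss + F.kl_div(
            torch.log(a_s.clamp_min(1e-9)), a_t, reduction="batchmean")

    # Level 2: hidden-layer states -- MSE after projecting the student states
    # up to the teacher's hidden size.
    hidden_loss = 0.0
    for h_s, h_t in zip(student_out["hidden_states"], teacher_out["hidden_states"]):
        hidden_loss = hidden_loss + F.mse_loss(proj(h_s), h_t)

    # Level 3: embedding layer -- MSE between projected student embeddings
    # and teacher embeddings.
    emb_loss = F.mse_loss(proj(student_out["embeddings"]), teacher_out["embeddings"])

    return attn_loss + hidden_loss + emb_loss

if __name__ == "__main__":
    # Toy shapes (all illustrative): 2 layers, batch 2, 4 heads, sequence
    # length 8, student width 128, teacher width 256.
    B, H, L, d_s, d_t = 2, 4, 8, 128, 256
    proj = torch.nn.Linear(d_s, d_t)

    def fake_outputs(width):
        return {
            "attentions": [torch.softmax(torch.randn(B, H, L, L), dim=-1) for _ in range(2)],
            "hidden_states": [torch.randn(B, L, width) for _ in range(2)],
            "embeddings": torch.randn(B, L, width),
        }

    loss = multilevel_distillation_loss(fake_outputs(d_s), fake_outputs(d_t), proj)
    loss.backward()  # gradients reach the projection; in practice they would also reach the student

In a full pipeline, a loss of this kind would train the candidate student architectures produced by the meta-learning network (step II), and the evolutionary search of step III would then rank those candidates; both of those stages lie outside the scope of this sketch.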

Inventors:
WANG HONGSHENG (CN)
WANG ENPING (CN)
YU ZAILIANG (CN)
Application Number:
PCT/CN2020/142577
Publication Date:
June 23, 2022
Filing Date:
December 31, 2020
Assignee:
ZHEJIANG LAB (CN)
International Classes:
G06F16/35
Foreign References:
CN111506702A, 2020-08-07
CN111767711A, 2020-10-13
CN111062489A, 2020-04-24
US20200302295A1, 2020-09-24
Other References:
NATURAL LANGUAGE PROCESSING GROUP: "Smaller, Faster and Better Models! Microsoft General-Purpose Compression Method for Pre-Trained Language Models, MiniLM, Helps You Get Twice as Much Done with Half as Much Work", Microsoft Research Asia - News & Features, 12 May 2020, CN, XP009538033. Retrieved from the Internet.
WENHUI WANG; FURU WEI; LI DONG; HANGBO BAO; NAN YANG; MING ZHOU: "MiniLM: Deep Self-Attention Distillation for Task-Agnostic Compression of Pre-Trained Transformers", arXiv.org, Cornell University Library, Ithaca, NY 14853, 25 February 2020, XP081607651.
Attorney, Agent or Firm:
HANGZHOU QIUSHI PATENT OFFICE CO., LTD (CN)