Title:
MIXTURE-OF-EXPERTS MODEL IMPLEMENTATION METHOD AND SYSTEM, ELECTRONIC DEVICE, AND STORAGE MEDIUM
Document Type and Number:
WIPO Patent Application WO/2023/201981
Kind Code:
A1
Abstract:
The present disclosure relates to fields of artificial intelligence such as deep learning and distributed storage, and provides a mixture-of-experts model implementation method and system, an electronic device, and a storage medium. The method may comprise: constructing a communication group, wherein the communication group comprises a tensor parallel communication group, the tensor parallel communication group comprises at least two computing devices, and the sparse parameters of the computing devices in the same tensor parallel communication group are partitioned by tensor parallelism; and training a mixture-of-experts model on the basis of the communication group. By applying the solution of the present disclosure, model training can be ensured to proceed normally.
Inventors:
SHEN LIANG (CN)
WANG HAIFENG (CN)
WU HUACHAO (CN)
GONG WEIBAO (CN)
WU ZHIHUA (CN)
YU DIANHAI (CN)
Application Number:
PCT/CN2022/119752
Publication Date:
October 26, 2023
Filing Date:
September 20, 2022
Assignee:
BEIJING BAIDU NETCOM SCI & TECH CO LTD (CN)
International Classes:
G06N3/04
Foreign References:
CN114841315A | 2022-08-02
CN114282681A | 2022-04-05
CN114186633A | 2022-03-15
CN114169427A | 2022-03-11
US20190251423A1 | 2019-08-15
US20200151580A1 | 2020-05-14
CN202210430519A | 2022-04-22
Attorney, Agent or Firm:
BEIJING WISPRO INTELLECTUAL PROPERTY LLP. (CN)