Title:
EVALUATION AND MITIGATION OF SOFT-ERRORS IN PARALLEL AND DISTRIBUTED TRAINING AND INFERENCE OF TRANSFORMERS
Document Type and Number:
WIPO Patent Application WO/2024/065794
Kind Code:
A1
Abstract:
The application provides an apparatus, method, and storage medium for evaluation and mitigation of soft errors in parallel and distributed training and inference of transformers. The apparatus includes two or more processing units (220) that are capable of communicating with each other and that operate collectively as a transformer for deep learning. Each processing unit (220) is configured to: perform a matrix multiplication on a first matrix, with a first column summation vector added after the last row of the first matrix, and a first parameter matrix, with a first row summation vector added after the last column of the first parameter matrix, to obtain a second matrix; perform an all-reduce operation on the second matrices obtained by the two or more processing units (220) to obtain a third matrix; and determine whether a soft error has occurred by performing a checksum verification on the third matrix.
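
The three steps in the abstract can be illustrated with a minimal pure-Python sketch. It is not the applicants' implementation. It assumes that each processing unit holds a shard of the first matrix and of the parameter matrix split along the inner dimension, that the all-reduce is a plain elementwise sum (simulated here in a single process), and that the checksum verification compares the last row and last column of the third matrix against the sums of the remaining entries. All function names and the tolerance `tol` are hypothetical.

```python
# Sketch of checksum-encoded matrix multiplication with a simulated all-reduce.
# Function names, sharding scheme, and tolerance are illustrative assumptions.

def with_column_sums(a):
    """Append the column summation vector after the last row."""
    return [row[:] for row in a] + [[sum(col) for col in zip(*a)]]

def with_row_sums(b):
    """Append the row summation vector after the last column."""
    return [row + [sum(row)] for row in b]

def matmul(a, b):
    """Plain matrix product of two matrices stored as lists of rows."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]

def all_reduce_sum(mats):
    """Simulate an all-reduce by summing the per-unit matrices elementwise."""
    return [[sum(vals) for vals in zip(*rows)] for rows in zip(*mats)]

def checksum_ok(c, tol=1e-9):
    """Verify that the last column holds row sums and the last row holds column sums."""
    body = c[:-1]
    rows_ok = all(abs(sum(r[:-1]) - r[-1]) <= tol for r in c)
    cols_ok = all(abs(sum(col) - last) <= tol
                  for col, last in zip(zip(*body), c[-1]))
    return rows_ok and cols_ok

# Two simulated processing units, each with (first-matrix shard, parameter shard).
shards = [
    ([[1, 2], [3, 4]], [[1, 0], [0, 1]]),
    ([[5, 6], [7, 8]], [[2, 1], [1, 2]]),
]

# Step 1: each unit multiplies its checksum-extended shards (the "second matrix").
second = [matmul(with_column_sums(a), with_row_sums(b)) for a, b in shards]

# Step 2: the all-reduce combines them into the "third matrix".
third = all_reduce_sum(second)

# Step 3: checksum verification; a flipped entry should be detected.
corrupted = [row[:] for row in third]
corrupted[0][1] += 1  # inject a single simulated soft error

print(checksum_ok(third))      # True
print(checksum_ok(corrupted))  # False
```

Because the checksum row and column are linear in the matrix entries, summing the per-unit results preserves the checksum property, which is why a single verification after the all-reduce suffices in this sketch. With floating-point data the comparison would need a tolerance chosen for the accumulated rounding error.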

Inventors:
WANG YAKAI (CN)
WU KEQIANG (CN)
ZHANG JIAN (CN)
Application Number:
PCT/CN2022/123553
Publication Date:
April 04, 2024
Filing Date:
September 30, 2022
Assignee:
INTEL CORP (US)
WANG YAKAI (CN)
WU KEQIANG (CN)
ZHANG JIAN (CN)
International Classes:
G06F11/10; G06F9/50; G06F11/07; G06F17/16
Foreign References:
US20180365099A1    2018-12-20
CN104133738A    2014-11-05
CN101414489A    2009-04-22
CN108268424A    2018-07-10
CN101067972A    2007-11-07
US20220057993A1    2022-02-24
Attorney, Agent or Firm:
BEIJING EAST IP LTD. (CN)