


Title:
APPARATUS AND METHOD FOR SHARING AND PRUNING WEIGHTS FOR VISION AND LANGUAGE MODELS
Document Type and Number:
WIPO Patent Application WO/2024/072001
Kind Code:
A1
Abstract:
A method of performing a multimodal task by using a multimodal model that includes a text encoder and a vision encoder may include obtaining a text feature from a query via the text encoder, obtaining an image feature from one or more input images via the vision encoder, and outputting a response to the query based on similarity between the text feature and the image feature, wherein weight vectors of the text encoder and the vision encoder are pruned and shared according to a sharing vector and a pruning vector that are generated by a hypernetwork, and wherein the hypernetwork and the multimodal model are jointly trained to minimize at least one of a difference between the weight vectors in the text encoder and the vision encoder, a difference between the weight vectors in different layers of the text encoder, and a number of parameters in the multimodal model.
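
The mechanism in the abstract can be illustrated with a minimal sketch. It is not the claimed implementation: the function names, the use of binary per-row vectors, the choice that a shared vision row reuses the corresponding text row, and cosine similarity as the similarity measure are all assumptions made for illustration. The hypernetwork that generates the sharing and pruning vectors is not modeled; the vectors are simply passed in.

```python
import numpy as np

def apply_share_and_prune(w_text, w_vision, share, prune):
    """Apply hypothetical binary sharing and pruning vectors to two weight matrices.

    Assumption: share[i] == 1 means vision row i reuses text row i;
    prune[i] == 0 means row i is zeroed (pruned) in both encoders.
    """
    share = np.asarray(share)
    prune = np.asarray(prune, dtype=w_text.dtype)
    # Shared rows of the vision encoder are replaced by the text encoder's rows.
    w_vision_eff = np.where(share[:, None] == 1, w_text, w_vision)
    mask = prune[:, None]
    return w_text * mask, w_vision_eff * mask

def cosine_similarity(a, b):
    """Pairwise cosine similarity between the rows of a and the rows of b."""
    a = a / np.linalg.norm(a, axis=-1, keepdims=True)
    b = b / np.linalg.norm(b, axis=-1, keepdims=True)
    return a @ b.T

def answer_query(text_feature, image_features):
    """Return the index of the image most similar to the query, plus all scores."""
    sims = cosine_similarity(text_feature[None, :], image_features)[0]
    return int(np.argmax(sims)), sims

# Toy usage with made-up numbers (not data from the application).
w_text = np.arange(6, dtype=float).reshape(3, 2)
w_vision = -np.ones((3, 2))
t_eff, v_eff = apply_share_and_prune(w_text, w_vision, share=[1, 0, 1], prune=[1, 1, 0])

query_feature = np.array([1.0, 0.0])
image_feats = np.array([[0.0, 1.0], [2.0, 0.0], [-1.0, 0.0]])
best, scores = answer_query(query_feature, image_feats)
```

In this sketch, sharing reduces the number of distinct parameters (shared rows are stored once) and pruning removes rows outright, which corresponds to the parameter-count term in the stated training objective.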

Inventors:
GAO SHANGQIAN (US)
UZKENT BURAK (US)
SHEN YILIN (US)
JIN HONGXIA (US)
Application Number:
PCT/KR2023/014832
Publication Date:
April 04, 2024
Filing Date:
September 26, 2023
Assignee:
SAMSUNG ELECTRONICS CO LTD (KR)
International Classes:
G06N3/0455; G06F16/432; G06F16/532; G06F16/56; G06F16/9032; G06F16/9038; G06N3/0499; G06N3/08
Attorney, Agent or Firm:
KIM, Tae-hun et al. (KR)