Title:
DATA PROCESSING METHOD AND APPARATUS
Document Type and Number:
WIPO Patent Application WO/2024/041479
Kind Code:
A1
Abstract:
A data processing method for images that contain characters, relating to the field of artificial intelligence. The method comprises: acquiring a first feature representation and a second feature representation, wherein the second feature representation is a text feature of first text, and the first text is the text content included in an image; and obtaining, by means of a target encoder, a third feature representation according to the first feature representation and the second feature representation. The third feature representation is used to execute a downstream task. Two similarities are used to update an image encoder: the similarity between the execution result and its corresponding annotation, and the similarity between the first feature representation and the second feature representation. In the present application, a two-tower mode improves the alignment between images and text, and a single-tower structure then further enhances the interactive learning of features.
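
The training signal described in the abstract can be illustrated with a minimal sketch. It is not the applicant's implementation, and it rests on several assumptions that the abstract does not state: the first feature representation is taken to be the image encoder's output for the image, the alignment term is written as an InfoNCE-style contrastive loss over cosine similarities, the target encoder is reduced to an element-wise mean, and the downstream task compares the fused feature to the annotation with a squared error. All function names and the `temperature` and `weight` parameters are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def alignment_loss(image_feats, text_feats, temperature=0.1):
    """Two-tower term (assumed InfoNCE form): image i should match text i."""
    n = len(image_feats)
    total = 0.0
    for i in range(n):
        logits = [cosine(image_feats[i], text_feats[j]) / temperature
                  for j in range(n)]
        # Numerically stable log-sum-exp over all candidate texts.
        peak = max(logits)
        log_denom = peak + math.log(sum(math.exp(x - peak) for x in logits))
        total += log_denom - logits[i]
    return total / n


def target_encoder(image_feat, text_feat):
    """Single-tower stand-in: fuse both features into a third one (mean)."""
    return [(a + b) / 2 for a, b in zip(image_feat, text_feat)]


def task_loss(result, annotation):
    """Stand-in for comparing the execution result with its annotation."""
    return sum((r - a) ** 2 for r, a in zip(result, annotation)) / len(result)


def total_loss(image_feats, text_feats, annotations, weight=1.0):
    """Combined objective that would drive the image-encoder update."""
    fused = [target_encoder(i, t) for i, t in zip(image_feats, text_feats)]
    downstream = sum(task_loss(f, a)
                     for f, a in zip(fused, annotations)) / len(fused)
    return downstream + weight * alignment_loss(image_feats, text_feats)
```

In this sketch, lowering `total_loss` pulls each image feature toward the feature of the text it contains (the two-tower alignment term) while also improving the downstream result computed from the fused feature (the single-tower term). A real system would replace the stand-ins with learned encoders and a task-specific head.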

Inventors:
LIU ZHIGUANG (CN)
BAI HAOLI (CN)
MENG XIAOJUN (CN)
LI WENTAO (CN)
XIE NIAN (CN)
WANG LIANGWEI (CN)
HOU LU (CN)
JIANG XIN (CN)
Application Number:
PCT/CN2023/114002
Publication Date:
February 29, 2024
Filing Date:
August 21, 2023
Assignee:
HUAWEI TECH CO LTD (CN)
International Classes:
G06T11/60; G06N3/04; G06T9/00; G06V10/44; G06V10/74; G06V10/82; G06V30/148; G06V30/19
Foreign References:
CN115512005A	2022-12-23
CN114283430A	2022-04-05
CN113516143A	2021-10-19
CN113836333A	2021-12-24
CN114429566A	2022-05-03
CN114707007A	2022-07-05
US20220172080A1	2022-06-02
Other References:
SONG SIBO, WAN JIANQIANG, YANG ZHIBO, TANG JUN, CHENG WENQING, BAI XIANG, YAO CONG: "Vision-Language Pre-Training for Boosting Scene Text Detectors", arXiv, Cornell University Library, Ithaca, 28 April 2022 (2022-04-28), XP093142990, [retrieved on 2024-03-19], DOI: 10.48550/arxiv.2204.13867
Attorney, Agent or Firm:
SHENPAT INTELLECTUAL PROPERTY AGENCY (CN)