Visual question answering method, device, computer device, medium and program - Beijing Baidu Netcom Science Technology Co., Ltd.

Title:

Visual question answering method, device, computer device, medium and program

Document Type and Number:

Japanese Patent JP7206309

Kind Code:

B2

Abstract:

The present disclosure provides a method for visual question answering, which relates to fields of computer vision and natural language processing. The method includes: acquiring an input image and an input question; detecting visual information and position information of each of at least one text region in the input image; determining semantic information and attribute information of each of the at least one text region based on the visual information and the position information; determining a global feature of the input image based on the visual information, the position information, the semantic information, and the attribute information; determining a question feature based on the input question; and generating a predicted answer for the input image and the input question based on the global feature and the question feature. The present disclosure further provides a device for visual question answering, a computer device and a non-transitory medium.
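
The data flow described in the abstract can be sketched as follows. This is only an illustrative outline under stated assumptions, not the patented implementation: the names TextRegion, mean_pool, global_feature, question_feature and predict_answer are hypothetical, text regions are assumed to be already detected, and concatenation, mean pooling, a bag-of-characters encoding and a linear scorer stand in for whatever models the disclosure actually uses.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence

Vector = List[float]


@dataclass
class TextRegion:
    visual: Vector    # appearance feature of a detected text region
    position: Vector  # assumed here: normalized box [x_min, y_min, x_max, y_max]


def mean_pool(vectors: Sequence[Vector]) -> Vector:
    """Average equal-length vectors element-wise."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def global_feature(
    regions: Sequence[TextRegion],
    semantic_fn: Callable[[TextRegion], Vector],
    attribute_fn: Callable[[TextRegion], Vector],
) -> Vector:
    """Combine visual, position, semantic and attribute information."""
    if not regions:
        raise ValueError("at least one text region is required")
    per_region = []
    for r in regions:
        semantic = semantic_fn(r)    # derived from visual + position information
        attribute = attribute_fn(r)  # derived from visual + position information
        # Assumption: simple concatenation of the four kinds of information.
        per_region.append(r.visual + r.position + semantic + attribute)
    # Assumption: mean pooling over regions yields the image-level feature.
    return mean_pool(per_region)


def question_feature(question: str, dim: int = 4) -> Vector:
    """Toy bag-of-characters encoding; a stand-in for a real text encoder."""
    vec = [0.0] * dim
    for ch in question.lower():
        vec[ord(ch) % dim] += 1.0
    return vec


def predict_answer(
    g: Vector, q: Vector, answers: Sequence[str], weights: Sequence[Vector]
) -> str:
    """Score each candidate answer with one linear row over the fused feature."""
    fused = g + q  # list concatenation of global and question features
    scores = [sum(w_i * x_i for w_i, x_i in zip(w, fused)) for w in weights]
    return answers[scores.index(max(scores))]
```

In this sketch semantic_fn and attribute_fn are passed in as callables so that any model mapping a region's visual and position information to semantic or attribute vectors can be substituted without changing the rest of the flow.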

More Like This:

JP7241122	Smart response method and device, electronic device, storage medium and computer program
JP2023090625	DATA PROCESSING METHOD FOR DIALOGUE SYSTEM, APPARATUS, DEVICE, AND MEDIUM
WO/2021/144979	VECTOR CALCULATION DEVICE, CLASSIFICATION DEVICE, AND OUTPUT PROGRAM

Inventors:

Lu Penghara
Zhang Zhang Qiang
Liu Coral
All chapters
彭啓明
Wu
Luohua
Chen Yongfeng

Application Number:

JP2021035338A

Publication Date:

January 17, 2023

Filing Date:

March 05, 2021

Assignee:

Beijing Baidu Netcom Science Technology Co., Ltd.

International Classes:

G06F16/90; G06F16/903; G06T7/11

Domestic Patent References:

JP2018180986A
JP2018085093A
JP2018036794A

Foreign References:

US20190370587

Other References:

Anand Mishra and 3 others, "OCR-VQA: Visual Question Answering by Reading Text in Images", 2019 International Conference on Document Analysis and Recognition (ICDAR), [online], September 20, 2019, pp. 947-952, Internet

Attorney, Agent or Firm:

Takuji Yamada
Hiroshi Okabe
