PROGRESSIVE POSITIONING METHOD FOR TEXT-TO-VIDEO CLIP POSITIONING

Document Type and Number:

WIPO Patent Application WO/2022/088238

Kind Code:

A1

Abstract:

Disclosed in the present invention is a progressive positioning method for text-to-video clip positioning. The method comprises: first, extracting features of the two modalities, namely video and text, using a different feature extraction method for each; then progressively selecting different step lengths and learning the correlation between the video and the text over multiple stages; and finally, training the model end to end by combining the correlation losses of all stages. Moreover, information from the coarse-time-granularity stage is fused into the fine-time-granularity stage by means of a conditional feature updating module and an up-sampling connection, so that the stages reinforce one another. Different stages attend to segments of different temporal granularities, and by combining the interrelations between the stages, the model can cope with target segments whose lengths vary widely. Drawing on the way humans approach the segment positioning task, the present invention adopts a coarse-to-fine strategy and positions the target segment progressively over multiple stages, which substantially improves positioning performance.
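The coarse-to-fine idea in the abstract can be illustrated with a minimal NumPy sketch. It is not the patented implementation, and every name in it (pool_video, score_map, upsample, iou_targets, progressive_localize) is hypothetical. It assumes: average pooling of clip features to realize each stage's step length; a 2D map scoring every candidate segment (i, j) by cosine similarity with the text feature; nearest-neighbour up-sampling of the coarser map, added to the finer map, as a stand-in for the up-sampling connection; and binary cross-entropy against temporal-IoU targets as each stage's correlation loss. The conditional feature updating module is not modeled, and there are no learned parameters, so the sketch shows the data flow rather than end-to-end training.

```python
import numpy as np

def pool_video(video, stride):
    # Assumption: a stage's step length is realized by average-pooling T clip
    # features into T // stride coarser units (T must be divisible by stride).
    t, d = video.shape
    return video.reshape(t // stride, stride, d).mean(axis=1)

def score_map(units, text):
    # Hypothetical 2D map: entry (i, j), i <= j, scores the candidate segment
    # units[i..j] by cosine similarity of its mean feature with the text feature.
    # Entries with i > j are invalid and stay 0.
    n, d = units.shape
    csum = np.vstack([np.zeros((1, d)), np.cumsum(units, axis=0)])  # prefix sums
    t = text / (np.linalg.norm(text) + 1e-8)
    scores = np.zeros((n, n))
    for i in range(n):
        for j in range(i, n):
            seg = (csum[j + 1] - csum[i]) / (j - i + 1)  # mean feature of segment
            scores[i, j] = seg @ t / (np.linalg.norm(seg) + 1e-8)
    return scores

def upsample(coarse, factor):
    # Nearest-neighbour up-sampling of a coarse map to the next stage's
    # resolution, keeping only valid (i <= j) entries.
    return np.triu(np.kron(coarse, np.ones((factor, factor))))

def iou_targets(n, stride, gt):
    # Temporal IoU (in clip units) of each candidate with ground truth [start, end).
    gs, ge = gt
    tgt = np.zeros((n, n))
    for i in range(n):
        for j in range(i, n):
            s, e = i * stride, (j + 1) * stride
            inter = max(0, min(e, ge) - max(s, gs))
            union = max(e, ge) - min(s, gs)
            tgt[i, j] = inter / union
    return tgt

def bce(logits, targets, mask):
    # Binary cross-entropy over valid map entries only.
    p = np.clip(1.0 / (1.0 + np.exp(-logits)), 1e-7, 1 - 1e-7)
    loss = -(targets * np.log(p) + (1 - targets) * np.log(1 - p))
    return loss[mask].mean()

def progressive_localize(video, text, gt, strides=(4, 2, 1)):
    # Coarse to fine: each stage uses a smaller stride, adds the up-sampled map
    # of the previous stage, and contributes its own loss to the summed total.
    total_loss, prev, prev_stride = 0.0, None, None
    for s in strides:
        m = score_map(pool_video(video, s), text)
        if prev is not None:
            m = m + upsample(prev, prev_stride // s)
        n = m.shape[0]
        mask = np.triu(np.ones((n, n), dtype=bool))
        total_loss += bce(m, iou_targets(n, s, gt), mask)
        prev, prev_stride = m, s
    # Prediction: best-scoring valid candidate of the finest stage, in clip units.
    i, j = np.unravel_index(np.argmax(np.where(mask, m, -np.inf)), m.shape)
    return (int(i) * prev_stride, (int(j) + 1) * prev_stride), float(total_loss)
```

For example, with 16 clip features of dimension 8, `progressive_localize(video, text, (4, 8))` scores 4, 8, and then 16 units per side and returns a predicted clip interval together with the summed three-stage loss. Summing the per-stage losses mirrors the abstract's statement that the stages are trained jointly; in a trainable model, the pooling and scoring steps would be replaced by learned layers.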

Inventors:

WANG XUN (CN)
DONG JIANFENG (CN)
ZHENG QI (CN)
PENG JINGWEI (CN)

Application Number:

PCT/CN2020/127657

Publication Date:

May 05, 2022

Filing Date:

November 10, 2020

Assignee:

UNIV ZHEJIANG GONGSHANG (CN)

International Classes:

G06F16/783; G06K9/00; G06K9/62; G06N3/04

Domestic Patent References:

WO2018081751A1	2018-05-03

Foreign References:

CN111414845A	2020-07-14
CN110121118A	2019-08-13
CN109145712A	2019-01-04
US20170330363A1	2017-11-16

Other References:

XU TONG, DU HAO, CHEN ENHONG, CHEN JOYA, WU YUFEI: "Cross-modal video moment retrieval based on visual-textual relationship alignment", SCIENTIA SINICA INFORMATIONIS, vol. 50, no. 6, 1 June 2020 (2020-06-01), pages 862 - 876, XP055926110, ISSN: 1674-7267, DOI: 10.1360/SSI-2019-0292

Attorney, Agent or Firm:

HANGZHOU QIUSHI PATENT OFFICE CO., LTD. (CN)