METHOD AND DEVICE FOR GENERATING UTTERANCE VIDEO BY USING VOICE SIGNAL

Document Type and Number:

WIPO Patent Application WO/2020/256472

Kind Code:

A1

Abstract:

Disclosed are a method and device for generating an utterance video by using a voice signal. An utterance video generation device according to an embodiment disclosed herein is a computing device provided with at least one processor and memory for storing at least one program executed by the at least one processor. The utterance video generation device includes: a first encoder which receives a background image of a predetermined person, the background image being the video portion of an utterance video of the person, and extracts an image feature vector from the background image; a second encoder which receives an utterance audio signal, the utterance audio signal being the audio portion of the utterance video, and extracts a voice feature vector from the utterance audio signal; a combining unit which generates a combined vector by combining the image feature vector output from the first encoder and the voice feature vector output from the second encoder; and a decoder which uses the combined vector as an input to reconstruct the utterance video of the person.
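The data flow described in the abstract (two encoders, a combining unit, and a decoder) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration only: the abstract does not specify the network layers, the feature dimensions, or how the two vectors are combined. Here each encoder and the decoder is a single random linear map standing in for a trained network, the combining step is plain concatenation, and the class name `UtteranceVideoSketch` and all parameter names are hypothetical.

```python
import random


def linear(weights, vec):
    """Multiply a weight matrix (a list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vec)) for row in weights]


def make_weights(rows, cols, rng):
    """Hypothetical stand-in for trained parameters: small random weights."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


class UtteranceVideoSketch:
    """Toy version of the encoder / combiner / decoder pipeline (illustrative only)."""

    def __init__(self, image_size, audio_size, image_dim=8, voice_dim=4, seed=0):
        rng = random.Random(seed)
        self.w_image = make_weights(image_dim, image_size, rng)   # first encoder
        self.w_voice = make_weights(voice_dim, audio_size, rng)   # second encoder
        # The decoder maps the combined vector back to image space.
        self.w_decode = make_weights(image_size, image_dim + voice_dim, rng)

    def encode_image(self, background_pixels):
        """First encoder: background image -> image feature vector."""
        return linear(self.w_image, background_pixels)

    def encode_voice(self, audio_samples):
        """Second encoder: utterance audio signal -> voice feature vector."""
        return linear(self.w_voice, audio_samples)

    def combine(self, image_vec, voice_vec):
        """Combining unit. Assumption: simple concatenation of the two vectors."""
        return image_vec + voice_vec

    def decode(self, combined_vec):
        """Decoder: combined vector -> reconstructed frame (flattened pixels)."""
        return linear(self.w_decode, combined_vec)

    def generate_frame(self, background_pixels, audio_samples):
        image_vec = self.encode_image(background_pixels)
        voice_vec = self.encode_voice(audio_samples)
        return self.decode(self.combine(image_vec, voice_vec))


# Usage with made-up inputs: a 16-pixel "image" and a 10-sample audio window.
model = UtteranceVideoSketch(image_size=16, audio_size=10)
background = [0.5] * 16
audio = [0.1 * i for i in range(10)]
combined = model.combine(model.encode_image(background), model.encode_voice(audio))
frame = model.generate_frame(background, audio)
```

The sketch only demonstrates the shapes moving through the pipeline: the combined vector has the image and voice feature dimensions added together, and the decoder output has the same size as the input image. A real system would replace the linear maps with trained neural networks and process a sequence of frames.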

Inventors:

CHAE GYEONGSU (KR)
HWANG GUEMBUEL (KR)
PARK SUNGWOO (KR)
JANG SEYOUNG (KR)

Application Number:

PCT/KR2020/007975

Publication Date:

December 24, 2020

Filing Date:

June 19, 2020

Assignee:

MONEYBRAIN INC (KR)

International Classes:

H04N5/265; G06N3/08; G10L13/027; G10L19/00; H04N21/2368; H04N21/439

Domestic Patent References:

WO2018213841A1	2018-11-22

Foreign References:

KR20060090687A	2006-08-14
KR20140037410A	2014-03-27

Other References:

VOUGIOUKAS KONSTANTINOS; PETRIDIS STAVROS; PANTIC MAJA: "Realistic Speech-Driven Facial Animation with GANs", INTERNATIONAL JOURNAL OF COMPUTER VISION, KLUWER ACADEMIC PUBLISHERS, NORWELL, US, vol. 128, no. 5, 13 October 2019 (2019-10-13), pages 1398-1413, XP037130234, ISSN: 0920-5691, DOI: 10.1007/s11263-019-01251-8
TRIANTAFYLLOS AFOURAS; JOON SON CHUNG; ANDREW SENIOR; ORIOL VINYALS; ANDREW ZISSERMAN: "Deep Audio-Visual Speech Recognition", ARXIV.ORG, CORNELL UNIVERSITY LIBRARY, 201 OLIN LIBRARY CORNELL UNIVERSITY ITHACA, NY 14853, 6 September 2018 (2018-09-06), XP081079850

Attorney, Agent or Firm:

DOOHO IP LAW FIRM (KR)
