Title:
METHOD AND DEVICE FOR GENERATING UTTERANCE VIDEO BY USING VOICE SIGNAL
Document Type and Number:
WIPO Patent Application WO/2020/256472
Kind Code:
A1
Abstract:
Disclosed are a method and device for generating an utterance video by using a voice signal. An utterance video generation device according to an embodiment disclosed herein is a computing device having at least one processor and a memory storing at least one program executed by the at least one processor. The device includes: a first encoder which receives a background image of a predetermined person, i.e., the video portion of an utterance video of that person, and extracts an image feature vector from it; a second encoder which receives an utterance audio signal, i.e., the audio portion of the utterance video, and extracts a voice feature vector from it; a combining unit which generates a combined vector by combining the image feature vector output by the first encoder with the voice feature vector output by the second encoder; and a decoder which takes the combined vector as input and reconstructs the utterance video of the person.
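The abstract describes a data flow in which two encoders feed a combining unit, whose output drives a decoder. The following is a minimal, untrained Python sketch of that data flow only. It assumes flattened grayscale frames, fixed random dense layers standing in for the learned encoder and decoder networks, and plain concatenation as the combining operation. The names `UtteranceVideoGenerator`, `make_layer`, and `apply_layer`, and all vector sizes, are hypothetical and do not come from the patent.

```python
import math
import random
from typing import Callable, List

Vector = List[float]
Matrix = List[List[float]]


def relu(x: float) -> float:
    return max(0.0, x)


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def make_layer(in_dim: int, out_dim: int, seed: int) -> Matrix:
    """Fixed random weights standing in for a trained layer (hypothetical)."""
    rng = random.Random(seed)
    return [[rng.uniform(-0.5, 0.5) for _ in range(in_dim)] for _ in range(out_dim)]


def apply_layer(weights: Matrix, x: Vector, act: Callable[[float], float]) -> Vector:
    """Dense layer: matrix-vector product followed by an elementwise activation."""
    if any(len(row) != len(x) for row in weights):
        raise ValueError("input length does not match layer width")
    return [act(sum(w * v for w, v in zip(row, x))) for row in weights]


class UtteranceVideoGenerator:
    """Toy, untrained sketch of the encoder / combiner / decoder layout."""

    def __init__(self, image_size: int, audio_size: int,
                 image_feat_dim: int = 8, voice_feat_dim: int = 4) -> None:
        # First encoder: background image -> image feature vector.
        self.image_encoder = make_layer(image_size, image_feat_dim, seed=1)
        # Second encoder: utterance audio -> voice feature vector.
        self.voice_encoder = make_layer(audio_size, voice_feat_dim, seed=2)
        # Decoder: combined vector -> reconstructed frame (same size as the image).
        self.decoder = make_layer(image_feat_dim + voice_feat_dim, image_size, seed=3)

    @staticmethod
    def combine(image_feat: Vector, voice_feat: Vector) -> Vector:
        # Combining unit; concatenation is an assumption, not stated in the abstract.
        return image_feat + voice_feat

    def generate_frame(self, background_image: Vector, audio_window: Vector) -> Vector:
        image_feat = apply_layer(self.image_encoder, background_image, relu)
        voice_feat = apply_layer(self.voice_encoder, audio_window, relu)
        combined = self.combine(image_feat, voice_feat)
        # Sigmoid keeps reconstructed pixel intensities in (0, 1).
        return apply_layer(self.decoder, combined, sigmoid)


# Usage: a 4x4 grayscale background frame and a 10-sample audio window.
gen = UtteranceVideoGenerator(image_size=16, audio_size=10)
frame = gen.generate_frame([0.5] * 16, [0.1] * 10)
print(len(frame))  # 16
```

Calling `generate_frame` once per audio window would produce one frame per window; how the actual device segments the audio, trains the encoders, or assembles frames into a video is not stated in the abstract.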

Inventors:
CHAE GYEONGSU (KR)
HWANG GUEMBUEL (KR)
PARK SUNGWOO (KR)
JANG SEYOUNG (KR)
Application Number:
PCT/KR2020/007975
Publication Date:
December 24, 2020
Filing Date:
June 19, 2020
Assignee:
MONEYBRAIN INC (KR)
International Classes:
H04N5/265; G06N3/08; G10L13/027; G10L19/00; H04N21/2368; H04N21/439
Domestic Patent References:
WO2018213841A1 (2018-11-22)
Foreign References:
KR20060090687A (2006-08-14)
KR20140037410A (2014-03-27)
Other References:
VOUGIOUKAS KONSTANTINOS; PETRIDIS STAVROS; PANTIC MAJA: "Realistic Speech-Driven Facial Animation with GANs", INTERNATIONAL JOURNAL OF COMPUTER VISION, KLUWER ACADEMIC PUBLISHERS, NORWELL, US, vol. 128, no. 5, 13 October 2019 (2019-10-13), pages 1398 - 1413, XP037130234, ISSN: 0920-5691, DOI: 10.1007/s11263-019-01251-8
TRIANTAFYLLOS AFOURAS; JOON SON CHUNG; ANDREW SENIOR; ORIOL VINYALS; ANDREW ZISSERMAN: "Deep Audio-Visual Speech Recognition", ARXIV.ORG, CORNELL UNIVERSITY LIBRARY, 201 OLIN LIBRARY CORNELL UNIVERSITY ITHACA, NY 14853, 6 September 2018 (2018-09-06), XP081079850
Attorney, Agent or Firm:
DOOHO IP LAW FIRM (KR)