To provide a system for synthesizing video with sound that can easily create a video with sound from a user's favorite melody and video.
When a course selected at a user terminal is transmitted to a server device, point processing (Step S1) is executed, followed by melody category selection processing (Step S2), melody selection processing (Step S3), arrangement selection processing (Step S4), lyrics determination processing (Step S5), voice quality/response selection processing (Step S6), video selection processing (Step S7) and title determination processing (Step S8). The server device then executes speech synthesis processing (Step S9) based on the melody information, the arrangement information, the lyrics information and the voice quality information/response information. Finally, the video with sound is synthesized from the synthesized speech and the video information (Step S10).
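The step sequence above can be illustrated with a minimal Python sketch. It is not the disclosed implementation: the class, function, and key names are hypothetical, the "points" key is only an assumed stand-in for whatever Step S1 records, and Steps S9 and S10 are placeholders that bundle the named inputs rather than produce real audio or video.

```python
from dataclasses import dataclass, field

# Step labels come from the abstract; the key names are hypothetical.
SELECTION_STEPS = [
    ("S1", "points"),           # point processing (assumed to record a point value)
    ("S2", "melody_category"),  # melody category selection
    ("S3", "melody"),           # melody selection
    ("S4", "arrangement"),      # arrangement selection
    ("S5", "lyrics"),           # lyrics determination
    ("S6", "voice"),            # voice quality/response selection
    ("S7", "video"),            # video selection
    ("S8", "title"),            # title determination
]

# Inputs the abstract names for speech synthesis (Step S9).
SPEECH_INPUTS = ("melody", "arrangement", "lyrics", "voice")


@dataclass
class Session:
    course: str
    selections: dict = field(default_factory=dict)
    log: list = field(default_factory=list)


def run_selection_steps(session, choices):
    """Run S1 to S8 in order, recording the user's choice for each step."""
    for step, key in SELECTION_STEPS:
        if key not in choices:
            raise KeyError(f"missing choice for step {step} ({key})")
        session.selections[key] = choices[key]
        session.log.append(step)
    return session


def synthesize_speech(session):
    """Step S9 placeholder: bundle the four named inputs; no audio is produced."""
    session.log.append("S9")
    return {key: session.selections[key] for key in SPEECH_INPUTS}


def synthesize_video_with_sound(session, speech):
    """Step S10 placeholder: pair the synthesized speech with the selected video."""
    session.log.append("S10")
    return {
        "title": session.selections["title"],
        "video": session.selections["video"],
        "audio": speech,
    }


# Example run with made-up values.
example_choices = {
    "points": 10, "melody_category": "pop", "melody": "melody-01",
    "arrangement": "piano", "lyrics": "la la la", "voice": "soft",
    "video": "clip-01", "title": "My Song",
}
example_session = run_selection_steps(Session(course="standard"), example_choices)
example_result = synthesize_video_with_sound(
    example_session, synthesize_speech(example_session)
)
```

Keeping the step order in a single table makes the dependency explicit: Step S9 reads only selections made in Steps S3 to S6, and Step S10 needs the output of Step S9 plus the video chosen in Step S7.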