To enable a call to be conducted appropriately on a video telephone set that transmits and receives voices and images, while eliminating the risk that an object the speaker does not want the other party to see is shown to that party.
The video telephone set for transmitting and receiving voices and images comprises an image pickup part that captures an image of the speaker who receives a call on the video telephone set; a display part that, when a call is incoming to the video telephone set, displays the captured image of the speaker while the call is not yet connected; and a transmission part that, during the call, transmits the image captured by the image pickup part, together with the speaker's voice, to the video telephone set of the other party. A selection part may further be provided for selecting, while the image of the speaker is displayed on the display part, whether the call is to be conducted by voice only without transmitting the captured image or conducted while transmitting the image.
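The behaviour described above can be illustrated with a minimal sketch. It is not the claimed implementation: the class `VideoPhone`, the enums `State` and `Mode`, and the methods `incoming_call`, `answer`, and `packet` are hypothetical names chosen for illustration, and camera frames and voice are stood in for by plain byte strings.

```python
from dataclasses import dataclass
from enum import Enum, auto
from typing import Optional


class State(Enum):
    IDLE = auto()
    RINGING = auto()   # call incoming, not yet connected
    IN_CALL = auto()


class Mode(Enum):
    VOICE_ONLY = auto()
    VOICE_AND_IMAGE = auto()


@dataclass
class VideoPhone:
    """Hypothetical model of the display, selection, and transmission parts."""
    state: State = State.IDLE
    mode: Optional[Mode] = None
    displayed: Optional[bytes] = None  # what the display part currently shows

    def incoming_call(self, camera_frame: bytes) -> None:
        # Display part: show the speaker's own image before the call connects,
        # so the speaker can check what the other party would see.
        self.state = State.RINGING
        self.displayed = camera_frame

    def answer(self, mode: Mode) -> None:
        # Selection part: the choice is made while the preview is displayed.
        if self.state is not State.RINGING:
            raise RuntimeError("no incoming call to answer")
        self.mode = mode
        self.state = State.IN_CALL

    def packet(self, camera_frame: bytes, voice: bytes) -> dict:
        # Transmission part: voice is always sent during a call; the image
        # is sent only if voice-and-image mode was selected.
        if self.state is not State.IN_CALL:
            raise RuntimeError("nothing is transmitted outside a call")
        image = camera_frame if self.mode is Mode.VOICE_AND_IMAGE else None
        return {"voice": voice, "image": image}
```

In this sketch, a speaker who notices something unwanted in the preview would call `answer(Mode.VOICE_ONLY)`, after which every packet carries voice with no image.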
