PURPOSE: To identify a speaker with high precision by calculating an acoustic parameter for each frame, classifying each voice frame into one of a prescribed number of preset categories according to the value of that parameter, generating a feature quantity for each category, and using these quantities for collation and decision.
CONSTITUTION: The voice signal of a speaker input from a microphone 10 is converted frame by frame into a time series of acoustic parameters by an acoustic parameter converting part 20. A category classifying part 30 calculates the acoustic parameter for each frame and classifies each voice frame into a prescribed category according to the value of that parameter, and a feature quantity generating part 40 generates a feature quantity for each category, for example by taking an additive average of the acoustic parameters; these feature quantities are then used for collation and decision. The individuality information contained in the voice signal is thus extracted faithfully, and the speaker is identified without increasing the dictionary capacity.
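The processing pipeline above (frame-wise parameter extraction, category classification, per-category additive averaging, and collation) can be sketched as follows. This is a minimal illustration, not the patented implementation: the choice of log-energy as the acoustic parameter, the fixed category boundaries, and the Euclidean distance used for collation are all assumptions introduced here for clarity.

```python
# Hedged sketch of the category-based speaker identification scheme.
# The acoustic parameter (log energy), category boundaries, and distance
# measure are illustrative assumptions, not the patent's definitions.
import math

def frame_energy(frame):
    """Acoustic parameter for one frame: log of the sample energy."""
    return math.log(sum(s * s for s in frame) + 1e-10)

def categorize(value, boundaries):
    """Assign a parameter value to one of len(boundaries)+1 preset categories."""
    for i, b in enumerate(boundaries):
        if value < b:
            return i
    return len(boundaries)

def feature_quantities(frames, boundaries):
    """Additive average of the acoustic parameter within each category."""
    n_cat = len(boundaries) + 1
    sums = [0.0] * n_cat
    counts = [0] * n_cat
    for frame in frames:
        p = frame_energy(frame)
        c = categorize(p, boundaries)
        sums[c] += p
        counts[c] += 1
    return [s / n if n else 0.0 for s, n in zip(sums, counts)]

def collate(features, dictionary):
    """Decide the registered speaker whose stored features are closest."""
    def dist(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    return min(dictionary, key=lambda name: dist(features, dictionary[name]))
```

Because the dictionary stores only one averaged feature vector per category for each speaker, its size stays fixed regardless of the length of the enrollment speech, which reflects the abstract's claim of extracting individuality without increasing dictionary capacity.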
KITAGAWA HIROO