To provide a technique for determining the speech interval of each speaker from speech signals, collected by a plurality of microphones, in which multiple speakers are present in the same sound interval.
A noise power estimation section 2 estimates the noise power in voiceless-sound intervals, for each combination of microphone and frequency, from the time-frequency observation signals that are input from the respective microphones and converted to the frequency domain. An observation signal classification section 3 uses the estimated noise power and the observation signals to classify, for each time-frequency point, the observation signal vector whose components are the individual observation signals, and outputs the classification results. A signal separation section 4 separates the observation signals into a signal for each sound source by using the classification results. A voiced sound interval determination section 5 determines the voiced sound intervals and voiceless-sound intervals of each sound source from the separated signal for that source.
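The four sections described above can be illustrated with a minimal sketch. The abstract does not disclose the classification rule, the separation method, or any thresholds, so everything specific here is an assumption made for illustration only: the function names, the `snr_threshold` value, the use of known non-speech frames for noise estimation, a "dominant microphone" rule standing in for the classification, binary masking standing in for the separation, and one source per microphone.

```python
from typing import List

# Observation signals already converted to the frequency domain,
# indexed as obs[mic][frame][freq]. This layout is an assumption.
Spectrogram = List[List[List[complex]]]


def estimate_noise_power(obs: Spectrogram, noise_frames: List[int]) -> List[List[float]]:
    """Section 2 (sketch): mean power per (mic, freq) over frames assumed to be non-speech."""
    n_freq = len(obs[0][0])
    return [
        [sum(abs(mic[t][f]) ** 2 for t in noise_frames) / len(noise_frames)
         for f in range(n_freq)]
        for mic in obs
    ]


def classify(obs: Spectrogram, noise: List[List[float]],
             snr_threshold: float = 4.0) -> List[List[int]]:
    """Section 3 (sketch): label each time-frequency point, -1 for noise, else a source index.

    Assumed rule: the source is the microphone with the largest power
    relative to its noise floor (one speaker close to each microphone).
    """
    n_mic, n_frame, n_freq = len(obs), len(obs[0]), len(obs[0][0])
    labels = []
    for t in range(n_frame):
        row = []
        for f in range(n_freq):
            ratios = [abs(obs[m][t][f]) ** 2 / max(noise[m][f], 1e-12)
                      for m in range(n_mic)]
            best = max(range(n_mic), key=lambda m: ratios[m])
            row.append(best if ratios[best] > snr_threshold else -1)
        labels.append(row)
    return labels


def separate(obs: Spectrogram, labels: List[List[int]], n_sources: int) -> Spectrogram:
    """Section 4 (sketch): binary masking; source k keeps only the points labelled k."""
    return [
        [[obs[k][t][f] if labels[t][f] == k else 0j
          for f in range(len(labels[t]))]
         for t in range(len(labels))]
        for k in range(n_sources)
    ]


def detect_speech(separated: Spectrogram, noise: List[List[float]],
                  snr_threshold: float = 4.0) -> List[List[bool]]:
    """Section 5 (sketch): per source and frame, True where energy exceeds the noise floor."""
    result = []
    for k, src in enumerate(separated):
        floor = sum(noise[k]) * snr_threshold
        result.append([sum(abs(v) ** 2 for v in frame) > floor for frame in src])
    return result


# Synthetic input: 2 microphones, 6 frames, 2 frequency bins.
# Frames 0-1 are noise only, 2-3 have speaker 0 active, 4-5 have speaker 1 active.
amp = [[0.1, 0.1, 1.0, 1.0, 0.3, 0.3],   # microphone 0
       [0.1, 0.1, 0.3, 0.3, 1.0, 1.0]]   # microphone 1
obs = [[[complex(a, 0.0)] * 2 for a in mic] for mic in amp]

noise = estimate_noise_power(obs, noise_frames=[0, 1])
labels = classify(obs, noise)
separated = separate(obs, labels, n_sources=2)
speech = detect_speech(separated, noise)
```

In this toy input, `speech[0]` marks frames 2 and 3 as voiced and `speech[1]` marks frames 4 and 5. Real recordings would need a statistically grounded classifier and smoothing of the frame decisions, neither of which the abstract specifies.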
COPYRIGHT: (C)2008,JPO&INPIT
Akiko Araki
Kazuhiro Otsuka
Masamoto Fujimoto
Kentaro Ishizuka
JP64081997A
JP2002236494A
JP2004170552A
JP2006208482A
WO2005024788A1
Taku Kusano
Yukio Nakamura