PURPOSE: To accurately segment input speech at exact frame positions and to roughly classify it.
CONSTITUTION: Linear predictive coefficients and the residual power are calculated by performing LPC analysis on the input speech; from these, cepstrum coefficients are calculated, and a feature-parameter (Pn) sequence 2 is calculated for each frame f based on the cepstrum coefficients. Next, prescribed frame components Pn-m (or Pnf) of frame f are input to a neural network 3. Output selection 5 then compares each value in the output time sequence 4 of the neural network 3 with a prescribed threshold, selects any value exceeding the threshold or, failing that, the maximum output, and replaces the selected output with the corresponding roughly classified phoneme symbol, providing a roughly classified phoneme sequence for each frame. Finally, smoothing/shaping processing 6 is applied to this roughly classified phoneme sequence, providing the segmented and roughly classified recognition result.
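The chain of feature calculation 2, output selection 5, and smoothing 6 can be sketched as below. This is an illustrative reading, not the patented implementation: the phoneme symbol set, threshold, smoothing window, and the use of majority voting are assumptions, and the neural network 3 is represented only by its per-frame output vectors.

```python
import numpy as np

# Hypothetical coarse phoneme classes (the patent does not fix a symbol set).
PHONEME_SYMBOLS = ["V", "U", "N", "F", "S"]

def lpc_to_cepstrum(a, n_ceps):
    """Cepstrum coefficients from LPC coefficients, using the standard
    recursion for H(z) = 1 / (1 - sum_k a_k z^{-k}):
        c[n] = a[n] + sum_{k=1}^{n-1} (k/n) c[k] a[n-k],
    with a[n] = 0 beyond the LPC order."""
    p = len(a)
    c = np.zeros(n_ceps)
    for n in range(1, n_ceps + 1):
        acc = a[n - 1] if n <= p else 0.0
        for k in range(1, n):
            if n - k <= p:
                acc += (k / n) * c[k - 1] * a[n - k - 1]
        c[n - 1] = acc
    return c

def select_outputs(nn_outputs, threshold=0.5):
    """For each frame's network output vector, select a unit exceeding the
    threshold or, failing that, the maximum unit, and replace it with the
    corresponding coarse phoneme symbol (threshold value is illustrative)."""
    labels = []
    for frame_out in nn_outputs:
        above = np.flatnonzero(frame_out > threshold)
        idx = int(above[0]) if len(above) else int(np.argmax(frame_out))
        labels.append(PHONEME_SYMBOLS[idx])
    return labels

def smooth(labels, width=3):
    """One plausible smoothing/shaping scheme: majority vote over a sliding
    window, which removes isolated single-frame labels."""
    half = width // 2
    out = []
    for i in range(len(labels)):
        window = labels[max(0, i - half): i + half + 1]
        out.append(max(set(window), key=window.count))
    return out
```

For example, a per-frame label sequence V, V, U, V, V from the selection step is smoothed to an unbroken V segment, since the lone U frame loses the majority vote in every window containing it.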
KATO TOSHIFUMI
TSUZUKI YOSHIHIKO