To estimate phoneme boundary times more accurately than conventional methods.
A speech feature is extracted from each frame of the input speech. The most likely phoneme is assigned to each frame using statistics of the speech features of a plurality of phonemes. When the phonemes assigned to two consecutive frames differ, a phoneme boundary time is estimated by setting any time within the time range spanned by the two frames as the phoneme boundary time. Whether each phoneme boundary time is reliable is then determined. For a phoneme boundary whose time is determined to be unreliable, the boundary time is re-estimated by allocating time to each phoneme constituting that boundary such that the phoneme's duration is longer the larger its mean duration, and is stretched or compressed more the larger its duration variance.
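The steps above can be illustrated with a minimal Python sketch. It is not the claimed implementation: the diagonal Gaussian model per phoneme, the non-overlapping frame layout, the choice of the midpoint as the boundary time, and the allocation rule d_i = mean_i + rho * var_i are all assumptions made for illustration, and the function names (`allocate_phonemes`, `estimate_boundaries`, `reallocate`) are hypothetical. The abstract does not say how reliability is judged, so that step is left to the caller.

```python
import math


def allocate_phonemes(frames, stats):
    """Assign the most likely phoneme to each frame.

    frames: list of feature vectors (lists of floats), one per frame.
    stats:  dict mapping phoneme -> (mean vector, variance vector).
            One diagonal Gaussian per phoneme is an assumption of this sketch.
    """
    def log_likelihood(x, mean, var):
        return -0.5 * sum(
            math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v
            for xi, m, v in zip(x, mean, var)
        )

    return [max(stats, key=lambda p: log_likelihood(x, *stats[p])) for x in frames]


def estimate_boundaries(labels, hop):
    """Return (time, left_phoneme, right_phoneme) for each label change.

    Frame i is assumed to cover [i * hop, (i + 1) * hop), so frames i and
    i + 1 together span [i * hop, (i + 2) * hop]. The midpoint (i + 1) * hop
    is used here; any time inside that range would satisfy the description.
    """
    return [
        ((i + 1) * hop, labels[i], labels[i + 1])
        for i in range(len(labels) - 1)
        if labels[i] != labels[i + 1]
    ]


def reallocate(t_start, t_end, duration_stats):
    """Re-estimate the interior boundary times of a span of phonemes.

    duration_stats: list of (mean, variance) of duration, one entry per
    phoneme in the span, in order. Variances are assumed to be positive.
    Each phoneme receives d_i = mean_i + rho * var_i, where rho is chosen so
    that the durations sum to t_end - t_start. A larger mean therefore gives
    a longer duration, and a larger variance gives more stretch or compression.
    """
    total_mean = sum(m for m, _ in duration_stats)
    total_var = sum(v for _, v in duration_stats)
    rho = (t_end - t_start - total_mean) / total_var
    times, t = [], t_start
    for mean, var in duration_stats[:-1]:
        t += mean + rho * var
        times.append(t)
    return times
```

For example, `reallocate(0.0, 1.0, [(0.3, 0.01), (0.5, 0.03)])` stretches the two phonemes to about 0.35 s and 0.65 s, so the phoneme with the larger variance absorbs more of the 0.2 s surplus and the re-estimated boundary lands near 0.35 s.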
JPS61190396 | VCV START UP SYSTEM
JP2023109914 | ISSUANCE OF WORD TIMING BY END-TO-END MODEL
JP2008197551 | INFORMATION PROCESSING DEVICE AND INFORMATION PROCESSING METHOD
MIYAZAKI NOBORU
MIZUNO HIDEYUKI
JP2001306087A | 2001-11-02
JPH09244681A | 1997-09-19
JPH11259095A | 1999-09-24
JPH09292899A | 1997-11-11 |
Taku Kusano
Yukio Nakamura