To correct a temporal deviation between a voice and a caption included in a broadcast program with high accuracy, by accurately estimating the width of that temporal deviation.
A recognition unit 21 recognizes the voice in a broadcast program and generates a recognition result phoneme stream corresponding to that voice. A caption translation phoneme stream generating unit 22 generates a phoneme stream for each caption in the video image of the broadcast program and connects these phoneme streams to generate a caption translation phoneme stream. A collation unit 23 collates the caption translation phoneme streams from the caption translation phoneme stream generating unit 22, as one group, with the recognition result phoneme stream from the recognition unit 21, and estimates the temporal deviation width between the voice and the caption. The temporal deviation width estimated by the collation unit 23 is then used to correct the temporal deviation between the voice and the caption.
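The collation step can be illustrated with a minimal sketch. The abstract does not say which matching algorithm the collation unit 23 uses, so this example substitutes Python's standard `difflib.SequenceMatcher` as a stand-in aligner. It also assumes, hypothetically, that every phoneme in both streams carries a timestamp in milliseconds. The function name `estimate_deviation_ms` and the sample data are illustrative only and do not come from the patent.

```python
from difflib import SequenceMatcher
from statistics import median


def estimate_deviation_ms(recognized, captions):
    """Estimate the caption-minus-voice time offset in milliseconds.

    recognized: list of (phoneme, voice_time_ms) from speech recognition.
    captions:   list of (phoneme, caption_time_ms) for all captions,
                concatenated into one stream (collated "as one group").
    Returns the median offset over matched phonemes, or None if nothing matches.
    """
    rec_phonemes = [p for p, _ in recognized]
    cap_phonemes = [p for p, _ in captions]

    # Stand-in aligner (an assumption, not the patent's method): find
    # matching runs of phonemes between the two streams.
    matcher = SequenceMatcher(a=rec_phonemes, b=cap_phonemes, autojunk=False)

    offsets = []
    for block in matcher.get_matching_blocks():
        for k in range(block.size):
            voice_t = recognized[block.a + k][1]
            caption_t = captions[block.b + k][1]
            offsets.append(caption_t - voice_t)

    # The median tolerates a few spurious matches better than the mean.
    return median(offsets) if offsets else None


# Hypothetical data: captions appear 2000 ms after the voice, and the
# recognizer misreads one phoneme ("e" instead of "i").
voice = [("k", 0), ("o", 100), ("N", 200), ("n", 300), ("e", 400),
         ("ch", 500), ("i", 600), ("w", 700), ("a", 800)]
caption = [("k", 2000), ("o", 2100), ("N", 2200), ("n", 2300), ("i", 2400),
           ("ch", 2500), ("i", 2600), ("w", 2700), ("a", 2800)]

deviation = estimate_deviation_ms(voice, caption)
```

In this sketch a positive result means the caption lags the voice, so shifting the caption earlier by that amount would correct the deviation. Matching the whole concatenated caption stream at once, rather than caption by caption, gives the aligner more context and makes a single recognition error less likely to distort the estimate.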
COPYRIGHT: (C)2010,JPO&INPIT
Kazunori Matsumoto
Fumiaki Sugaya
JP2002244694A
JP10136260A
JP2007047575A
JP2006011257A
JP2004207821A
Kaori Tanaka
Sanji Tanabe