To estimate symbol chain probabilities (a language model) by assigning to each symbol all of the classes it can belong to.
For each symbol in a symbol string read from a text database 140, whose text data is stored on a storage medium, the corresponding classes are looked up in a symbol-class correspondence table 150. This table is also stored on the storage medium and associates each symbol with one or more classes. The resulting class list for the symbol is generated and stored on the storage medium. Next, for N adjacent symbols in the read symbol string (N being an integer of 2 or more), the appearance frequency of a class chain is counted for every combination obtained by selecting one class from each of the N corresponding class lists. Finally, the symbol chain probability, which serves as the language model, is generated from the class-chain appearance frequency information obtained by this counting.
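The counting procedure above can be illustrated with a minimal Python sketch. It rests on several assumptions that the abstract does not state. A plain dict stands in for the symbol-class correspondence table 150. The helper names (`class_lists`, `count_class_chains`, `class_chain_probabilities`) are hypothetical. Each class combination adds a count of 1. A symbol missing from the table is given itself as its only class. The abstract does not say exactly how the symbol chain probability is derived from the counts, so the last function uses a simple maximum-likelihood estimate of P(last class | preceding classes) as one plausible choice. It should not be read as the patented formula.

```python
from collections import Counter
from itertools import product
from typing import Dict, List, Sequence, Tuple

# Hypothetical stand-in for the symbol-class correspondence table 150:
# each symbol maps to one or more classes.
SymbolClassTable = Dict[str, List[str]]


def class_lists(symbols: Sequence[str], table: SymbolClassTable) -> List[List[str]]:
    """Return the list of all possible classes for each symbol.

    Assumption: a symbol absent from the table is treated as its own single class.
    """
    return [table.get(s, [s]) for s in symbols]


def count_class_chains(symbols: Sequence[str], table: SymbolClassTable, n: int = 2) -> Counter:
    """Count class chains for every window of n adjacent symbols (n >= 2).

    Within each window, every combination that takes one class from each of the
    n class lists is counted once (assumption: unweighted counts).
    """
    if n < 2:
        raise ValueError("n must be an integer >= 2")
    lists = class_lists(symbols, table)
    counts: Counter = Counter()
    for i in range(len(lists) - n + 1):
        # Cartesian product = all ways of selecting one class per symbol.
        for chain in product(*lists[i:i + n]):
            counts[chain] += 1
    return counts


def class_chain_probabilities(counts: Counter) -> Dict[Tuple[str, ...], float]:
    """Maximum-likelihood P(last class | preceding classes) from chain counts.

    This estimator is an illustrative assumption, not taken from the source.
    """
    history_totals: Counter = Counter()
    for chain, c in counts.items():
        history_totals[chain[:-1]] += c
    return {chain: c / history_totals[chain[:-1]] for chain, c in counts.items()}


if __name__ == "__main__":
    # Toy table: "bank" is ambiguous between two classes.
    table = {"the": ["DET"], "river": ["NOUN"], "bank": ["NOUN", "VERB"]}
    counts = count_class_chains(["the", "river", "bank"], table, n=2)
    print(sorted(counts.items()))
    # [(('DET', 'NOUN'), 1), (('NOUN', 'NOUN'), 1), (('NOUN', 'VERB'), 1)]
    print(class_chain_probabilities(counts)[("NOUN", "VERB")])  # 0.5
```

In this toy run the ambiguous symbol "bank" contributes to both the (NOUN, NOUN) and (NOUN, VERB) chains. Because every one of its classes is counted, no decision about which class is "correct" is needed before estimating the model.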
COPYRIGHT: (C)2004,JPO
Katsutoshi Otsuki
Shoichi Matsunaga
JP2003263187A
JP2001516903A
JP2000259175A
JP1185744A
Ito et al., "A Toolkit for Constructing Word and Class n-grams," IEICE Technical Report (NLC), December 15, 2001, Vol. 100, No. 521, pp. 67-72 (NLC2000-58)
Taku Kusano
Yukio Nakamura
Minoru Inagaki