Title:
Language model generation method, apparatus, and program; text analysis apparatus and program
Document Type and Number:
Japanese Patent JP4024614
Kind Code:
B2
Abstract:

To estimate symbol chain probabilities (a language model) by assigning every possible class to each symbol.

For each symbol in a symbol string read from a text database 140, which holds text data on a storage medium, the corresponding classes are looked up in a symbol-class correspondence table 150, which stores each symbol together with one or more classes on the storage medium. The resulting class list for each symbol is generated and stored on the storage medium. The appearance frequency of each class chain is then counted over all combinations obtained by selecting one class from each of the N (an integer ≥ 2) class lists corresponding to N adjacent symbols in the read symbol string. The symbol chain probability that serves as the language model is generated from the class chain frequency information produced by this counting.
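The counting step described in the abstract can be illustrated with a minimal Python sketch. Everything here is an assumption made for illustration, not the patent's implementation: the function names `count_class_chains` and `chain_probabilities`, the toy table, and the unit weight given to every class combination are hypothetical. The abstract also does not say how probabilities are derived from the counts; plain relative frequency is used below as a stand-in.

```python
from collections import Counter
from itertools import product


def count_class_chains(symbols, symbol_to_classes, n=2):
    """Count class chains over every window of n adjacent symbols.

    Each symbol is expanded to its full class list, and every combination
    that picks one class per symbol is counted once (assumed unit weight).
    """
    class_lists = [symbol_to_classes[s] for s in symbols]
    counts = Counter()
    for i in range(len(class_lists) - n + 1):
        # Cartesian product = all ways to pick one class from each list.
        for chain in product(*class_lists[i:i + n]):
            counts[chain] += 1
    return counts


def chain_probabilities(counts):
    """Relative-frequency estimate of P(last class | preceding classes).

    This estimator is an assumption; the abstract does not specify one.
    """
    history_totals = Counter()
    for chain, c in counts.items():
        history_totals[chain[:-1]] += c
    return {chain: c / history_totals[chain[:-1]] for chain, c in counts.items()}


# Hypothetical symbol-class correspondence table and symbol string.
table = {
    "time": ["NOUN", "VERB"],
    "flies": ["NOUN", "VERB"],
    "fast": ["ADV"],
}
counts = count_class_chains(["time", "flies", "fast"], table, n=2)
probs = chain_probabilities(counts)
```

With this toy input, the window ("time", "flies") contributes four class chains and ("flies", "fast") contributes two, so ambiguous symbols spread their evidence across every class they could belong to rather than being forced into a single class.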

COPYRIGHT: (C)2004,JPO


Inventors:
Takaaki Hori
Katsutoshi Otsuki
Shoichi Matsunaga
Application Number:
JP2002226575A
Publication Date:
December 19, 2007
Filing Date:
August 02, 2002
Assignee:
Nippon Telegraph and Telephone Corporation
International Classes:
G06F17/28; G10L15/197; G10L15/06; G10L15/187
Domestic Patent References:
JP2003263187A
JP2001516903A
JP2000259175A
JP1185744A
Other References:
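Inui et al., "Japanese Morphological Analysis Considering Multiple Parts of Speech," IPSJ SIG Technical Report (NL), March 5, 1999, Vol. 99, No. 22, pp. 25-32 (99-NL-130)
Ito et al., "A Toolkit for Building Word and Class N-gram Models," IEICE Technical Report (NLC), December 15, 2001, Vol. 100, No. 521, pp. 67-72 (NLC2000-58)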
Attorney, Agent or Firm:
Naoki Nakao
Taku Kusano
Yukio Nakamura
Minoru Inagaki