To solve the problem that, in document classification on a Bayesian network, a document is always classified into one of the existing categories, so that even when a new category emerges over time it is impossible to determine whether the document belongs to that new category.
A document classification device includes a means for obtaining a plurality of items of document data to which categories have been assigned in advance; a means for constructing a Bayesian network on the basis of the obtained document data; a means for calculating word importance for each category on the basis of the obtained document data; a means for obtaining a document to be classified; a means for classifying the obtained document through the Bayesian network; a means for obtaining the words included in the obtained document; and a means for determining the validity of the classification result from the Bayesian network by comparing the included words with the word importance and then comparing the outcome of that comparison with the classification result.
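The validity check described above can be sketched in Python as follows. This is an illustration under stated assumptions, not the patented implementation: a naive Bayes classifier stands in for the Bayesian network, word importance is assumed to be a word's relative frequency within a category, and the function names and the `threshold` value are hypothetical.

```python
import math
from collections import Counter, defaultdict


def train(docs):
    """Count words per category from (words, category) training pairs."""
    word_counts = defaultdict(Counter)
    cat_counts = Counter()
    vocab = set()
    for words, cat in docs:
        cat_counts[cat] += 1
        word_counts[cat].update(words)
        vocab.update(words)
    return word_counts, cat_counts, vocab


def classify(words, word_counts, cat_counts, vocab):
    """Naive Bayes with Laplace smoothing (assumed stand-in for the Bayesian network)."""
    total_docs = sum(cat_counts.values())
    best_cat, best_score = None, -math.inf
    for cat in cat_counts:
        total = sum(word_counts[cat].values())
        score = math.log(cat_counts[cat] / total_docs)
        for w in words:
            score += math.log((word_counts[cat][w] + 1) / (total + len(vocab)))
        if score > best_score:
            best_cat, best_score = cat, score
    return best_cat


def word_importance(word_counts):
    """Assumed importance measure: relative frequency of a word within its category."""
    importance = {}
    for cat, counts in word_counts.items():
        total = sum(counts.values())
        importance[cat] = {w: c / total for w, c in counts.items()}
    return importance


def importance_category(words, importance, threshold=0.1):
    """Category best supported by word importance, or None if support is too weak."""
    if not words:
        return None
    best_cat, best_mean = None, 0.0
    for cat, scores in importance.items():
        mean = sum(scores.get(w, 0.0) for w in words) / len(words)
        if mean > best_mean:
            best_cat, best_mean = cat, mean
    return best_cat if best_mean >= threshold else None


def is_valid(words, predicted, importance, threshold=0.1):
    """The classification is judged valid only if the word-importance check agrees with it."""
    return importance_category(words, importance, threshold) == predicted


# Toy training data (hypothetical).
training = [
    (["goal", "match", "team"], "sports"),
    (["team", "score", "goal"], "sports"),
    (["election", "vote", "party"], "politics"),
    (["party", "policy", "vote"], "politics"),
]
word_counts, cat_counts, vocab = train(training)
importance = word_importance(word_counts)

known_doc = ["goal", "team", "match"]
novel_doc = ["quantum", "qubit", "entanglement", "team"]

known_pred = classify(known_doc, word_counts, cat_counts, vocab)
novel_pred = classify(novel_doc, word_counts, cat_counts, vocab)
known_ok = is_valid(known_doc, known_pred, importance)
novel_ok = is_valid(novel_doc, novel_pred, importance)
```

In this toy data the classifier assigns both documents to an existing category, but only the first passes the word-importance check. A rejected result is the signal that the document may belong to a category that did not exist when the training data was collected.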
COPYRIGHT: (C)2009,JPO&INPIT
Shigeru Matsumoto
Kyoko Makino
Rumi Hayakawa
Toshiba Solutions Corporation
JP2006085634A
JP2004240517A
JP2003534583A
JP11003350A
JP2002222083A
JP2005158010A
JP2006293767A
WO2004084096A1