To provide a similarity calculation device that calculates word similarity effectively by reflecting each word uniformly in the similarity calculation according to its frequency of appearance.
A document vector is generated for each of a plurality of document data. Each document vector has one element per morpheme, and each element takes a value that depends on the appearance frequency of the corresponding morpheme. Word vectors are then generated from the transpose of the document-word matrix formed by the generated document vectors. As a result, each word vector has one element per document data, and each element is proportional to the appearance frequency of the morpheme in the corresponding document data and inversely proportional to its appearance frequency across the whole plurality of document data. Word similarity is calculated on the basis of these word vectors.
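The pipeline in the abstract can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation. Documents are assumed to be already split into morphemes (in practice a morphological analyzer would do this). The element weight is assumed to be tf(w, d) / cf(w), the count of morpheme w in document d divided by its total count over all documents, which is one literal reading of "proportional to ... and inversely proportional to ...". Cosine similarity is an assumed choice of similarity measure, and the function names (`document_word_matrix`, `transpose`, `cosine`, `word_similarity`) are hypothetical.

```python
from collections import Counter
import math


def document_word_matrix(docs):
    """Return (vocab, matrix) for a list of pre-tokenized documents.

    Row i of the matrix is the document vector of docs[i]; column j belongs
    to vocab[j]. Entry = tf(w, d) / cf(w), i.e. the morpheme's count in
    document d divided by its total count over all documents. This weighting
    is an assumed concrete reading of the abstract, not a quoted formula.
    """
    counts = [Counter(doc) for doc in docs]          # tf(w, d) per document
    vocab = sorted(set().union(*counts))             # one column per morpheme
    collection = Counter()
    for c in counts:
        collection.update(c)                         # cf(w) over all documents
    matrix = [[c[w] / collection[w] for w in vocab] for c in counts]
    return vocab, matrix


def transpose(matrix):
    """Swap rows and columns: document vectors become word vectors."""
    return [list(column) for column in zip(*matrix)]


def cosine(u, v):
    """Cosine similarity (assumed measure); 0.0 if either vector is zero."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def word_similarity(docs, w1, w2):
    """Similarity of two morphemes via their word vectors (one element per document)."""
    vocab, matrix = document_word_matrix(docs)
    word_vectors = dict(zip(vocab, transpose(matrix)))
    return cosine(word_vectors[w1], word_vectors[w2])


if __name__ == "__main__":
    # Hypothetical toy corpus; each inner list is one document's morphemes.
    docs = [
        ["cat", "pet", "cute"],
        ["dog", "pet", "cute"],
        ["cat", "dog", "pet"],
        ["stock", "market", "price"],
    ]
    print(round(word_similarity(docs, "cat", "dog"), 3))    # 0.5
    print(round(word_similarity(docs, "cute", "pet"), 3))   # 0.816
    print(round(word_similarity(docs, "cat", "stock"), 3))  # 0.0
```

With this weighting the elements of every word vector sum to 1. Note that dividing by cf(w) rescales a whole word vector by a single constant, so cosine similarity gives the same result as it would with raw counts; the weighting only changes the outcome for measures that are not scale-invariant, such as a plain inner product or Euclidean distance. The abstract does not state which measure is used.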