To provide a fixed-form sentence clustering device and method capable of collecting similar fixed-form sentences, clustering them, and outputting them as one class.
This fixed-form sentence clustering device comprises: a means 10 for dividing each sentence in a corpus into words; a word n-gram dictionary means 20 for forming word n-grams by linking the words obtained by the division, and for extracting and storing those word n-grams that satisfy a prescribed condition; a fixed-form sentence extracting means 30 for forming a set of word n-grams by combining some or all of the stored word n-grams, and for extracting from the corpus, as a fixed-form sentence, each sentence that includes the set at a rate equal to or greater than a prescribed threshold value; a means 40 for storing the extracted fixed-form sentences and the sets of word n-grams; and a means 50 for calculating the similarity between classes, each class being a cluster of prescribed sentences included in the corpus, and for merging the two classes having the highest similarity into one.
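The flow through means 10 to 50 can be illustrated with a minimal Python sketch. The abstract does not specify the tokenizer, the "prescribed condition", how the inclusion rate is computed, or the similarity measure, so everything concrete here is an assumption made for illustration: whitespace tokenization, a minimum sentence-count condition (`min_count`), the inclusion rate taken as the fraction of a sentence's n-grams found in the set, the whole stored dictionary used as the n-gram set, and Jaccard similarity over n-gram sets. All function and parameter names are hypothetical.

```python
from collections import Counter
from itertools import combinations


def tokenize(sentence):
    # Means 10 (assumed): split on whitespace; a real system for Japanese
    # would need a morphological analyzer instead.
    return sentence.lower().split()


def word_ngrams(words, n):
    # Link n consecutive words into one n-gram.
    return [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]


def build_dictionary(corpus, n=2, min_count=2):
    # Means 20 (assumed condition): keep n-grams that occur in at least
    # min_count sentences of the corpus.
    counts = Counter()
    for sentence in corpus:
        counts.update(set(word_ngrams(tokenize(sentence), n)))
    return {gram for gram, c in counts.items() if c >= min_count}


def coverage(sentence, ngram_set, n=2):
    # Assumed "rate": fraction of the sentence's n-grams found in the set.
    grams = word_ngrams(tokenize(sentence), n)
    if not grams:
        return 0.0
    return sum(g in ngram_set for g in grams) / len(grams)


def extract_fixed_form(corpus, ngram_set, n=2, threshold=0.5):
    # Means 30: sentences whose rate is at or above the threshold.
    return [s for s in corpus if coverage(s, ngram_set, n) >= threshold]


def class_ngrams(cls, n=2):
    # All n-grams appearing in any sentence of the class.
    return {g for s in cls for g in word_ngrams(tokenize(s), n)}


def jaccard(a, b):
    # Assumed similarity measure between two n-gram sets.
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def merge_most_similar(classes, n=2):
    # Means 50: one merge step joining the two most similar classes.
    if len(classes) < 2:
        return classes
    i, j = max(
        combinations(range(len(classes)), 2),
        key=lambda p: jaccard(class_ngrams(classes[p[0]], n),
                              class_ngrams(classes[p[1]], n)),
    )
    merged = classes[i] + classes[j]
    rest = [c for k, c in enumerate(classes) if k not in (i, j)]
    return rest + [merged]


corpus = [
    "thank you for your order",
    "thank you for your payment",
    "the cat sat quietly",
]
dictionary = build_dictionary(corpus)          # stored as in means 20
fixed = extract_fixed_form(corpus, dictionary)  # stored as in means 40
classes = merge_most_similar([[s] for s in corpus])
```

With this toy corpus, the two "thank you for your ..." sentences share three of their four bigrams, so both pass the 0.5 threshold and are merged into one class, while the unrelated sentence stays in a class of its own. Calling `merge_most_similar` repeatedly would give a simple agglomerative clustering; the stopping criterion is not stated in the abstract and is left open here.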