Title:
FIXED FORM SENTENCE CLUSTERING DEVICE AND METHOD
Document Type and Number:
Japanese Patent JP2003085170
Kind Code:
A
Abstract:

To provide a fixed form sentence clustering device and method capable of collecting similar fixed form sentences, clustering them into one class, and outputting them.

This fixed form sentence clustering device comprises: a means 10 for dividing each sentence in a corpus into words; a word n-gram dictionary means 20 for forming word n-grams by concatenating the words obtained by the division, and for extracting and storing those word n-grams that satisfy a prescribed condition; a fixed form sentence extracting means 30 for forming a set of word n-grams by combining some or all of the stored word n-grams, and for extracting from the corpus, as fixed form sentences, those sentences in which the proportion covered by the set is equal to or greater than a prescribed threshold value; a means 40 for storing the extracted fixed form sentences and the sets of word n-grams; and a means 50 for calculating the similarity between classes, each class being a cluster of prescribed sentences included in the corpus, and for merging the two classes having the highest similarity into one.
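
The processing chain described above (means 10 through 50) can be sketched roughly as follows. This is only an illustrative sketch, not the patented implementation: the abstract does not specify the word divider, the "prescribed condition", the coverage measure, or the similarity function, so the choices here are assumptions. Words are split on whitespace (Japanese text would need a morphological analyzer), the condition is a minimum n-gram frequency, the whole stored dictionary is used as the n-gram set, coverage is the fraction of words in a sentence that fall inside a stored n-gram, and class similarity is Jaccard overlap of vocabularies. All function names and parameters (`build_ngram_dictionary`, `min_count`, `threshold`, and so on) are hypothetical.

```python
from collections import Counter
from itertools import combinations


def word_ngrams(words, n):
    """Form word n-grams by concatenating n consecutive words."""
    return [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]


def build_ngram_dictionary(sentences, n=2, min_count=2):
    """Means 10 and 20 (assumed form): split sentences into words, then
    keep the n-grams that occur at least min_count times in the corpus."""
    counts = Counter(
        gram for s in sentences for gram in word_ngrams(s.split(), n)
    )
    return {gram for gram, c in counts.items() if c >= min_count}


def coverage(words, ngram_set, n=2):
    """Fraction of the words that lie inside some n-gram of ngram_set."""
    if not words:
        return 0.0
    covered = [False] * len(words)
    for i in range(len(words) - n + 1):
        if tuple(words[i:i + n]) in ngram_set:
            for j in range(i, i + n):
                covered[j] = True
    return sum(covered) / len(words)


def extract_fixed_form(sentences, ngram_set, n=2, threshold=0.6):
    """Means 30 (assumed form): keep sentences whose coverage by the
    n-gram set is at or above the threshold."""
    return [
        s for s in sentences
        if coverage(s.split(), ngram_set, n) >= threshold
    ]


def class_similarity(class_a, class_b):
    """Assumed similarity: Jaccard overlap of the two classes' vocabularies."""
    vocab_a = {w for s in class_a for w in s.split()}
    vocab_b = {w for s in class_b for w in s.split()}
    union = vocab_a | vocab_b
    return len(vocab_a & vocab_b) / len(union) if union else 0.0


def merge_most_similar(classes):
    """Means 50 (one step): merge the two classes with the highest similarity."""
    if len(classes) < 2:
        return classes
    i, j = max(
        combinations(range(len(classes)), 2),
        key=lambda p: class_similarity(classes[p[0]], classes[p[1]]),
    )
    rest = [c for k, c in enumerate(classes) if k not in (i, j)]
    return rest + [classes[i] + classes[j]]
```

Calling `merge_most_similar` repeatedly until a stopping criterion is met would give a bottom-up (agglomerative) clustering; the abstract does not state the stopping criterion, so none is assumed here.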


Inventors:
KATO NAOTO
Application Number:
JP2001274396A
Publication Date:
March 20, 2003
Filing Date:
September 11, 2001
Assignee:
JAPAN BROADCASTING CORP
International Classes:
G06F17/30; G06F17/28; G10L15/187; G10L15/197; (IPC1-7): G06F17/28; G06F17/30; G10L15/18
Attorney, Agent or Firm:
Ariga Gunichiro