Title:
APPARATUS, METHOD AND PROGRAM FOR EXTRACTING FEATURE WORD
Document Type and Number:
Japanese Patent JP2011164791
Kind Code:
A
Abstract:

To efficiently extract suitable feature words corresponding to a specific category.

A first appearance frequency and a second appearance frequency are calculated for each word pair found in a plurality of document data: the first is the number of document data in which the word pair co-occurs, and the second is the number of document data, among those associated with a specified category, in which the word pair co-occurs. The first appearance frequency divided by the second appearance frequency is taken as the degree of co-occurrence. Network data, with words as nodes and the degree of co-occurrence as edge weights, is generated as matrix data in the form of an N×N symmetric matrix. The maximum eigenvalue of the generated matrix data is calculated as a degree of aggregation. A cluster, that is, a set of words determined from the eigenvector corresponding to the calculated degree of aggregation, is extracted. The degree of attribution of each word to the cluster is calculated. Nodes whose degree of attribution exceeds a threshold are extracted as feature words expressing a feature of the specified category.
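
The steps in the abstract can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the patented implementation: the function names, the input shape (a list of word-set and category-label pairs), and the default threshold are hypothetical. The ratio is computed exactly as the abstract words it (first frequency divided by second), pairs that never co-occur in the specified category are assumed to get no edge, and the attribution degree is assumed to be the absolute eigenvector component, since the abstract does not say how it is computed. The largest eigenvalue of the matrix (the "degree of aggregation") is found by shifted power iteration, a technique chosen here for self-containment.

```python
import math
from collections import Counter
from itertools import combinations


def cooccurrence_degrees(docs, category):
    """docs: list of (set_of_words, category_label) pairs (hypothetical shape)."""
    first = Counter()   # documents, over all categories, containing both words
    second = Counter()  # documents of the specified category containing both words
    for words, label in docs:
        for pair in combinations(sorted(words), 2):
            first[pair] += 1
            if label == category:
                second[pair] += 1
    # Ratio as worded in the abstract: first / second.
    # Assumption: pairs absent from the category (second == 0) get no edge.
    return {pair: first[pair] / second[pair] for pair in second}


def to_matrix(degrees):
    """Build the symmetric N x N matrix with words as nodes."""
    vocab = sorted({word for pair in degrees for word in pair})
    index = {word: i for i, word in enumerate(vocab)}
    matrix = [[0.0] * len(vocab) for _ in vocab]
    for (a, b), degree in degrees.items():
        matrix[index[a]][index[b]] = degree
        matrix[index[b]][index[a]] = degree
    return vocab, matrix


def dominant_eigenpair(matrix, iterations=500):
    """Largest eigenvalue and its unit eigenvector via shifted power iteration."""
    n = len(matrix)
    # Adding shift * I makes every eigenvalue non-negative (Gershgorin bound),
    # so the iteration converges to the algebraically largest eigenvalue.
    shift = max(sum(abs(x) for x in row) for row in matrix)
    v = [1.0 / math.sqrt(n)] * n
    for _ in range(iterations):
        w = [sum(matrix[i][j] * v[j] for j in range(n)) + shift * v[i]
             for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Rayleigh quotient on the unshifted matrix gives the eigenvalue itself.
    mv = [sum(matrix[i][j] * v[j] for j in range(n)) for i in range(n)]
    return sum(v[i] * mv[i] for i in range(n)), v


def feature_words(docs, category, threshold=0.4):
    """Return (degree of aggregation, words whose attribution exceeds threshold)."""
    vocab, matrix = to_matrix(cooccurrence_degrees(docs, category))
    aggregation, vector = dominant_eigenpair(matrix)
    # Assumption: attribution degree = absolute eigenvector component.
    attribution = {word: abs(x) for word, x in zip(vocab, vector)}
    return aggregation, sorted(w for w, a in attribution.items() if a > threshold)


if __name__ == "__main__":
    sample_docs = [  # made-up toy data
        ({"engine", "tire", "road"}, "car"),
        ({"engine", "tire"}, "car"),
        ({"engine", "road", "recipe"}, "food"),
        ({"recipe", "salt"}, "food"),
    ]
    print(feature_words(sample_docs, "car", threshold=0.5))
```

Only a single cluster, the one tied to the largest eigenvalue, is extracted in this sketch. A production version would more likely use a numerical library's symmetric eigensolver than hand-written power iteration.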


Inventors:
MATSUNAGA TSUTOMU
SUENAGA TAKASHI
Application Number:
JP2010024718A
Publication Date:
August 25, 2011
Filing Date:
February 05, 2010
Assignee:
NTT DATA CORP
International Classes:
G06F17/30
Domestic Patent References:
JP2007079948A    2007-03-29
JP2004030202A    2004-01-29
JPH06251072A    1994-09-09
Attorney, Agent or Firm:
Masatake Shiga
Yasuhiko Murayama