APPARATUS, METHOD AND PROGRAM FOR EXTRACTING FEATURE WORD

Title:

APPARATUS, METHOD AND PROGRAM FOR EXTRACTING FEATURE WORD

Document Type and Number:

Japanese Patent JP2011164791

Kind Code:

A

Abstract:

Problem to be solved: To efficiently extract suitable feature words corresponding to a specific category.

A first appearance frequency and a second appearance frequency are calculated for word pairs included in a plurality of document data. The first is the number of document data in which a word pair co-occurs; the second is the number of document data in which the word pair co-occurs among the document data associated with a specified category. The value obtained by dividing the first appearance frequency by the second appearance frequency is calculated as a degree of co-occurrence. Network data with words as nodes and the degree of co-occurrence as edge weights is generated as matrix data in the form of a symmetric N×N matrix. The maximum eigenvalue of the generated matrix data is calculated as a degree of aggregation. A cluster, which is a set of a plurality of words determined from the eigenvector corresponding to the calculated degree of aggregation, is extracted. The degree of attribution of each word to the cluster is calculated. Nodes whose attribution degrees exceed a threshold are extracted as feature words expressing a feature of the specified category.
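The steps in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation, and several details are assumptions that the abstract does not specify: the function names are hypothetical, a word pair that never co-occurs in the category gets no edge (to avoid dividing by zero), the largest eigenvalue (the abstract's degree of aggregation) is approximated by shifted power iteration, and the attribution degree of a word is taken to be the absolute value of its component in the corresponding unit eigenvector. The ratio itself follows the abstract's wording literally (first frequency divided by second).

```python
from collections import Counter
from itertools import combinations


def pair_counts(docs):
    """For each unordered word pair, count the documents containing both words."""
    counts = Counter()
    for words in docs:
        counts.update(combinations(sorted(set(words)), 2))
    return counts


def cooccurrence_matrix(docs, in_category):
    """Build the symmetric N x N matrix of co-occurrence degrees."""
    first = pair_counts(docs)  # co-occurrence over all documents
    second = pair_counts([d for d, flag in zip(docs, in_category) if flag])
    vocab = sorted({w for d in docs for w in d})
    index = {w: i for i, w in enumerate(vocab)}
    matrix = [[0.0] * len(vocab) for _ in vocab]
    for (a, b), f1 in first.items():
        f2 = second.get((a, b), 0)
        if f2 == 0:
            continue  # assumption: no edge when the pair never co-occurs in the category
        i, j = index[a], index[b]
        matrix[i][j] = matrix[j][i] = f1 / f2  # ratio as worded in the abstract
    return vocab, matrix


def principal_eigenpair(matrix, shift=1.0, iterations=1000, tol=1e-12):
    """Approximate the largest eigenvalue and its unit eigenvector.

    Power iteration is applied to (matrix + shift * I); the shift is an
    assumption added so the iteration does not oscillate on bipartite graphs.
    Assumes a non-empty, non-negative symmetric matrix.
    """
    n = len(matrix)
    v = [1.0 / n ** 0.5] * n
    eigenvalue = 0.0
    for _ in range(iterations):
        av = [sum(a * x for a, x in zip(row, v)) for row in matrix]
        eigenvalue = sum(x * y for x, y in zip(v, av))  # Rayleigh quotient of unit v
        w = [y + shift * x for x, y in zip(v, av)]
        norm = sum(x * x for x in w) ** 0.5
        w = [x / norm for x in w]
        converged = max(abs(x - y) for x, y in zip(v, w)) < tol
        v = w
        if converged:
            break
    return eigenvalue, v


def extract_feature_words(docs, in_category, threshold):
    """Return words whose assumed attribution degree exceeds the threshold."""
    vocab, matrix = cooccurrence_matrix(docs, in_category)
    _, vector = principal_eigenpair(matrix)
    return [w for w, x in zip(vocab, vector) if abs(x) > threshold]


if __name__ == "__main__":
    docs = [["engine", "brake"], ["engine", "brake"], ["engine", "recipe"]]
    in_category = [True, True, False]
    print(extract_feature_words(docs, in_category, threshold=0.5))
```

Here each document is a list of words and `in_category` marks which documents are associated with the specified category. A production version would more likely use a numerical library's symmetric eigensolver in place of the hand-written power iteration.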

Inventors:

MATSUNAGA TSUTOMU
SUENAGA TAKASHI

Application Number:

JP2010024718A

Publication Date:

August 25, 2011

Filing Date:

February 05, 2010
