Title:
CORPUS SELECTION DEVICE, CORPUS SELECTION METHOD, AND PROGRAM
Document Type and Number:
Japanese Patent JP2012177835
Kind Code:
A
Abstract:

To provide a corpus selection device, a corpus selection method, and a program that select a learning corpus which both improves the quality of a language model and reduces the amount of storage area used.

A corpus selection device AA divides a learning corpus (whole) into learning corpora (subsets 1 to 3) and generates, by language modeling, subset language models 1 to 3 corresponding respectively to the learning corpora (subsets 1 to 3). For each of the subset language models 1 to 3, a perplexity is calculated on a task representation corpus, yielding perplexity-1 to perplexity-Y. The learning corpora corresponding to the subset language models with lower perplexities are then removed from the learning corpus (whole) to obtain the selected learning corpus (selected).
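The selection procedure described in the abstract can be sketched as follows. This is a minimal, hypothetical illustration, not the patented implementation: it assumes that each subset language model is an add-one smoothed unigram model (the abstract says only "language modeling"), that corpora are given as lists of tokens, and that a hypothetical `threshold` parameter stands in for whatever comparison the abstract means by "lower perplexities". The function names `train_unigram`, `perplexity`, and `select_corpus` are illustrative. Perplexity is computed in the standard way, as the exponential of the negative mean log probability of the task representation corpus tokens.

```python
import math
from collections import Counter


def train_unigram(tokens, vocab):
    """Add-one (Laplace) smoothed unigram model over a fixed vocabulary.

    Assumption: a unigram model is used for simplicity; the abstract does
    not name the model type.
    """
    counts = Counter(tokens)
    total = len(tokens) + len(vocab)  # add-one smoothing denominator
    return {w: (counts[w] + 1) / total for w in vocab}


def perplexity(model, tokens):
    """PP = exp(-(1/N) * sum(log P(w_i))) over the evaluation tokens."""
    log_sum = sum(math.log(model[w]) for w in tokens)
    return math.exp(-log_sum / len(tokens))


def select_corpus(subsets, task_tokens, threshold):
    """Remove subsets whose subset LM has a perplexity below `threshold`.

    `threshold` is hypothetical: the abstract says subsets with "lower
    perplexities" are removed but not what they are compared against.
    Returns the per-subset perplexities and the selected corpus tokens.
    """
    # Shared vocabulary so every task token gets a nonzero probability.
    vocab = set(task_tokens)
    for subset in subsets:
        vocab.update(subset)

    # One subset language model per subset, scored on the task corpus.
    perplexities = [
        perplexity(train_unigram(subset, vocab), task_tokens) for subset in subsets
    ]

    # Keep only subsets that are not "lower perplexity" under the assumed rule.
    kept = [s for s, pp in zip(subsets, perplexities) if pp >= threshold]
    selected = [tok for subset in kept for tok in subset]  # learning corpus (selected)
    return perplexities, selected


if __name__ == "__main__":
    subsets = [["a", "b"], ["a", "a"]]  # toy learning corpora (subsets)
    task = ["a", "a"]                   # toy task representation corpus
    pps, selected = select_corpus(subsets, task, threshold=1.5)
    print(pps, selected)
```

The removal rule is coded exactly as the abstract states it; only the cut-off is invented. A practical system would more likely use a higher-order smoothed n-gram model, but the unigram keeps the sketch short while still showing the split, model-per-subset, score-on-task-corpus, and filter steps.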


Inventors:
SHINTO YASUTAKA
UTSUNOMIYA EIJI
FURUI SADAOKI
SHINOZAKI TAKAHIRO
KUBOTA YU
Application Number:
JP2011041523A
Publication Date:
September 13, 2012
Filing Date:
February 28, 2011
Assignee:
KDDI CORP
TOKYO INST TECH
International Classes:
G10L15/06; G10L15/193
Attorney, Agent or Firm:
Kiyoshi Kato