To enable generation of a dictionary optimally suited to a text by registering words extracted from that text, so that the extracted words can be used for syntax analysis and similar processing of the text.
An optimum consecutive character string is extracted (3) from a text (1) written in natural language. For each character adjacent to the consecutive character string, the frequency with which that character appears together with the string is examined (4). Whether the character has a high probability of being used as one unit with the consecutive character string is evaluated objectively by a numerical value corresponding to this frequency. When the frequency is high, the character string including the adjacent character is recognized as a single word or phrase. The words extracted in this way are registered in a dictionary and used for syntax analysis and similar processing of the text.
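The extraction step described above can be illustrated with a minimal Python sketch. It is an illustration under stated assumptions, not the patented procedure: the abstract does not say how the numerical value is computed, so the sketch assumes it is the fraction of occurrences of the string that are accompanied by a given adjacent character, and it assumes a fixed threshold. The names `adjacent_counts`, `extend_word`, `threshold`, and `min_count` are hypothetical.

```python
from collections import Counter


def adjacent_counts(text, seed):
    """Count occurrences of `seed` and the characters directly before and after it."""
    left, right = Counter(), Counter()
    total = 0
    start = text.find(seed)
    while start != -1:
        total += 1
        end = start + len(seed)
        if start > 0:
            left[text[start - 1]] += 1
        if end < len(text):
            right[text[end]] += 1
        start = text.find(seed, start + 1)
    return total, left, right


def extend_word(text, seed, threshold=0.8, min_count=2):
    """Grow `seed` by one adjacent character at a time while that character
    accompanies the string in at least `threshold` of its occurrences.

    Assumption: the ratio and the threshold are illustrative choices; the
    abstract only speaks of a value corresponding to the frequency.
    """
    word = seed
    while True:
        total, left, right = adjacent_counts(text, word)
        if total < min_count:
            return word  # too few occurrences to judge
        candidates = []
        if right:
            ch, n = right.most_common(1)[0]
            candidates.append((n / total, word + ch))
        if left:
            ch, n = left.most_common(1)[0]
            candidates.append((n / total, ch + word))
        if not candidates:
            return word
        score, extended = max(candidates)
        if score < threshold:
            return word  # no adjacent character is used integrally
        word = extended


# Unsegmented sample text; the extracted word is registered in a dictionary (a set here).
text = "xxdatabaseyydatabasezzdatabaseww"
dictionary = {extend_word(text, "tab")}
print(dictionary)
```

In this sample the string "tab" is always surrounded by the same characters until it has grown to "database", after which the neighbouring characters vary, so growth stops there. A real text would need a threshold tuned to its length and vocabulary.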
SUGIO TOSHIYUKI
NAGATA JUNJI