To identify the language of an input character string by exploiting the writing characteristics of that language: a specific character that appears frequently in the language is singled out, and its appearance rate in the character string to be identified is measured.
The language of an input character string is identified by finding the appearance rate of a specific character in the character string to be identified, by calculating the average length of the character strings lying between occurrences of the specific character, or by finding the appearance rate of characters in a specific range. For this purpose the device is provided with a specific character counter 102, which derives the appearance rate of the specific character from the number of times it appears in the input character string; a standard appearance rate memory 105, which stores the reference appearance rate of the specific character for the detection target language; and a comparator 106, which compares the appearance rate of the specific character in the input character string with that reference appearance rate.
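The three measurements and the comparison described above can be sketched roughly as follows. This is a minimal illustration, not the patented implementation: the function names, the `REFERENCE_RATES` table and its values, and the "measured rate at or above the reference rate" decision rule are all assumptions made for the example. The abstract states only that the two rates are compared.

```python
from typing import Dict, Tuple

# Hypothetical stand-in for the standard appearance rate memory (105):
# language -> (specific character, reference appearance rate).
# The rates are illustrative placeholders, not measured statistics.
REFERENCE_RATES: Dict[str, Tuple[str, float]] = {
    "english": ("e", 0.10),
    "japanese": ("の", 0.03),
}


def appearance_rate(text: str, specific: str) -> float:
    """Role of the specific character counter (102): occurrences / total length."""
    if not text:
        return 0.0
    return text.count(specific) / len(text)


def mean_gap_length(text: str, specific: str) -> float:
    """Average length of the substrings lying between two occurrences of `specific`."""
    inner = text.split(specific)[1:-1]  # drop the text before the first and after the last hit
    if not inner:
        return 0.0
    return sum(len(part) for part in inner) / len(inner)


def range_rate(text: str, low: int, high: int) -> float:
    """Appearance rate of characters whose code point lies in [low, high]."""
    if not text:
        return 0.0
    return sum(1 for ch in text if low <= ord(ch) <= high) / len(text)


def matches_language(text: str, language: str) -> bool:
    """Role of the comparator (106), under an assumed decision rule:
    the text matches when its measured rate reaches the stored reference rate."""
    specific, reference = REFERENCE_RATES[language]
    return appearance_rate(text, specific) >= reference
```

For example, `range_rate(text, 0x3040, 0x309F)` measures the share of characters in the Unicode Hiragana block, which is one plausible choice of "specific range" for detecting Japanese.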
JP6553180 | Systems and methods for language detection
JP2000235574 | DOCUMENT PROCESSOR |
JP2018067159 | IMAGE PROCESSING APPARATUS AND IMAGE FORMING APPARATUS |
KOYAMA TAKAMASA