To achieve exact retrieval, within a practical time, from a document database that contains OCR (optical character recognition) recognition errors, while preventing retrieval omissions caused by those errors, even when a relatively long retrieval term is input.
This system retrieves documents containing a retrieval character string specified by a searcher from text documents output by character recognition processing of image documents. It has a retrieval character string division step that divides the retrieval character string into partial character strings in n-character units (n≥2); a similar character table reference step that refers to n-character-unit similar character tables, generated beforehand, which store for each n-character partial character string the similar character strings of similar character shape that are likely to be misrecognized in its place; and a retrieval character string expansion step that, by referring to the n-character-unit similar character tables, extracts the similar character string group for each partial character string constituting the retrieval character string and generates expanded words by combining the similar character string groups.
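The three steps can be illustrated with a minimal Python sketch. It is not the patented implementation: the function names, the example table contents, the choice of non-overlapping n-character units with a possibly shorter final unit, and the use of a full Cartesian product to combine the groups are all assumptions made for illustration.

```python
from itertools import product


def split_into_units(term, n=2):
    """Division step: cut the term into n-character units (n >= 2).

    Assumption: units do not overlap, and the last unit may be shorter
    than n when len(term) is not a multiple of n.
    """
    if n < 2:
        raise ValueError("n must be at least 2")
    return [term[i:i + n] for i in range(0, len(term), n)]


def expand_term(term, similar_table, n=2):
    """Table reference and expansion steps.

    similar_table is a hypothetical dict mapping an n-character unit to
    the strings OCR is likely to output in its place.
    """
    groups = []
    for unit in split_into_units(term, n):
        # Each group holds the original unit plus its look-alike strings.
        candidates = [unit] + list(similar_table.get(unit, []))
        # Drop duplicates while keeping the original order.
        groups.append(list(dict.fromkeys(candidates)))
    # Combine one candidate per unit to form every expanded word.
    return ["".join(parts) for parts in product(*groups)]


# Hypothetical 2-character similar character table.
SIMILAR_2 = {
    "rn": ["m"],    # "rn" is often misread as "m"
    "mo": ["rno"],  # "m" at the start of a unit misread as "rn"
}

expanded = expand_term("modern", SIMILAR_2, n=2)
print(expanded)
```

Each expanded word would then be searched for in the OCR text, so that a document in which "modern" was recognized as "modem" is still found. Because the number of expanded words grows multiplicatively with the number of units, a practical system would likely need to bound or prune the combinations for long retrieval terms.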
KOIZUMI NAOHIRO
TAKATORI HISASHI
HITACHI SYSTEMS & SERVICES LTD