


Title:
RETRIEVING METHOD FOR DOCUMENT INFORMATION
Document Type and Number:
Japanese Patent JP2002189747
Kind Code:
A
Abstract:

To achieve accurate retrieval, within a practical time, from a document database that contains OCR (optical character recognition) errors, preventing retrieval omissions caused by those recognition errors even when a relatively long retrieval term is input.

This method retrieves documents containing a retrieval character string specified by a searcher from text documents produced by character recognition processing of image documents. It has a retrieval character string division step that divides the retrieval character string into partial character strings of n characters each (n≥2); a similar character table reference step that refers to n-character-unit similar character tables, generated beforehand, which store for each n-character partial character string the similarly shaped character strings it is highly likely to be misrecognized as; and a retrieval character string expansion step that looks up each partial character string constituting the retrieval character string in the n-character-unit similar character tables and generates expanded words by combining the resulting groups of similar character strings.
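
The three steps described in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the table entries, the function names (split_into_units, expand_term, search), the choice of non-overlapping units, and the handling of a trailing remainder shorter than n are all assumptions made for illustration, since the abstract does not specify them.

```python
from itertools import product

# Hypothetical 2-character similar-character table (illustrative entries only).
# Each key is a 2-character unit; each value lists strings an OCR engine might
# plausibly output in its place.
SIMILAR_TABLE = {
    "lo": ["1o", "l0"],
    "ok": ["0k"],
}


def split_into_units(term, n=2):
    """Divide the term into consecutive, non-overlapping n-character units.

    Assumption: a trailing remainder shorter than n is kept as its own unit.
    """
    return [term[i:i + n] for i in range(0, len(term), n)]


def expand_term(term, table, n=2):
    """Return the original term plus every variant built from table entries."""
    candidates = []
    for unit in split_into_units(term, n):
        # Always keep the unit itself so the unmodified term is searched too.
        alternatives = [unit] + [s for s in table.get(unit, []) if s != unit]
        candidates.append(alternatives)
    # Combine one alternative per unit, in every possible way.
    return ["".join(combo) for combo in product(*candidates)]


def search(documents, term, table, n=2):
    """Return indices of documents containing any expanded form of the term."""
    expanded = expand_term(term, table, n)
    return [i for i, text in enumerate(documents)
            if any(word in text for word in expanded)]


# Made-up sample documents containing OCR-style misrecognitions of "look".
documents = ["please 1ook at page two", "nothing relevant here", "a l00k back"]
expanded = expand_term("look", SIMILAR_TABLE)
hits = search(documents, "look", SIMILAR_TABLE)
```

In this sketch the number of expanded words is the product of the number of alternatives per unit, so it grows quickly with the length of the retrieval term and the size of each table entry.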


Inventors:
TADA KATSUMI
KOIZUMI NAOHIRO
TAKATORI HISASHI
Application Number:
JP2000389956A
Publication Date:
July 05, 2002
Filing Date:
December 19, 2000
Assignee:
HITACHI LTD
HITACHI SYSTEMS & SERVICES LTD
International Classes:
G06F17/30; (IPC1-7): G06F17/30
Attorney, Agent or Firm:
Sakuta Yasuo