DOCUMENT PICTURE STRUCTURE ANALYSIS METHOD

Title:

DOCUMENT PICTURE STRUCTURE ANALYSIS METHOD

Document Type and Number:

Japanese Patent JPH11232439

Kind Code:

A

Abstract:

To analyze the structure of a document picture precisely and efficiently by using table-of-contents information when the document picture is converted into an electronic document.

To learn the structure of the whole document, the document picture of the table-of-contents page is captured first; basic rectangles are extracted line by line, and their characters are recognized and analyzed. In this step, chapter/section numbers are analyzed, headings are extracted, and the page number of each heading is extracted. Next, the document pictures of the body-text pages are captured: several tens of consecutive pages are input, and basic rectangles are extracted and analyzed for each page. From the layout features of the extracted basic rectangles, the layout elements (header, footer, page number, chapter/section heading, body text, and figure/table) are identified. All elements except the rectangles identified as figures/tables are character-recognized. Finally, in a matching step, each heading extracted in the contents analysis is matched against the heading candidates extracted in the body-text analysis of the page that heading refers to, which yields more precise heading information.
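The abstract does not say how contents-page lines are parsed or how headings are matched against body pages. As a rough illustration only, the Python sketch below assumes OCR has already produced one text string per contents line and a list of heading-candidate strings for each printed page number. The regular expression, the use of `difflib.SequenceMatcher` as the similarity measure, the 0.6 threshold, and the names `TocEntry`, `parse_toc_line` and `match_headings` are all hypothetical choices, not the patented method.

```python
import re
from dataclasses import dataclass
from difflib import SequenceMatcher


@dataclass
class TocEntry:
    number: str   # chapter/section number, e.g. "2.3"
    heading: str  # heading text as recognized on the contents page
    page: int     # page number printed next to the heading


# Hypothetical contents-line layout: "<number>[.] <heading> [leader dots] <page>"
TOC_LINE = re.compile(r"^\s*(\d+(?:\.\d+)*)\.?\s+(.+?)[\s.]*(\d+)\s*$")


def parse_toc_line(text):
    """Split one recognized contents line into number, heading and page (or None)."""
    m = TOC_LINE.match(text)
    if m is None:
        return None  # not a numbered contents entry (e.g. "Preface")
    number, heading, page = m.groups()
    return TocEntry(number, heading.strip(), int(page))


def match_headings(toc, candidates_by_page, threshold=0.6):
    """For each contents entry, pick the most similar heading candidate on its page.

    candidates_by_page maps a printed page number to the OCR text of the
    rectangles that the (assumed) layout step labelled as chapter/section headings.
    Returns {entry number: best candidate, or None if nothing is similar enough}.
    """
    matches = {}
    for entry in toc:
        scored = [
            (SequenceMatcher(None, entry.heading, cand).ratio(), cand)
            for cand in candidates_by_page.get(entry.page, [])
        ]
        best = max(scored, default=None)
        matches[entry.number] = best[1] if best and best[0] >= threshold else None
    return matches


if __name__ == "__main__":
    toc = [parse_toc_line("2.3 Layout analysis ........ 17")]
    # OCR on the body page misread "i" as "1"; fuzzy matching still finds it.
    print(match_headings(toc, {17: ["2.3 Layout analys1s", "Figure 4 Sample page"]}))
```

Fuzzy matching is used here because body-page OCR output is noisy; comparing against the cleaner contents-page heading lets each side correct the other, which is the spirit of the matching step described above.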

Inventors:

HAYASHI TOSHINARI

Application Number:

JP5013098A

Publication Date:

August 27, 1999

Filing Date:

February 16, 1998

Assignee:

HAYASHI TOSHINARI

International Classes:

G06F17/21; G06F17/30; G06K9/20; G06T1/00; G06T7/00; G06T7/40; (IPC1-7): G06T1/00; G06F17/21; G06F17/27; G06T7/00

Attorney, Agent or Firm:

Kawahara Kazuho