Information extraction method, information extraction device, and information extraction program

Title:

Information extraction method, information extraction device, and information extraction program

Document Type and Number:

Japanese Patent JP5559104

Kind Code:

B2

Abstract:

PROBLEM TO BE SOLVED: To extract the body text from a structured document without depending on extraction rules. SOLUTION: A document set recording part 2 records the HTML files of the documents to be processed in a document set DB 3. A link source information extraction part 4 extracts, from each HTML file acquired from the document set DB 3, the hyperlinks embedded in it and the text information surrounding each link (link peripheral text information). A text extraction part 5 identifies an HTML file acquired from the document set DB 3 as the HTML file of a link destination document on condition that a hyperlink referring to that file has been extracted by the link source information extraction part 4. The text extraction part 5 then compares the character strings of the text information in the HTML file of the identified link destination document with the character strings of the link peripheral text information, and extracts a representative part of the link destination document as the body text. An output part 6 outputs the extracted body text.
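
The flow described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the patented implementation: the class and function names (LinkCollector, BlockCollector, extract_bodies) are hypothetical, the anchor text alone stands in for the link peripheral text information, hrefs are assumed to be absolute URLs that match the keys of the document set, and the comparison is a simple count of shared character bigrams, which the abstract does not specify.

```python
from html.parser import HTMLParser

# Assumption: these tags delimit candidate text blocks in a destination document.
BLOCK_TAGS = {"p", "div", "td", "li", "h1", "h2", "h3"}


class LinkCollector(HTMLParser):
    """Collects (href, anchor text) pairs from one HTML file."""

    def __init__(self):
        super().__init__()
        self.links = []     # finished (href, text) pairs
        self._href = None   # href of the <a> element currently open, if any
        self._buf = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._buf = []

    def handle_data(self, data):
        if self._href is not None:
            self._buf.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._buf).strip()))
            self._href = None


class BlockCollector(HTMLParser):
    """Splits the text of one HTML file into blocks at block-level tags."""

    def __init__(self):
        super().__init__()
        self.blocks = []
        self._buf = []

    def _flush(self):
        text = "".join(self._buf).strip()
        if text:
            self.blocks.append(text)
        self._buf = []

    def handle_starttag(self, tag, attrs):
        if tag in BLOCK_TAGS:
            self._flush()

    def handle_endtag(self, tag):
        if tag in BLOCK_TAGS:
            self._flush()

    def handle_data(self, data):
        self._buf.append(data)

    def close(self):
        super().close()
        self._flush()


def bigrams(text):
    """Character bigrams; usable for Japanese text without word segmentation."""
    return {text[i:i + 2] for i in range(len(text) - 1)}


def extract_bodies(documents):
    """documents maps URL -> HTML source. Returns URL -> body text for link destinations."""
    # Step 1: gather link text for every hyperlink that points into the document set.
    link_texts = {}
    for html in documents.values():
        collector = LinkCollector()
        collector.feed(html)
        collector.close()
        for href, text in collector.links:
            if href in documents and text:
                link_texts.setdefault(href, []).append(text)

    # Step 2: in each link destination, keep the block that shares the most
    # bigrams with the link text (an assumed similarity measure).
    bodies = {}
    for url, texts in link_texts.items():
        splitter = BlockCollector()
        splitter.feed(documents[url])
        splitter.close()
        if not splitter.blocks:
            continue
        link_grams = set().union(*(bigrams(t) for t in texts))
        bodies[url] = max(splitter.blocks, key=lambda b: len(bigrams(b) & link_grams))
    return bodies
```

Calling extract_bodies with a dictionary of URL-to-HTML strings returns an entry only for documents that some other document in the set links to, mirroring the condition in the abstract that a document is treated as a link destination only when a hyperlink to it has been extracted. Counting shared bigrams without normalization tends to favor longer blocks over a heading that merely repeats the link text; a real system would likely need a more careful measure.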

Inventors:

Masayuki Sugizaki
Yuichiro Sekiguchi
Kenji Ezaki
Tadashi Uchiyama

Application Number:

JP2011166460A

Publication Date:

July 23, 2014

Filing Date:

July 29, 2011

Assignee:

Nippon Telegraph and Telephone CORP.

International Classes:

G06F17/30; G06F13/00

Attorney, Agent or Firm:

Hiromichi Kobayashi
Hidehisa Uzawa
Koji Yamaguchi
Hashimoto 剛
