Document extraction program, document extraction device, and document extraction method

Title:

Document extraction program, document extraction device, and document extraction method

Document Type and Number:

Japanese Patent JP7419961

Kind Code:

B2

Abstract:

To extract, with high accuracy, two documents having a correspondence relation from sets of documents written in two different languages.

SOLUTION: A computer generates first distribution information indicating a distribution of first feature amounts based on a distance between two words included in each of a plurality of first language documents written in a first language. The computer generates second distribution information indicating a distribution of second feature amounts based on a distance between two words included in each of a plurality of second language documents written in a second language. On the basis of a similarity between the first distribution information of each of the plurality of first language documents and the second distribution information of each of the plurality of second language documents, the computer extracts a specific first language document from the plurality of first language documents and a specific second language document from the plurality of second language documents. The specific second language document is the second language document corresponding to the specific first language document.

SELECTED DRAWING: Figure 6
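The flow described in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation. The abstract does not say how the distance between two words is measured, how the distribution is represented, or which similarity is used, so the sketch assumes Euclidean distance between toy word vectors, a fixed-bin normalized histogram, and cosine similarity. All function names, vectors, and document names are hypothetical.

```python
from itertools import combinations
from math import dist, sqrt


def distance_histogram(words, vectors, bins=4, max_dist=2.0):
    """Normalized histogram of pairwise word-vector distances in one document.

    Assumption: "distance between two words" is the Euclidean distance
    between hypothetical word vectors; the abstract does not specify this.
    """
    counts = [0] * bins
    for a, b in combinations(words, 2):
        d = dist(vectors[a], vectors[b])
        # Clamp distances at or beyond max_dist into the last bin.
        idx = min(int(d / max_dist * bins), bins - 1)
        counts[idx] += 1
    total = sum(counts) or 1
    return [c / total for c in counts]


def cosine(p, q):
    """Cosine similarity between two histograms (assumed similarity measure)."""
    dot = sum(x * y for x, y in zip(p, q))
    norm = sqrt(sum(x * x for x in p)) * sqrt(sum(y * y for y in q))
    return dot / norm if norm else 0.0


def match_documents(first_docs, second_docs, first_vecs, second_vecs):
    """For each first-language document, pick the most similar second-language one."""
    first_h = {n: distance_histogram(w, first_vecs) for n, w in first_docs.items()}
    second_h = {n: distance_histogram(w, second_vecs) for n, w in second_docs.items()}
    return {
        f: max(second_h, key=lambda s: cosine(first_h[f], second_h[s]))
        for f in first_h
    }


# Toy vectors: the two languages live in different (rotated) spaces, but
# pairwise distances between corresponding words are preserved.
en_vecs = {"cat": (0.0, 0.0), "dog": (0.1, 0.0), "car": (1.2, 0.0), "road": (1.2, 0.1)}
ja_vecs = {"猫": (0.0, 0.0), "犬": (0.0, 0.1), "車": (0.0, 1.2), "道": (0.1, 1.2)}

en_docs = {"en_a": ["cat", "dog", "car"], "en_b": ["car", "road"]}
ja_docs = {"ja_x": ["車", "道"], "ja_y": ["猫", "犬", "車"]}

matches = match_documents(en_docs, ja_docs, en_vecs, ja_vecs)
print(matches)
```

Because only within-document distances are compared, the sketch never needs a shared vocabulary or a translation dictionary between the two languages. That is the property the toy data is meant to show.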

Inventors:

Shun Liang
Seiji Okajima

Application Number:

JP2020083737A

Publication Date:

January 23, 2024

Filing Date:

May 12, 2020


Assignee:

Fujitsu Limited (富士通株式会社)

International Classes:

G06F40/216; G06F16/383

Domestic Patent References:

JP2018180839A
JP2012506596A

Foreign References:

WO2015145981A1

Attorney, Agent or Firm:

Infot Patent Attorney Corporation
Hiroyoshi Aoki
Masayuki Amada
Yoshiyuki Ohsuga
