Chinese word segmentation method, apparatus, storage medium, and computer device based on deep learning

Title:

Chinese word segmentation method, apparatus, storage medium, and computer device based on deep learning

Document Type and Number:

Japanese Patent JP7178513

Kind Code:

B2

Abstract:

A deep-learning-based Chinese word segmentation method and apparatus. The method comprises: converting training corpus data into character-level data; converting the character-level data into sequence data; splitting the sequence data at preset symbols to obtain multiple pieces of sub-sequence data, and grouping those pieces by length to obtain K data sets; training on the K data sets to obtain K trained temporal convolutional neural network-conditional random field (TCN-CRF) models; and inputting the processed target corpus data into at least one of the K trained models to obtain a word segmentation result for the target corpus data. The method thereby addresses the low accuracy of Chinese word segmentation in the prior art.
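
The preprocessing steps named in the abstract (character-level conversion, sequence conversion, splitting at preset symbols, grouping by length into K data sets) might look roughly like the sketch below. It is an illustration under stated assumptions, not the patented implementation: the punctuation set `PRESET_SYMBOLS`, the integer-id vocabulary, the choice to drop the delimiter symbols, and the length `boundaries` are all hypothetical choices the abstract does not specify. Training the K network-CRF models is not shown.

```python
# Hypothetical "preset symbols": common Chinese sentence punctuation.
# The abstract does not say which symbols are used.
PRESET_SYMBOLS = "，。！？；"


def to_characters(text):
    """Convert raw corpus text into character-level data, dropping whitespace."""
    return [ch for ch in text if not ch.isspace()]


def to_sequence(chars, vocab):
    """Map each character to an integer id (assumed encoding; ids start at 1)."""
    return [vocab.setdefault(ch, len(vocab) + 1) for ch in chars]


def split_on_symbols(seq, symbol_ids):
    """Split a sequence at the preset symbols; the symbols are dropped here."""
    subs, current = [], []
    for token in seq:
        if token in symbol_ids:
            if current:
                subs.append(current)
            current = []
        else:
            current.append(token)
    if current:
        subs.append(current)
    return subs


def group_by_length(subs, boundaries):
    """Group sub-sequences into K = len(boundaries) + 1 sets by length.

    With boundaries (2, 5): set 0 holds lengths <= 2, set 1 holds
    lengths 3..5, and set 2 holds lengths > 5.
    """
    groups = [[] for _ in range(len(boundaries) + 1)]
    for sub in subs:
        k = sum(len(sub) > b for b in boundaries)
        groups[k].append(sub)
    return groups


def preprocess(text, boundaries=(2, 5)):
    """Run the four preprocessing steps and return (K data sets, vocab)."""
    vocab = {}
    seq = to_sequence(to_characters(text), vocab)
    symbol_ids = {vocab[s] for s in PRESET_SYMBOLS if s in vocab}
    subs = split_on_symbols(seq, symbol_ids)
    return group_by_length(subs, boundaries), vocab


groups, vocab = preprocess("我爱北京，天安门上太阳升。好！")
for k, group in enumerate(groups):
    print(k, [len(sub) for sub in group])
```

Under this reading, each of the K data sets would train its own model, so that every model sees sub-sequences of similar length; at inference time the target corpus would go through the same steps and be routed to the model or models matching its sub-sequence lengths.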

Inventors:

陳 ▲ミン▼川
Horse
King's army

Application Number:

JP2021563188A

Publication Date:

November 25, 2022

Filing Date:

November 14, 2019

Assignee:

PING AN TECHNOLOGY (SHENZHEN) CO.,LTD.

International Classes:

G06F40/284; G06F40/216; G06F40/268; G06N3/04; G06N3/08

Domestic Patent References:

JP2008140117A

Foreign References:

CN103020034B
CN108268444A

Other References:

WANG, Chunqi; XU, Bo, Convolutional Neural Network with Word Embeddings for Chinese Word Segmentation [Online], November 13, 2017, pp. 1-10, https://arxiv.org/pdf/1711.0441v1

Attorney, Agent or Firm:

Sbpj International Patent Office
