Title:
Chinese word segmentation method and apparatus based on deep learning, and storage medium and computer device
Document Type and Number:
Japanese Patent JP7178513
Kind Code:
B2
Abstract:
A deep-learning-based Chinese word segmentation method and apparatus. The method comprises: converting training corpus data into character-level data; converting the character-level data into sequence data; splitting the sequence data at preset symbols to obtain multiple sub-sequences, and grouping the sub-sequences by length to obtain K data sets; training on the K data sets to obtain K trained temporal convolutional neural network-conditional random field models; and inputting the processed target corpus data into at least one of the K trained models to obtain a word segmentation result for the target corpus data. The method thereby addresses the low accuracy of Chinese word segmentation in the prior art.
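
The preprocessing steps named in the abstract (character-level conversion, sequence conversion, splitting at preset symbols, grouping by length into K data sets) can be illustrated with a minimal Python sketch. This is not the patented implementation: the symbol set `SPLIT_SYMBOLS`, the length thresholds `boundaries`, and all function names are assumptions made for illustration, since the abstract does not specify them. Model training is omitted.

```python
from collections import defaultdict

# Hypothetical "preset symbols"; the abstract does not list the actual set.
SPLIT_SYMBOLS = "，。！？；"


def to_characters(text):
    """Character-level data: one token per non-whitespace character."""
    return [ch for ch in text if not ch.isspace()]


def to_sequence(chars, vocab):
    """Map each character to an integer id (0 is left free for padding)."""
    return [vocab.setdefault(ch, len(vocab) + 1) for ch in chars]


def split_on_symbols(seq, symbol_ids):
    """Cut the id sequence at every preset symbol, dropping the symbol itself."""
    pieces, current = [], []
    for token in seq:
        if token in symbol_ids:
            if current:
                pieces.append(current)
            current = []
        else:
            current.append(token)
    if current:
        pieces.append(current)
    return pieces


def group_by_length(subsequences, boundaries):
    """Place each sub-sequence in one of K = len(boundaries) + 1 length buckets."""
    groups = defaultdict(list)
    for sub in subsequences:
        # Bucket index = number of thresholds the length strictly exceeds.
        k = sum(len(sub) > b for b in boundaries)
        groups[k].append(sub)
    return groups


def prepare(text, boundaries=(2, 4)):
    """Run the four preprocessing steps and return the K length-grouped data sets."""
    vocab = {}
    seq = to_sequence(to_characters(text), vocab)
    symbol_ids = {vocab[s] for s in SPLIT_SYMBOLS if s in vocab}
    return group_by_length(split_on_symbols(seq, symbol_ids), boundaries)


if __name__ == "__main__":
    for k, data_set in sorted(prepare("我爱北京，天安门很大。你好").items()):
        print(k, data_set)
```

With the assumed thresholds `(2, 4)`, K is 3, and each bucket would feed its own model, so that sequences padded together have similar lengths. Whether the patent chooses its K groups this way is not stated in the abstract.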

Inventors:
Chen Minchuan (陳 ▲ミン▼川)
Horse
King's army
Application Number:
JP2021563188A
Publication Date:
November 25, 2022
Filing Date:
November 14, 2019
Assignee:
PING AN TECHNOLOGY (SHENZHEN) CO.,LTD.
International Classes:
G06F40/284; G06F40/216; G06F40/268; G06N3/04; G06N3/08
Domestic Patent References:
JP2008140117A
Foreign References:
CN103020034B
CN108268444A
Other References:
WANG, Chunqi; XU, Bo, "Convolutional Neural Network with Word Embeddings for Chinese Word Segmentation" [Online], November 13, 2017, pp. 1-10, https://arxiv.org/pdf/1711.0441v1
Attorney, Agent or Firm:
Sbpj International Patent Office