Title:
CROSS-LANGUAGE DATA ENHANCEMENT-BASED WORD SEGMENTATION METHOD AND APPARATUS
Document Type and Number:
WIPO Patent Application WO/2022/148467
Kind Code:
A1
Abstract:
Embodiments of the present application provide a cross-language data enhancement-based word segmentation method and apparatus. In the technical solution provided in the embodiments, high-resource language data is acquired and processed to obtain word segment materials, and low-resource language data is acquired to obtain candidate word segments. The candidate word segments are screened against the word segment materials obtained from the high-resource language data, and those having a high degree of matching with these materials are selected as word segment materials of the low-resource language data. A word segmentation model is then trained on the low-resource word segment materials so that it automatically outputs candidate word segmentation results for the low-resource language data, and a word segmentation result is selected according to the degree of matching between the candidate word segmentation results and the word segment materials of the high-resource language data. Using high-resource language materials to automatically augment and validate model training data in low-resource languages solves the problem of imbalance between data resources and annotation resources in different languages, thereby providing an easier and more efficient solution for iterating a word segmentation model.
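
The screening and selection steps described in the abstract can be illustrated with a minimal Python sketch. It is not the applicant's implementation. The abstract does not say how the degree of matching is computed, so the sketch assumes a hypothetical `translate` callable (for example, a bilingual dictionary lookup) and scores a candidate as the fraction of its segments whose translation appears in the high-resource word segment materials. The function names, the `max_len` parameter, and the exhaustive enumeration of candidates are likewise illustrative assumptions, and the model training step is omitted.

```python
from __future__ import annotations

from typing import Callable, Optional


def candidate_segmentations(text: str, max_len: int = 4) -> list[list[str]]:
    """Enumerate every split of `text` into pieces of at most `max_len` characters.

    Exhaustive enumeration is an assumption made for illustration; it is
    exponential in the text length and only suitable for short strings.
    """
    if not text:
        return [[]]
    results = []
    for i in range(1, min(max_len, len(text)) + 1):
        head = text[:i]
        for rest in candidate_segmentations(text[i:], max_len):
            results.append([head] + rest)
    return results


def matching_degree(
    segments: list[str],
    high_vocab: set[str],
    translate: Callable[[str], Optional[str]],
) -> float:
    """Fraction of segments whose translation is in the high-resource materials.

    `translate` is a hypothetical cross-language mapping (an assumption, not
    something specified in the abstract). It returns None for unknown segments.
    """
    if not segments:
        return 0.0
    hits = sum(1 for seg in segments if translate(seg) in high_vocab)
    return hits / len(segments)


def select_segmentation(
    text: str,
    high_vocab: set[str],
    translate: Callable[[str], Optional[str]],
    max_len: int = 4,
) -> list[str]:
    """Pick the candidate segmentation with the highest matching degree."""
    candidates = candidate_segmentations(text, max_len)
    return max(candidates, key=lambda c: matching_degree(c, high_vocab, translate))
```

With a toy dictionary such as `{"selamat": "good", "pagi": "morning"}` passed as `translate=toy_dict.get` and high-resource materials `{"good", "morning"}`, calling `select_segmentation("selamatpagi", ..., max_len=7)` picks the split in which both segments have a translation in the materials. In the method as described, candidates selected this way would serve as training data for the low-resource segmentation model, and the same scoring would then be applied to the model's own candidate outputs.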

Inventors:
ZHANG JIANNING (SG)
Application Number:
PCT/CN2022/071144
Publication Date:
July 14, 2022
Filing Date:
January 10, 2022
Assignee:
BIGO TECH PTE LTD (SG)
ZHANG JIANNING (SG)
International Classes:
G06F40/289
Domestic Patent References:
WO2020242567A1    2020-12-03
Foreign References:
CN112765977A    2021-05-07
CN111090727A    2020-05-01
CN111144102A    2020-05-12
CN111460804A    2020-07-28
CN111382568A    2020-07-07
US20160027433A1    2016-01-28
US20200098352A1    2020-03-26
Attorney, Agent or Firm:
BEIJING ZFANG PATENT AGENCY (CN)