Title:
METHOD FOR TRAINING DOCUMENT CLASSIFICATION MODEL, AND RELATED APPARATUS
Document Type and Number:
WIPO Patent Application WO/2021/057133
Kind Code:
A1
Abstract:
Disclosed are a method for training a document classification model and a related apparatus. The method comprises: obtaining a feature vector of a document by using an unsupervised learning algorithm, on the basis of the context of each word in the document, the vector of the word, and an identifier of the document; and, taking documents labeled with classification tags as training documents, training a document classification model with a binary classification algorithm on the basis of the feature vectors and classification tags of a plurality of training documents, wherein each classification tag is either a target category tag or a non-target category tag. The feature vector of a document is thus extracted by an unsupervised algorithm that takes the context of a word and the identifier of the document as input and the vector of the word as output. Because this accounts for the correlation between the contexts of words within the same document, the feature vector of the document becomes more general, so the trained model performs better on documents that are not labeled with classification tags, which improves its classification accuracy.
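
The two stages described in the abstract can be illustrated with a minimal Python sketch. It is not the applicant's implementation. The first function only shows how training examples might be laid out, with the document identifier and the context words on the input side and the word itself on the output side (an arrangement that resembles the distributed-memory paragraph-vector setup, which is an assumption here). The unsupervised training of the vectors is omitted. The second stage uses plain logistic regression as one possible binary classifier. The names `context_examples`, `train_binary_classifier`, `predict`, and the `window`, `epochs`, and `lr` parameters are all hypothetical.

```python
import math


def context_examples(docs, window=2):
    """Yield (doc_id, context_words, target_word) triples.

    Input side: the document identifier plus the words around a position.
    Output side: the word at that position.
    `docs` maps a document identifier to its list of words.
    """
    for doc_id, words in docs.items():
        for i, target in enumerate(words):
            left = words[max(0, i - window):i]
            right = words[i + 1:i + 1 + window]
            yield doc_id, left + right, target


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_binary_classifier(vectors, labels, epochs=200, lr=0.5):
    """Fit logistic regression by stochastic gradient descent.

    `vectors` are document feature vectors (assumed to come from the
    unsupervised stage); label 1 = target category, 0 = non-target category.
    """
    weights, bias = [0.0] * len(vectors[0]), 0.0
    for _ in range(epochs):
        for x, y in zip(vectors, labels):
            p = sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)
            err = p - y  # gradient of the log loss with respect to the score
            weights = [w - lr * err * xi for w, xi in zip(weights, x)]
            bias -= lr * err
    return weights, bias


def predict(model, x):
    """Return 1 (target category) or 0 (non-target category) for vector x."""
    weights, bias = model
    score = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1 if sigmoid(score) >= 0.5 else 0
```

Once a model has been fitted on the labeled training documents, `predict` can be applied to the feature vector of any unlabeled document, since the feature vectors themselves are obtained without using the classification tags.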

Inventors:
REN ZHUO (CN)
Application Number:
PCT/CN2020/097869
Publication Date:
April 01, 2021
Filing Date:
June 24, 2020
Assignee:
BEIJING GRIDSUM TECHNOLOGY CO (CN)
International Classes:
G06F16/35
Domestic Patent References:
WO2019035765A9 2019-03-21
Foreign References:
CN109697285A 2019-04-30
CN106202010A 2016-12-07
CN109635107A 2019-04-16
CN106776711A 2017-05-31
Attorney, Agent or Firm:
UNITALEN ATTORNEYS AT LAW (CN)