Title:
METHOD AND SYSTEM FOR SCREENING CORPUS DATA IN VERTICAL FIELD
Document Type and Number:
WIPO Patent Application WO/2022/160818
Kind Code:
A1
Abstract:
A method and a system for screening corpus data in a vertical field. The method comprises: performing word segmentation on the corpus data to be screened and converting its words into word vectors (101); fusing the word vectors of all of its words and taking the fusion result as the semantic vector of the corpus data to be screened (102); and calculating the similarity between that semantic vector and a semantic vector obtained in advance from sample corpus data in the vertical field, and, if the similarity is greater than a preset threshold, determining that the corpus data to be screened belongs to the vertical field (103). The method can screen corpus data at the document, paragraph and sentence level; in addition, screening precision can still be ensured when only a small amount of sample corpus data is available in the vertical field.
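
The three steps in the abstract can be illustrated with a minimal sketch. It is not the patented implementation: the abstract does not say how words are segmented, how vectors are fused, or which similarity measure is used, so whitespace splitting, mean pooling, cosine similarity, the 0.8 threshold, the toy vector table and all function names below are assumptions chosen for illustration.

```python
import math

# Hypothetical toy word vectors; a real system would load trained embeddings.
TOY_VECTORS = {
    "contract": [0.9, 0.1, 0.0],
    "court": [0.8, 0.2, 0.1],
    "liability": [0.85, 0.15, 0.05],
    "soccer": [0.0, 0.1, 0.9],
    "goal": [0.1, 0.0, 0.95],
}


def segment(text):
    # Step 101, first half: word segmentation.
    # Assumption: whitespace splitting stands in for a real segmenter.
    return text.lower().split()


def semantic_vector(text, vectors=TOY_VECTORS):
    # Step 101, second half: map each known word to its word vector.
    found = [vectors[w] for w in segment(text) if w in vectors]
    if not found:
        return None  # no known words, so no semantic vector
    # Step 102: fuse the word vectors. Assumption: element-wise mean.
    dim = len(found[0])
    return [sum(v[i] for v in found) / len(found) for i in range(dim)]


def cosine(a, b):
    # Assumption: cosine similarity as the similarity measure.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def in_vertical_field(text, field_vector, threshold=0.8):
    # Step 103: compare against the pre-acquired field vector and
    # accept the text only if similarity exceeds the preset threshold.
    vec = semantic_vector(text)
    return vec is not None and cosine(vec, field_vector) > threshold


# Semantic vector pre-acquired from (toy) sample corpus data in the field.
field_vector = semantic_vector("contract court liability")

print(in_vertical_field("the court reviewed the contract", field_vector))
print(in_vertical_field("a late soccer goal", field_vector))
```

Because the same `semantic_vector` function accepts text of any length, the sketch applies unchanged to a sentence, a paragraph or a whole document, which mirrors the abstract's point about screening at all three levels.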

Inventors:
CAI, Jie (CN)
Application Number:
PCT/CN2021/126661
Publication Date:
August 04, 2022
Filing Date:
October 27, 2021
Assignee:
IOL (WUHAN) INFORMATION TECHNOLOGY CO., LTD. (CN)
International Classes:
G06F16/33
Attorney, Agent or Firm:
CN-KNOWHOW INTELLECTUAL PROPERTY AGENT LIMITED (CN)