Title:
TEXT SCREENING METHOD AND APPARATUS, DEVICE, AND STORAGE MEDIUM
Document Type and Number:
WIPO Patent Application WO/2022/105497
Kind Code:
A1
Abstract:
A text screening method and apparatus, a device, and a storage medium. The method comprises: performing a word segmentation operation on a first text to be screened to obtain multiple segmented words, extracting keywords having a preset part of speech, and assigning weights to the segmented words and the keywords; calculating hash values of the segmented words and the keywords, and obtaining weight vectors of the segmented words and weight vectors of the keywords according to the hash values and the weights; accumulating the weight vectors to obtain a first weight vector and a second weight vector of the first text, and performing dimensionality reduction on the first weight vector and the second weight vector to obtain a first simhash value and a second simhash value of the first text; and calculating a first distance value between the first simhash value and a third simhash value of a target text, calculating a second distance value between the second simhash value and the third simhash value when the first distance value is greater than a first preset value, and screening out the first text when the second distance value is smaller than or equal to a second preset value. The method can be used for deduplicating abstract or summary texts.
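
The steps in the abstract can be illustrated with a minimal Python sketch. It is not the claimed implementation; several details are assumptions made only so the example runs: whitespace splitting stands in for the word segmentation operation, a caller-supplied `keyword_vocab` set stands in for part-of-speech keyword extraction, term frequency is used as the weight, a truncated MD5 digest is used as the 64-bit hash, the first fingerprint is taken to be built from all segmented words and the second from the keywords only, and the distance is taken to be the Hamming distance. The abstract does not say what happens when the first distance is within the first preset value; the sketch assumes the text is screened out in that case as well. All function names and threshold defaults are hypothetical.

```python
import hashlib
from collections import Counter
from typing import Iterable, Mapping, Tuple

BITS = 64  # assumed fingerprint width; the abstract does not fix one


def token_hash(token: str) -> int:
    """Stable 64-bit hash of a token (truncated MD5, an assumed choice)."""
    digest = hashlib.md5(token.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def simhash(weights: Mapping[str, float]) -> int:
    """Accumulate signed weight vectors, then reduce each dimension to one bit."""
    acc = [0.0] * BITS
    for token, w in weights.items():
        h = token_hash(token)
        for i in range(BITS):
            # A set hash bit contributes +weight, a cleared bit -weight.
            acc[i] += w if (h >> i) & 1 else -w
    fp = 0
    for i, v in enumerate(acc):
        if v > 0:  # dimensionality reduction: keep only the sign
            fp |= 1 << i
    return fp


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two fingerprints."""
    return bin(a ^ b).count("1")


def fingerprints(text: str, keyword_vocab: Iterable[str]) -> Tuple[int, int]:
    """Return (first simhash over all words, second simhash over keywords)."""
    vocab = set(keyword_vocab)
    tokens = text.lower().split()  # stand-in for real word segmentation
    all_weights = Counter(tokens)  # term frequency as the assumed weight
    kw_weights = Counter(t for t in tokens if t in vocab)
    return simhash(all_weights), simhash(kw_weights)


def should_screen_out(text, target_fp, keyword_vocab, first_max=3, second_max=3):
    """Two-stage comparison against the target text's (third) simhash."""
    first_fp, second_fp = fingerprints(text, keyword_vocab)
    if hamming(first_fp, target_fp) <= first_max:
        return True  # assumption: a close first fingerprint is already a duplicate
    # Fall back to the keyword fingerprint only when the first stage misses.
    return hamming(second_fp, target_fp) <= second_max
```

In this sketch the keyword fingerprint acts as a second chance: a summary that rewords filler text but keeps the same keywords can still land within `second_max` bits of the target even when its full-text fingerprint does not.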

Inventors:
DONG RUNHUA (CN)
Application Number:
PCT/CN2021/123907
Publication Date:
May 27, 2022
Filing Date:
October 14, 2021
Assignee:
ONE CONNECT SMART TECH CO LTD SHENZHEN (CN)
International Classes:
G06F40/216; G06F40/211; G06F40/289; G06K9/62
Foreign References:
CN112364625A 2021-02-12
CN108776654A 2018-11-09
CN110737748A 2020-01-31
CN111339166A 2020-06-26
CN107066623A 2017-08-18
US20180107359A1 2018-04-19
Attorney, Agent or Firm:
SHENZHEN WORLD INTELLECTUAL PROPERTY AGENCY (GENERAL PARTNERSHIP) (CN)