Title:
WEB PAGE CLUSTERING METHOD AND DEVICE
Document Type and Number:
WIPO Patent Application WO/2018/095411
Kind Code:
A1
Abstract:
Disclosed is a web page clustering method and device for clustering web pages according to their web page framework. The method comprises: acquiring the uniform resource locators (URLs) of a plurality of web pages to be clustered; for the URL of each web page to be clustered, determining the rewriting rules of the URL and classifying the URL according to those rules; determining the web page framework of the web page corresponding to each URL in each URL class, and determining from those frameworks whether the URLs in the class can be clustered; and retaining the URL class if they can. In this way, web pages having the same framework structure are clustered into one class, which solves the problem that existing clustering methods cannot cluster according to web page framework, and provides a clustering method better suited to web page framework processing.
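
The steps in the abstract can be illustrated with a minimal Python sketch. It is not the claimed implementation: the abstract does not define how a rewriting rule or a web page framework is computed, so both are stand-ins chosen for illustration. Here the hypothetical `url_pattern` approximates a rewriting rule by replacing digit runs in the path with `{n}` and keeping only query parameter names, and the hypothetical `framework_signature` approximates a framework by the ordered sequence of opening HTML tags. Dropping a class whose pages disagree is also an assumption; the abstract only states that a class is retained when its URLs can be clustered.

```python
from collections import defaultdict
from html.parser import HTMLParser
from urllib.parse import urlsplit, parse_qsl
import re


def url_pattern(url: str) -> str:
    """Hypothetical stand-in for a URL rewriting rule (an assumption)."""
    parts = urlsplit(url)
    # Treat every run of digits in the path as a variable slot.
    path = re.sub(r"\d+", "{n}", parts.path)
    # Keep query parameter names only; their values are ignored.
    keys = sorted({k for k, _ in parse_qsl(parts.query, keep_blank_values=True)})
    return f"{parts.netloc}{path}?{'&'.join(keys)}"


class _TagCollector(HTMLParser):
    """Records the name of every opening tag in document order."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)


def framework_signature(html: str) -> tuple:
    """Hypothetical stand-in for a web page framework (an assumption)."""
    collector = _TagCollector()
    collector.feed(html)
    collector.close()
    return tuple(collector.tags)


def cluster(pages: dict) -> list:
    """pages maps URL -> HTML text; returns the retained URL classes."""
    # Step 1: classify URLs by their (approximated) rewriting rule.
    classes = defaultdict(list)
    for url in pages:
        classes[url_pattern(url)].append(url)
    # Step 2: retain a class only if all of its pages share one signature.
    retained = []
    for urls in classes.values():
        signatures = {framework_signature(pages[u]) for u in urls}
        if len(signatures) == 1:
            retained.append(sorted(urls))
    return sorted(retained)
```

Requiring identical tag sequences is a deliberately strict criterion; a practical system would more likely compare frameworks with a similarity threshold, which the abstract neither confirms nor rules out.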

Inventors:
GUO YANGYANG (CN)
LIU SHAOBIN (CN)
LI FEI (CN)
LI HU (CN)
LIU LIJUN (CN)
Application Number:
PCT/CN2017/112883
Publication Date:
May 31, 2018
Filing Date:
November 24, 2017
Assignee:
NSFOCUS INFORMATION TECH CO (CN)
NSFOCUS TECH INC (CN)
International Classes:
G06F17/30
Foreign References:
CN106708952A    2017-05-24
CN106095979A    2016-11-09
CN102902790A    2013-01-30
CN103631787A    2014-03-12
US20070294252A1    2007-12-20
Attorney, Agent or Firm:
TDIP & PARTNERS (CN)