To provide a technique for reducing how often the scores of a page group must be recalculated when a new page is added.
A crawler 2 of a document retrieval device 1 collects a group of electronic documents from a content server group S and stores them in a document database 3. A score calculation means 4 calculates a score for each electronic document stored in the document database 3 based on its importance, and stores the scores in a score information database 5. A cluster generation means 7 acquires, from an additional page database 6, an electronic document newly added to the content server group S, and classifies it into a cluster uniquely determined from its URL. A score estimation means 8 calculates the mean of the scores of the electronic documents belonging to that cluster, and stores the mean in an estimated value database 9 as the estimated score of the added document. The score calculation means 4 then recalculates the scores of the electronic document group to reflect the addition of the electronic document, using the estimated score.
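The clustering and estimation steps described above can be sketched roughly as follows. This is only an illustrative sketch, not the claimed implementation: the abstract states that the cluster is uniquely determined from the URL but does not give the rule, so the `cluster_key` function below assumes "host plus directory path" as the cluster. The function names and the example URLs are likewise hypothetical.

```python
from collections import defaultdict
from statistics import mean
from urllib.parse import urlsplit


def cluster_key(url: str) -> str:
    """Map a URL to a cluster ID.

    Assumption: the cluster is the host plus the directory part of the path.
    The abstract only says the cluster is uniquely determined from the URL.
    """
    parts = urlsplit(url)
    directory = parts.path.rsplit("/", 1)[0]  # drop the final path segment
    return f"{parts.netloc}{directory}"


def estimate_scores(known_scores: dict, new_urls: list) -> dict:
    """Estimate a score for each new URL as the mean score of its cluster."""
    # Group the already-calculated scores by cluster.
    clusters = defaultdict(list)
    for url, score in known_scores.items():
        clusters[cluster_key(url)].append(score)

    estimates = {}
    for url in new_urls:
        members = clusters.get(cluster_key(url))
        if members:  # skip new pages whose cluster has no scored documents
            estimates[url] = mean(members)
    return estimates


known = {
    "http://a.example/docs/x.html": 0.25,
    "http://a.example/docs/y.html": 0.75,
    "http://a.example/blog/z.html": 0.9,
}
added = ["http://a.example/docs/new.html", "http://b.example/q.html"]
result = estimate_scores(known, added)
print(result)
```

In this sketch, a new page whose cluster contains no scored documents simply receives no estimate; how such pages are handled is not stated in the abstract.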
UEMATSU YUKIO
KONAGAI SHUNSUKE
TAKAHASHI YAMATO
KATAOKA RYOJI
JP2006127529A | 2006-05-18
JP2007140973A | 2007-06-07
JP2005056206A | 2005-03-03
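CSNG200900266223; 片瀬弘晶 and 5 others: 'Construction of a Reduced Web Link Structure for Accelerating Link Structure Analysis Algorithms', Proceedings of the 19th IEICE Data Engineering Workshop [online], 20080407, p.1-8, IEICE Technical Committee on Data Engineering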
Hiromichi Kobayashi
Uzawa Hidehisa
Koji Yamaguchi