User-assisted similarity estimation for searching related web pages

Authors:
Lin Li;Zhenglu Yang;Kulwadee Somboonviwat;Masaru Kitsuregawa
Affiliations:
University of Tokyo, Tokyo, Japan;University of Tokyo, Tokyo, Japan;University of Tokyo, Tokyo, Japan;University of Tokyo, Tokyo, Japan
Venue:
Proceedings of the eighteenth conference on Hypertext and hypermedia
Year:
2007

Abstract

To utilize the similarity information hidden in the Web graph, we investigate the problem of adaptively retrieving related Web pages with user assistance. Given a definition of similarity between pages, it is intuitive to expect that similarity propagates from page to page, inducing an implicit topical relatedness between pages. In this paper, we extract connected subgraphs from the graph formed by all pairs of pages whose similarity scores exceed a given threshold, and then rank the candidate related pages with a novel measure based on the combination distances of a flexible hierarchical clustering. Moreover, because similarity values are subjective, we dynamically produce the ordered list of related pages according to a parameter adjusted by the user. We show that our approach effectively handles a set of pages originating from three related categories of Web hierarchies such as Google Directory. Experiments with three similarity measures demonstrate that using in-link information is favorable, while using a measure that combines in-links and out-links lowers the precision of identifying similar pages.
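The thresholding and connected-subgraph step described in the abstract can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration, not the authors' implementation: the function names (`inlink_jaccard`, `thresholded_components`, `related_pages`), the toy in-link data, and the use of Jaccard overlap of in-link sets as the similarity measure. The final ordering here simply uses direct similarity to the query page; it is a stand-in for, not a reproduction of, the paper's rank measure based on hierarchical-clustering combination distances.

```python
from collections import defaultdict
from itertools import combinations


def inlink_jaccard(inlinks, a, b):
    """Jaccard overlap of the in-link sets of pages a and b (assumed measure)."""
    sa, sb = inlinks.get(a, set()), inlinks.get(b, set())
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0


def thresholded_components(pages, sim, threshold):
    """Connected components of the graph whose edges join pairs with sim > threshold."""
    parent = {p: p for p in pages}

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in combinations(pages, 2):
        if sim(a, b) > threshold:
            parent[find(a)] = find(b)

    groups = defaultdict(set)
    for p in pages:
        groups[find(p)].add(p)
    return list(groups.values())


def related_pages(query, pages, sim, threshold):
    """Pages in the query's component, ordered by direct similarity to the query.

    The ordering is a placeholder for the paper's clustering-based rank measure.
    """
    for component in thresholded_components(pages, sim, threshold):
        if query in component:
            others = component - {query}
            return sorted(others, key=lambda p: (-sim(query, p), p))
    return []


# Hypothetical toy data: page -> set of pages linking to it.
inlinks = {
    "a": {"x", "y", "z"},
    "b": {"x", "y"},
    "c": {"y", "z", "w"},
    "d": {"q"},
}
pages = sorted(inlinks)


def sim(p, q):
    return inlink_jaccard(inlinks, p, q)


print(related_pages("b", pages, sim, threshold=0.4))
print(related_pages("b", pages, sim, threshold=0.6))
```

With the toy data, pages `b` and `c` have a direct similarity of only 0.25, yet at threshold 0.4 they fall into the same component because both are sufficiently similar to `a`. This is the propagation effect the abstract refers to. Raising the threshold to 0.6 removes the `a`-`c` edge and shrinks the candidate list, which mirrors how a user-adjusted parameter can change the returned list.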