Hierarchical web-page clustering via in-page and cross-page link structures

Authors:
Cindy Xide Lin;Yintao Yu;Jiawei Han;Bing Liu
Affiliations:
University of Illinois at Urbana-Champaign;University of Illinois at Urbana-Champaign;University of Illinois at Urbana-Champaign;University of Illinois at Chicago
Venue:
PAKDD'10 Proceedings of the 14th Pacific-Asia conference on Advances in Knowledge Discovery and Data Mining - Volume Part II
Year:
2010

Abstract

Despite the wide diversity of web pages, pages residing within a particular organization are, in most cases, organized into semantically hierarchical structures. For example, the website of a computer science department contains pages about its people, courses, and research; the people pages are categorized into faculty, staff, and students, and the research pages branch into different areas. Uncovering such hierarchical structures could give users a convenient way to navigate a site comprehensively and could accelerate other web mining tasks. In this study, we extract a similarity matrix among pages from in-page and cross-page link structures, and on that basis develop a density-based clustering algorithm that hierarchically groups densely linked web pages into semantic clusters. Our experiments show that this method is efficient and effective, and that it sheds light on mining and exploring web structures.
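
To make the idea of grouping pages by link-based similarity at several levels concrete, here is a minimal Python sketch. It is not the paper's algorithm: the abstract does not give the similarity formula or the clustering procedure, so the sketch assumes a Jaccard similarity over undirected link neighbourhoods and, as a simple stand-in for density-based clustering, takes connected components at a few similarity thresholds. The function names, the thresholds, and the toy site are all hypothetical.

```python
from itertools import combinations


def link_similarity(links):
    """Return (pages, sim); sim maps page pairs to a Jaccard score.

    Assumption: similarity is the Jaccard overlap of closed neighbourhoods
    in the undirected link graph. The paper's actual measure may differ.
    """
    pages = set(links) | {q for targets in links.values() for q in targets}
    nbrs = {p: {p} for p in pages}
    for p, targets in links.items():
        for q in targets:
            nbrs[p].add(q)
            nbrs[q].add(p)
    sim = {}
    for p, q in combinations(sorted(pages), 2):
        sim[(p, q)] = len(nbrs[p] & nbrs[q]) / len(nbrs[p] | nbrs[q])
    return sorted(pages), sim


def clusters_at(pages, sim, threshold):
    """Connected components after keeping only pairs with sim >= threshold."""
    parent = {p: p for p in pages}

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for (p, q), s in sim.items():
        if s >= threshold:
            parent[find(p)] = find(q)
    groups = {}
    for p in pages:
        groups.setdefault(find(p), []).append(p)
    return sorted(sorted(g) for g in groups.values())


def hierarchy(links, thresholds=(0.3, 0.6, 0.9)):
    """Clusterings from coarse (low threshold) to fine (high threshold)."""
    pages, sim = link_similarity(links)
    return {t: clusters_at(pages, sim, t) for t in thresholds}


# Hypothetical department site: page -> pages it links to.
toy_site = {
    "home": {"faculty", "courses"},
    "faculty": {"alice", "bob"},
    "alice": {"bob"},
    "courses": {"cs101", "cs102"},
    "cs101": {"cs102"},
}

for threshold, clusters in hierarchy(toy_site).items():
    print(threshold, clusters)
```

On this toy site the lowest threshold yields a single cluster covering the whole site, the middle threshold separates the people pages from the course pages, and the highest threshold keeps only the most tightly linked pairs together. Each level refines the one before it, which is the kind of nested structure the abstract describes.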