Web page clustering: a hyperlink-based similarity and matrix-based hierarchical algorithms

Authors:
Jingyu Hou; Yanchun Zhang; Jinli Cao
Affiliations:
School of Information Technology, Deakin University, Melbourne, Australia; Department of Mathematics and Computing, University of Southern Queensland, Toowoomba, Australia; Department of Computer Science and Computer Engineering, La Trobe University, Melbourne, Australia
Venue:
APWeb'03 Proceedings of the 5th Asia-Pacific web conference on Web technologies and applications
Year:
2003

Abstract

This paper proposes a hyperlink-based similarity measure for web pages and two matrix-based hierarchical algorithms for clustering web pages. The similarity measure incorporates hyperlink transitivity and the importance of pages within the web page space under consideration. One clustering algorithm allows clusters to overlap; the other does not. Neither algorithm requires a predefined similarity threshold, and both are independent of page order. Preliminary evaluations show that the proposed algorithms improve clustering effectiveness.
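The abstract does not give the similarity formula or the matrix operations, so the following is only a minimal Python sketch of the general idea under stated assumptions, not the authors' method. It assumes (hypothetically) that hyperlink transitivity means two-hop links receive a reduced weight `decay` (0.5 here), that page importance is 1 plus a page's weighted in-degree, and that similarity is the cosine of importance-weighted out-link and in-link vectors. Clustering uses plain average-linkage agglomeration, which needs no threshold because it records every merge until a single cluster remains. The sketch covers only the non-overlapping case, and because ties are broken by page index it is not guaranteed to be independent of page order, unlike the paper's algorithms. All function names are hypothetical.

```python
import math


def transitive_links(n, links, decay=0.5):
    """Link-weight matrix: 1.0 for direct links, `decay` for two-hop links.

    Hypothetical model of hyperlink transitivity (an assumption, not the paper's).
    """
    direct = [[0.0] * n for _ in range(n)]
    for src, dst in links:
        direct[src][dst] = 1.0
    weights = [row[:] for row in direct]
    for i in range(n):
        for k in range(n):
            if direct[i][k]:
                for j in range(n):
                    # i -> k -> j with no direct i -> j link: add a weaker link.
                    if direct[k][j] and j != i and weights[i][j] == 0.0:
                        weights[i][j] = decay
    return weights


def importance(weights):
    """Hypothetical page importance: 1 + weighted in-degree (never zero)."""
    n = len(weights)
    return [1.0 + sum(weights[i][j] for i in range(n)) for j in range(n)]


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def similarity_matrix(weights):
    """Cosine similarity of importance-weighted out-link and in-link vectors."""
    n = len(weights)
    imp = importance(weights)
    vecs = []
    for p in range(n):
        out_part = [weights[p][q] * imp[q] for q in range(n)]  # pages p links to
        in_part = [weights[q][p] * imp[q] for q in range(n)]   # pages linking to p
        vecs.append(out_part + in_part)
    return [[cosine(vecs[i], vecs[j]) for j in range(n)] for i in range(n)]


def agglomerate(sim):
    """Average-linkage agglomerative clustering with no similarity threshold.

    Merges the most similar pair of clusters until one remains and returns
    the full merge history (a dendrogram) as (cluster_a, cluster_b, similarity).
    """
    clusters = [frozenset([i]) for i in range(len(sim))]
    merges = []

    def link(a, b):
        return sum(sim[i][j] for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > 1:
        pairs = [(x, y) for x in range(len(clusters)) for y in range(x + 1, len(clusters))]
        # max() keeps the first maximum, so ties are broken by list position.
        x, y = max(pairs, key=lambda xy: link(clusters[xy[0]], clusters[xy[1]]))
        a, b = clusters[x], clusters[y]
        merges.append((sorted(a), sorted(b), link(a, b)))
        clusters = [c for k, c in enumerate(clusters) if k not in (x, y)] + [a | b]
    return merges


merges = agglomerate(similarity_matrix(transitive_links(6, [(0, 2), (1, 2), (3, 5), (4, 5)])))
for a, b, s in merges[:2]:
    print(a, b, s)
```

In this example, pages 0 and 1 (which both link only to page 2) merge first at similarity 1.0, followed by pages 3 and 4. Cutting the recorded merge list at any level yields a flat clustering, so the choice of granularity is deferred until after the hierarchy is built rather than fixed by a threshold up front.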