A comparative evaluation of different link types on enhancing document clustering

Authors:
Xiaodan Zhang;Xiaohua Hu;Xiaohua Zhou
Affiliations:
Drexel University, Philadelphia, PA, USA;Drexel University, Philadelphia, PA, USA;Drexel University, Philadelphia, PA, USA
Venue:
Proceedings of the 31st annual international ACM SIGIR conference on Research and development in information retrieval
Year:
2008

Abstract

With a growing number of studies using link information to enhance document clustering, a comparative evaluation of how different link types affect clustering has become necessary. Links between text documents come in several types: explicit links such as citation links and hyperlinks, implicit links such as co-authorship links, and pseudo links such as content similarity links. All of them convey topic similarity or topic transfer patterns, which are useful for document clustering. In this study, we adopt a Relaxation Labeling (RL)-based clustering algorithm, which uses both content and linkage information, to evaluate the effectiveness of these link types for document clustering on eight datasets. The experimental results show that linkage is quite effective in improving content-based document clustering. Our experiments also yield a series of findings on how the different link types affect document clustering.
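To make the idea of combining content and linkage concrete, the following is a minimal sketch of a generic relaxation-labeling-style update. It is not the algorithm evaluated in the paper: the function name `relaxation_labeling`, the mixing weight `alpha`, the fixed iteration count, and the averaging of neighbor estimates are all assumptions chosen for illustration. Each document starts from a content-based cluster distribution, and on every pass that distribution is blended with the weighted average of its linked neighbors' current estimates.

```python
def relaxation_labeling(content_probs, links, alpha=0.5, iterations=10):
    """Illustrative relaxation-labeling-style clustering refinement.

    content_probs: one cluster probability list per document
                   (assumed to come from a content-only model).
    links: dict mapping a document index to a list of
           (neighbor index, link weight) pairs; hypothetical format.
    alpha: weight on the content estimate (an assumption, not a
           value taken from the paper).
    Returns the most probable cluster index for each document.
    """
    k = len(content_probs[0])
    probs = [row[:] for row in content_probs]
    for _ in range(iterations):
        updated = []
        for i, base in enumerate(content_probs):
            neighbors = links.get(i, [])
            total = sum(w for _, w in neighbors)
            if total == 0:
                # No usable links: keep the content-only estimate.
                updated.append(base[:])
                continue
            # Weighted average of the neighbors' current estimates.
            link_vote = [
                sum(w * probs[j][c] for j, w in neighbors) / total
                for c in range(k)
            ]
            mixed = [
                alpha * base[c] + (1 - alpha) * link_vote[c]
                for c in range(k)
            ]
            norm = sum(mixed)
            updated.append([m / norm for m in mixed])
        probs = updated  # synchronous update across all documents
    return [max(range(k), key=lambda c: row[c]) for row in probs]


# Toy data: document 2 is ambiguous on content alone but is linked
# to documents 0 and 1; document 3 has no links.
content_probs = [[0.9, 0.1], [0.8, 0.2], [0.45, 0.55], [0.1, 0.9]]
links = {0: [(2, 1.0)], 1: [(2, 1.0)], 2: [(0, 1.0), (1, 1.0)]}
labels = relaxation_labeling(content_probs, links)
```

In this toy setup the link weights could stand for any of the link types discussed above (citation, co-authorship, or content similarity), so comparing link types amounts to swapping the `links` input while keeping the content estimates fixed.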