It is well known that connectivity analysis of linked documents provides significant information about the structure of the document space for unsupervised learning tasks. However, the ability to identify distinct clusters of documents from link graph analysis is proportional to the density of the graph and depends on whether the linking and/or linked documents are available in the collection. In this paper, we present an information-theoretic approach to measuring the significance of individual words based on the underlying link structure of the document collection. This enables us to generate a non-uniform weight distribution over the feature space, which is used to augment the original corpus-based document similarities. Experimental results on a collection of scientific literature show that our method achieves better separation of distinct groups of documents, yielding improved clustering solutions.
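The abstract does not give the weighting formula, so the following is only an illustrative sketch of the general idea, not the paper's method. It assumes a hypothetical scoring rule: a word gains weight when it is shared by both endpoints of a link more often than its document frequency alone would predict (a pointwise-mutual-information-style log ratio, clipped at zero and added to a baseline of 1). The function names `link_word_weights` and `weighted_cosine`, the toy documents, and the links are all made up for illustration.

```python
import math
from collections import Counter


def link_word_weights(docs, links):
    """Assumed scoring rule (not from the paper): weight each word by how much
    more often it is shared across a link than chance co-occurrence predicts.

    docs  -- list of token lists
    links -- list of (i, j) index pairs into docs
    """
    n = len(docs)
    doc_sets = [set(d) for d in docs]
    doc_freq = Counter(w for s in doc_sets for w in s)

    # Count, for each word, the links whose two endpoints both contain it.
    shared = Counter()
    for i, j in links:
        for w in doc_sets[i] & doc_sets[j]:
            shared[w] += 1

    weights = {}
    for w, df in doc_freq.items():
        if not links or shared[w] == 0:
            weights[w] = 1.0  # no link evidence: keep the uniform baseline
            continue
        p_link = shared[w] / len(links)   # observed rate on linked pairs
        p_rand = (df / n) ** 2            # expected rate for a random pair
        weights[w] = 1.0 + max(0.0, math.log(p_link / p_rand))
    return weights


def weighted_cosine(a, b, weights):
    """Cosine similarity of term-count vectors under per-word weights."""
    ca, cb = Counter(a), Counter(b)
    dot = sum(weights.get(w, 1.0) * ca[w] * cb[w] for w in ca.keys() & cb.keys())
    norm_a = math.sqrt(sum(weights.get(w, 1.0) * c * c for w, c in ca.items()))
    norm_b = math.sqrt(sum(weights.get(w, 1.0) * c * c for w, c in cb.items()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Toy collection (hypothetical): two topics, one link inside each topic.
docs = [
    ["graph", "cluster", "link"],
    ["graph", "cluster", "web"],
    ["protein", "gene", "link"],
    ["protein", "gene", "web"],
]
links = [(0, 1), (2, 3)]

weights = link_word_weights(docs, links)
uniform = {}  # empty dict means every word falls back to weight 1.0

same_topic_plain = weighted_cosine(docs[0], docs[1], uniform)
same_topic_weighted = weighted_cosine(docs[0], docs[1], weights)
cross_topic_plain = weighted_cosine(docs[0], docs[2], uniform)
cross_topic_weighted = weighted_cosine(docs[0], docs[2], weights)
```

On this toy data the link-derived weights raise the similarity of the two same-topic documents and lower the similarity of the cross-topic pair relative to uniform weighting, which is the kind of improved separation the abstract describes. Any real use would need the paper's actual significance measure in place of the assumed log ratio.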