Connectivity analysis of networked documents provides high-quality link structure information, which purely content-based learning systems usually ignore. Combining links and content is known to have the potential to improve text analysis. However, exploiting link structure is non-trivial because links are often noisy and sparse. Moreover, it is difficult to balance term-based content analysis against link-based structure analysis so as to reap the benefits of both. We introduce a novel networked document clustering technique that integrates content and link information in a unified optimization framework. Under this framework, we develop a novel dimensionality reduction method called COntent & STructure COnstrained (Costco) Feature Projection. To extract robust link information from sparse and noisy link graphs, we introduce two link analysis methods. Experiments on benchmark data and diverse real-world text corpora validate the effectiveness of the proposed methods.
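The abstract does not spell out the Costco projection or its two link analysis methods. The sketch below is only an illustration of the two underlying ideas and is not the authors' method. It uses two classic link-similarity measures, co-citation and bibliographic coupling, to show how a sparse link graph can still yield denser document-to-document link evidence. It then uses a simple weighted sum, controlled by a hypothetical parameter `alpha`, to show one way of balancing content similarity against link similarity. All function names are hypothetical.

```python
from math import sqrt


def cocitation(adj):
    """Co-citation (A^T A): C[i][j] = number of documents linking to both i and j."""
    n = len(adj)
    return [[sum(adj[k][i] * adj[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def coupling(adj):
    """Bibliographic coupling (A A^T): B[i][j] = number of documents both i and j link to."""
    n = len(adj)
    return [[sum(adj[i][k] * adj[j][k] for k in range(n)) for j in range(n)]
            for i in range(n)]


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = sqrt(sum(a * a for a in u))
    nv = sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def combined_similarity(terms, adj, alpha=0.5):
    """Hypothetical blend: alpha * content cosine + (1 - alpha) * link-profile cosine.

    A document's link profile is its co-citation row concatenated with its
    coupling row, so two documents with no direct link can still look similar.
    """
    n = len(terms)
    c, b = cocitation(adj), coupling(adj)
    profiles = [c[i] + b[i] for i in range(n)]
    return [[alpha * cosine(terms[i], terms[j])
             + (1 - alpha) * cosine(profiles[i], profiles[j])
             for j in range(n)] for i in range(n)]


# Toy corpus: docs 0 and 1 both link to docs 2 and 3; no other links exist.
adj = [[0, 0, 1, 1],
       [0, 0, 1, 1],
       [0, 0, 0, 0],
       [0, 0, 0, 0]]
# Term vectors: content pairs doc 0 with doc 2 and doc 1 with doc 3.
terms = [[1, 0], [0, 1], [1, 0], [0, 1]]
sim = combined_similarity(terms, adj, alpha=0.5)
```

Docs 0 and 1 share no direct link and no terms, yet their coupling score is 2 because both cite docs 2 and 3. With `alpha=0.5`, the link evidence (pairing docs 0 and 1) and the content evidence (pairing docs 0 and 2) carry equal weight. Raising `alpha` toward 1 lets content dominate. The paper instead balances these two sources inside a single optimization framework, which this weighted sum does not reproduce.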