As ever more information is placed online, it is a challenge to exploit both the internal and the external information of documents when assessing the similarity between them. A variety of approaches have been proposed to model document similarity on different foundations, but they usually cannot combine internal and external information. In this paper, we bring a link-based method, built on random walks on graphs, into content analysis. By defining similarity as the meeting probability of two random surfers, we propose a computational model for content analysis that can also be integrated with the external information of documents. An empirical study shows that our method achieves good accuracy, acceptable performance, and a fast convergence rate when measuring multi-relational document similarity.
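The abstract does not give the model's equations, so the following is only a minimal sketch of the general idea of similarity as the meeting probability of two random surfers. It uses the well-known SimRank recurrence as a stand-in and should not be read as the paper's own method. The function name `simrank`, the `decay` and `iterations` parameters, and the toy graph linking documents to the terms they contain are all assumptions made for illustration.

```python
from itertools import product


def simrank(in_neighbors, decay=0.8, iterations=10):
    """SimRank-style similarity (illustrative stand-in, not the paper's model).

    Two surfers start at nodes a and b and step backward along in-links.
    The score of (a, b) grows the sooner they are expected to meet;
    `decay` discounts each additional step.
    `in_neighbors` maps every node to the list of its in-neighbors.
    """
    nodes = list(in_neighbors)
    # Start from the identity: a node is fully similar only to itself.
    sim = {(a, b): 1.0 if a == b else 0.0
           for a, b in product(nodes, repeat=2)}

    for _ in range(iterations):
        new_sim = {}
        for a, b in product(nodes, repeat=2):
            if a == b:
                new_sim[(a, b)] = 1.0
                continue
            ia, ib = in_neighbors[a], in_neighbors[b]
            if not ia or not ib:
                # A surfer with nowhere to go can never meet the other one.
                new_sim[(a, b)] = 0.0
                continue
            # Average similarity over all pairs of in-neighbors, then decay.
            total = sum(sim[(i, j)] for i in ia for j in ib)
            new_sim[(a, b)] = decay * total / (len(ia) * len(ib))
        sim = new_sim
    return sim


# Hypothetical toy graph: documents d1 and d2 share term t1.
# Edges are undirected, so each endpoint lists the other as an in-neighbor.
graph = {
    "d1": ["t1", "t2"],
    "d2": ["t1", "t3"],
    "t1": ["d1", "d2"],
    "t2": ["d1"],
    "t3": ["d2"],
}

scores = simrank(graph)
print(scores[("d1", "d2")])
```

In this sketch the same graph could hold other relations as extra edges (for example, hyperlinks between documents), which is one simple way to picture how internal and external information might share a single random-walk computation. How the paper actually combines them is not stated in the abstract.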