Document similarity measures are crucial components of many text-analysis tasks, including information retrieval, document classification, and document clustering. Conventional measures are brittle: They estimate the surface overlap between documents based on the words they mention and ignore deeper semantic connections. We propose a new measure that assesses similarity at both the lexical and semantic levels, and learns from human judgments how to combine them by using machine-learning techniques. Experiments show that the new measure produces values for documents that are more consistent with people's judgments than people are with each other. We also use it to classify and cluster large document sets covering different genres and topics, and find that it improves both classification and clustering performance. © 2012 Wiley Periodicals, Inc.
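As a rough illustration of combining a lexical and a semantic similarity score with weights learned from human judgments, the following minimal sketch may help. It is not the authors' method: the word-to-concept map `CONCEPTS`, the training pairs, and the ratings are all hypothetical toy values invented for this example, and ordinary least squares stands in for whatever machine-learning technique the paper actually uses.

```python
import math
from collections import Counter

# Hypothetical toy word-to-concept map; a stand-in for a real knowledge base.
CONCEPTS = {
    "car": "vehicle", "automobile": "vehicle", "truck": "vehicle",
    "doctor": "medicine", "physician": "medicine", "nurse": "medicine",
}

def cosine(a, b):
    """Cosine similarity between two sparse count vectors (Counters)."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * \
           math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def features(doc1, doc2):
    """Return (lexical, semantic) similarity for a pair of documents."""
    words1, words2 = doc1.lower().split(), doc2.lower().split()
    # Lexical level: overlap of the surface words.
    lexical = cosine(Counter(words1), Counter(words2))
    # Semantic level: overlap of the concepts the words map to.
    concepts1 = Counter(CONCEPTS[w] for w in words1 if w in CONCEPTS)
    concepts2 = Counter(CONCEPTS[w] for w in words2 if w in CONCEPTS)
    semantic = cosine(concepts1, concepts2)
    return lexical, semantic

def fit_weights(pairs, ratings):
    """Least-squares fit of rating ~ a * lexical + b * semantic (no bias term)."""
    xs = [features(d1, d2) for d1, d2 in pairs]
    s11 = sum(x1 * x1 for x1, _ in xs)
    s12 = sum(x1 * x2 for x1, x2 in xs)
    s22 = sum(x2 * x2 for _, x2 in xs)
    t1 = sum(x1 * y for (x1, _), y in zip(xs, ratings))
    t2 = sum(x2 * y for (_, x2), y in zip(xs, ratings))
    det = s11 * s22 - s12 * s12
    if abs(det) < 1e-12:
        raise ValueError("features are collinear; cannot fit weights")
    # Solve the 2x2 normal equations by Cramer's rule.
    return (t1 * s22 - t2 * s12) / det, (s11 * t2 - s12 * t1) / det

def similarity(doc1, doc2, weights):
    """Combine the two levels using the learned weights."""
    lexical, semantic = features(doc1, doc2)
    return weights[0] * lexical + weights[1] * semantic

# Made-up document pairs and ratings in [0, 1]; not data from the paper.
TRAIN_PAIRS = [
    ("the car is red", "the automobile is red"),
    ("the doctor is late", "the truck is late"),
    ("car", "physician"),
    ("nurse helps doctor", "physician helps nurse"),
]
TRAIN_RATINGS = [0.9, 0.3, 0.0, 0.8]

weights = fit_weights(TRAIN_PAIRS, TRAIN_RATINGS)
same_topic = similarity("my car", "my automobile", weights)
different_topic = similarity("my car", "my doctor", weights)
```

In this toy setup both test pairs share exactly one surface word, so a purely lexical measure cannot tell them apart; the concept-level feature is what lets `same_topic` score higher than `different_topic`.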