WordNet: a lexical database for English
Communications of the ACM
A cooccurrence-based thesaurus and two applications to information retrieval
Information Processing and Management: an International Journal
A vector space model for automatic indexing
Communications of the ACM
A Comparative Study on Feature Selection in Text Categorization
ICML '97 Proceedings of the Fourteenth International Conference on Machine Learning
An Information-Theoretic Definition of Similarity
ICML '98 Proceedings of the Fifteenth International Conference on Machine Learning
Verbs semantics and lexical selection
ACL '94 Proceedings of the 32nd annual meeting on Association for Computational Linguistics
Measuring semantic similarity between words using web search engines
Proceedings of the 16th international conference on World Wide Web
The Google Similarity Distance
IEEE Transactions on Knowledge and Data Engineering
Acquiring Word Similarities with Higher Order Association Mining
ICCBR '07 Proceedings of the 7th international conference on Case-Based Reasoning: Case-Based Reasoning Research and Development
Web-Based Measure of Semantic Relatedness
WISE '08 Proceedings of the 9th international conference on Web Information Systems Engineering
Wikipedia-based semantic interpretation for natural language processing
Journal of Artificial Intelligence Research
Using information content to evaluate semantic similarity in a taxonomy
IJCAI'95 Proceedings of the 14th international joint conference on Artificial intelligence - Volume 1
A framework for understanding Latent Semantic Indexing (LSI) performance
Information Processing and Management: an International Journal - Special issue: Formal methods for information retrieval
From frequency to meaning: vector space models of semantics
Journal of Artificial Intelligence Research
Fast case retrieval nets for textual data
ECCBR'06 Proceedings of the 8th European conference on Advances in Case-Based Reasoning
Hi-index | 0.00 |
Expressiveness of natural language is a challenge for text representation since the same idea can be expressed in many different ways. Therefore, terms in a document should not be treated independently of one another since together they help to disambiguate and establish meaning. Term-similarity measures are often used to improve representation by capturing semantic relationships between terms. Another consideration for representation involves the importance of terms. Feature selection techniques address this by using statistical measures to quantify term usefulness for retrieval. In this paper we present a framework that combines term-similarity and weighting for text representation. This allows us to comparatively study the impact of term similarity, term weighting and any synergistic effect that may exist between them. Study of term similarity is based on approaches that exploit term co-occurrences within document and sentence contexts whilst term weighting uses the popular Chi-squared test. Our results on text classification tasks show that the combined effect of similarity and weighting is superior to each technique independently and that this synergistic effect is obtained regardless of co-occurrence context granularity.