Term similarity and weighting framework for text representation

Authors:
Sadiq Sani;Nirmalie Wiratunga;Stewart Massie;Robert Lothian
Affiliations:
School of Computing, The Robert Gordon University, Aberdeen, Scotland, UK;School of Computing, The Robert Gordon University, Aberdeen, Scotland, UK;School of Computing, The Robert Gordon University, Aberdeen, Scotland, UK;School of Computing, The Robert Gordon University, Aberdeen, Scotland, UK
Venue:
ICCBR'11 Proceedings of the 19th international conference on Case-Based Reasoning Research and Development
Year:
2011

Citing 15
Cited 0

WordNet: a lexical database for English

Communications of the ACM
A cooccurrence-based thesaurus and two applications to information retrieval

Information Processing and Management: an International Journal
A vector space model for automatic indexing

Communications of the ACM
A Comparative Study on Feature Selection in Text Categorization

ICML '97 Proceedings of the Fourteenth International Conference on Machine Learning
An Information-Theoretic Definition of Similarity

ICML '98 Proceedings of the Fifteenth International Conference on Machine Learning
Verbs semantics and lexical selection

ACL '94 Proceedings of the 32nd annual meeting on Association for Computational Linguistics
Measuring semantic similarity between words using web search engines

Proceedings of the 16th international conference on World Wide Web
The Google Similarity Distance

IEEE Transactions on Knowledge and Data Engineering
Acquiring Word Similarities with Higher Order Association Mining

ICCBR '07 Proceedings of the 7th international conference on Case-Based Reasoning: Case-Based Reasoning Research and Development
Web-Based Measure of Semantic Relatedness

WISE '08 Proceedings of the 9th international conference on Web Information Systems Engineering
Wikipedia-based semantic interpretation for natural language processing

Journal of Artificial Intelligence Research
Using information content to evaluate semantic similarity in a taxonomy

IJCAI'95 Proceedings of the 14th international joint conference on Artificial intelligence - Volume 1
A framework for understanding Latent Semantic Indexing (LSI) performance

Information Processing and Management: an International Journal - Special issue: Formal methods for information retrieval
From frequency to meaning: vector space models of semantics

Journal of Artificial Intelligence Research
Fast case retrieval nets for textual data

ECCBR'06 Proceedings of the 8th European conference on Advances in Case-Based Reasoning

Quantified Score

Hi-index	0.00

Visualization

Abstract

Expressiveness of natural language is a challenge for text representation since the same idea can be expressed in many different ways. Therefore, terms in a document should not be treated independently of one another since together they help to disambiguate and establish meaning. Term-similarity measures are often used to improve representation by capturing semantic relationships between terms. Another consideration for representation involves the importance of terms. Feature selection techniques address this by using statistical measures to quantify term usefulness for retrieval. In this paper we present a framework that combines term-similarity and weighting for text representation. This allows us to comparatively study the impact of term similarity, term weighting and any synergistic effect that may exist between them. Study of term similarity is based on approaches that exploit term co-occurrences within document and sentence contexts whilst term weighting uses the popular Chi-squared test. Our results on text classification tasks show that the combined effect of similarity and weighting is superior to each technique independently and that this synergistic effect is obtained regardless of co-occurrence context granularity.