Term similarity and weighting framework for text representation

  • Authors:
  • Sadiq Sani;Nirmalie Wiratunga;Stewart Massie;Robert Lothian

  • Affiliations:
  • School of Computing, The Robert Gordon University, Aberdeen, Scotland, UK;School of Computing, The Robert Gordon University, Aberdeen, Scotland, UK;School of Computing, The Robert Gordon University, Aberdeen, Scotland, UK;School of Computing, The Robert Gordon University, Aberdeen, Scotland, UK

  • Venue:
  • ICCBR'11 Proceedings of the 19th international conference on Case-Based Reasoning Research and Development
  • Year:
  • 2011

Abstract

The expressiveness of natural language poses a challenge for text representation, since the same idea can be expressed in many different ways. Terms in a document should therefore not be treated independently of one another, because together they help to disambiguate and establish meaning. Term-similarity measures are often used to improve representation by capturing semantic relationships between terms. A further consideration for representation is the relative importance of terms. Feature selection techniques address this by using statistical measures to quantify how useful each term is for retrieval. In this paper we present a framework that combines term similarity and term weighting for text representation. This allows us to study comparatively the impact of term similarity, of term weighting, and of any synergistic effect between them. Term similarity is studied using approaches that exploit term co-occurrences within document and sentence contexts, whilst term weighting uses the popular Chi-squared test. Our results on text classification tasks show that combining similarity and weighting outperforms either technique on its own, and that this synergistic effect holds regardless of the granularity of the co-occurrence context.
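The abstract names the two ingredients but not the exact way they are combined. The following is a minimal sketch under assumptions of our own. Term similarity is the cosine between binary term-occurrence vectors over a set of contexts, which can be whole documents or individual sentences. A term's weight is its maximum per-class Chi-squared statistic. A document's term-frequency vector is first scaled by those weights and then multiplied by the similarity matrix. The function names (`build_vocab`, `term_similarity`, `chi_squared_weights`, `represent`) and this combination rule are hypothetical illustrations, not the authors' published method.

```python
import numpy as np


def build_vocab(docs):
    """Map each distinct token in the tokenised documents to a row index."""
    vocab = sorted({t for d in docs for t in d})
    return {t: i for i, t in enumerate(vocab)}


def term_similarity(contexts, index):
    """Cosine similarity between terms' binary occurrence vectors over contexts.

    Pass whole documents for document-level co-occurrence, or individual
    sentences (token lists) for sentence-level co-occurrence.
    """
    X = np.zeros((len(index), len(contexts)))
    for j, ctx in enumerate(contexts):
        for t in set(ctx):
            if t in index:
                X[index[t], j] = 1.0
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    norms[norms == 0] = 1.0  # leave unseen terms as zero rows
    Xn = X / norms
    return Xn @ Xn.T


def chi_squared_weights(docs, labels, index):
    """Per-term weight: maximum Chi-squared statistic over all classes (assumed aggregation)."""
    n = len(docs)
    present = np.zeros((len(index), n), dtype=bool)
    for j, d in enumerate(docs):
        for t in set(d):
            present[index[t], j] = True
    labels = np.asarray(labels)
    weights = np.zeros(len(index))
    for c in np.unique(labels):
        in_c = labels == c
        a = (present & in_c).sum(axis=1)    # term present, document in class c
        b = (present & ~in_c).sum(axis=1)   # term present, document not in c
        cc = (~present & in_c).sum(axis=1)  # term absent, document in c
        d = (~present & ~in_c).sum(axis=1)  # term absent, document not in c
        num = n * (a * d - cc * b) ** 2
        den = (a + cc) * (b + d) * (a + b) * (cc + d)
        chi = np.where(den > 0, num / np.where(den > 0, den, 1), 0.0)
        weights = np.maximum(weights, chi)
    return weights


def represent(doc, index, S, w):
    """Hypothetical combination: weight term frequencies, then spread them to similar terms."""
    vec = np.zeros(len(index))
    for t in doc:
        if t in index:
            vec[index[t]] += 1.0
    return (vec * w) @ S


if __name__ == "__main__":
    docs = [["cheap", "loan", "offer"], ["loan", "rate", "offer"],
            ["match", "goal", "team"], ["team", "goal", "win"]]
    labels = ["finance", "finance", "sport", "sport"]
    index = build_vocab(docs)
    S = term_similarity(docs, index)  # document-level co-occurrence context
    w = chi_squared_weights(docs, labels, index)
    print(represent(["cheap"], index, S, w))
```

In the toy run, a document that contains only "cheap" receives non-zero mass on "loan" and "offer", which co-occur with it. It receives none on the sport terms. Switching to sentence-level context only changes the `contexts` argument passed to `term_similarity`.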