Enhancing Text Categorization Using Sentence Semantics

  • Authors:
  • Shady Shehata;Fakhri Karray;Mohamed Kamel

  • Affiliations:
  • Pattern Analysis and Machine Intelligence (PAMI) Research Group, University of Waterloo, Waterloo, Canada N2L 3G1;Pattern Analysis and Machine Intelligence (PAMI) Research Group, University of Waterloo, Waterloo, Canada N2L 3G1;Pattern Analysis and Machine Intelligence (PAMI) Research Group, University of Waterloo, Waterloo, Canada N2L 3G1

  • Venue:
  • ADMA '08 Proceedings of the 4th International Conference on Advanced Data Mining and Applications
  • Year:
  • 2008

Abstract

Most text categorization techniques are based on word and/or phrase analysis of the text. Statistical analysis of term frequency captures the importance of a term within a document only. However, two terms can have the same frequency in their documents while one contributes more to the meaning of its sentences than the other. Thus, the underlying model should indicate terms that capture the semantics of the text. In this case, the model can capture terms that present the concepts of the sentence, which leads to discovering the topic of the document.

A new concept-based model is introduced that analyzes terms on the sentence and document levels, rather than on the document level only as in traditional analysis. The concept-based model can effectively discriminate between terms that are non-important with respect to sentence semantics and terms that hold the concepts representing the sentence meaning.

A set of experiments using the proposed concept-based model on different datasets in text categorization is conducted. The experiments compare traditional weighting with concept-based weighting and demonstrate that concept-based weighting substantially enhances the categorization quality of sets of documents.
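
The observation that two terms can share a document-level frequency yet differ in how much they contribute to their sentences can be illustrated with a small sketch. The abstract does not give the concept-based weighting formula, so the sentence-level measure below (a term's share of the tokens in each sentence, summed over sentences) is only a hypothetical stand-in and not the authors' method. The function names and the sample text are likewise assumptions made for illustration.

```python
import re
from collections import Counter


def sentences(text):
    # Naive splitter: break after ., ! or ? followed by whitespace.
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def tokens(text):
    # Lowercased alphabetic tokens only.
    return re.findall(r"[a-z]+", text.lower())


def document_tf(text):
    # Traditional document-level term frequency.
    return Counter(tokens(text))


def sentence_level_weight(text, term):
    # Hypothetical proxy (NOT the paper's measure): sum, over sentences,
    # of the fraction of that sentence's tokens equal to `term`.
    total = 0.0
    for s in sentences(text):
        toks = tokens(s)
        if toks:
            total += toks.count(term) / len(toks)
    return total


doc = (
    "Markets fell. "
    "The report, which ran to many pages and covered several "
    "unrelated topics, mentioned weather once."
)

tf = document_tf(doc)
print(tf["markets"], tf["weather"])  # same document-level count
print(sentence_level_weight(doc, "markets"),
      sentence_level_weight(doc, "weather"))
```

Here "markets" and "weather" each occur once in the document, but "markets" makes up half of a short sentence while "weather" is one token among many, so the sentence-level proxy separates them where the document-level count cannot.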