Enhancing Text Categorization Using Sentence Semantics

Authors:
Shady Shehata; Fakhri Karray; Mohamed Kamel
Affiliations:
Pattern Analysis and Machine Intelligence (PAMI) Research Group, University of Waterloo, Waterloo, Canada N2L 3G1 (all authors)
Venue:
ADMA '08: Proceedings of the 4th International Conference on Advanced Data Mining and Applications
Year:
2008

Abstract

Most text categorization techniques are based on word and/or phrase analysis of the text. Statistical analysis of term frequency captures the importance of a term within a document only. However, two terms can have the same frequency in their documents while one contributes more to the meaning of its sentences than the other. The underlying model should therefore identify the terms that capture the semantics of the text. Such a model can capture the terms that present the concepts of a sentence, which leads to discovering the topic of the document.

A new concept-based model is introduced that analyzes terms at both the sentence and document levels, rather than at the document level only as in traditional analysis. The concept-based model can effectively discriminate between terms that are unimportant with respect to sentence semantics and terms that hold the concepts representing the sentence meaning.

A set of experiments applying the proposed concept-based model to text categorization is conducted on different datasets. The experiments compare traditional weighting with concept-based weighting and demonstrate that concept-based weighting substantially enhances the categorization quality of sets of documents.
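
The contrast between document-level frequency and sentence-level contribution can be illustrated with a minimal sketch. It is not the authors' weighting scheme, which the abstract does not specify. The function name `term_weights`, the per-sentence sets of "concept" terms (assumed here to come from some upstream semantic analysis), and the additive combination are all hypothetical choices made only for illustration.

```python
from collections import Counter

def term_weights(sentences):
    """Combine document-level frequency with a sentence-level signal.

    sentences: list of (tokens, concept_terms) pairs, where concept_terms
    is a hypothetical set of tokens judged to carry the sentence meaning.
    Returns (tf, weights). The additive formula is an assumption, not the
    formula from the paper.
    """
    tf = Counter()            # document-level term frequency
    concept_hits = Counter()  # number of sentences where the term is a concept
    for tokens, concept_terms in sentences:
        tf.update(tokens)
        # Count each sentence at most once per term.
        concept_hits.update(set(tokens) & set(concept_terms))
    weights = {term: tf[term] + concept_hits[term] for term in tf}
    return tf, weights

# Toy document: "the" and "model" occur equally often, but only "model"
# is marked as contributing to sentence meaning.
doc = [
    (["the", "model", "analyzes", "sentences"], {"model", "sentences"}),
    (["the", "model", "captures", "concepts"], {"model", "concepts"}),
]
tf, weights = term_weights(doc)
print(tf["the"], tf["model"])            # equal document-level frequency
print(weights["the"], weights["model"])  # different combined weights
```

Under these assumptions, two terms with identical term frequency receive different weights once the sentence-level signal is included, which is the distinction the abstract draws.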