Most text categorization techniques are based on word and/or phrase analysis of the text. Statistical analysis of term frequency captures the importance of a term within a document only. However, two terms can have the same frequency in their documents while one contributes more to the meaning of its sentences than the other. Thus, the underlying model should identify terms that capture the semantics of the text. In this case, the model can capture terms that represent the concepts of the sentence, which leads to discovering the topic of the document.

A new concept-based model is introduced that analyzes terms at the sentence and document levels, rather than at the document level only as in traditional analysis. The concept-based model can effectively discriminate between terms that are unimportant with respect to sentence semantics and terms that hold the concepts representing the sentence meaning.

A set of experiments using the proposed concept-based model on different datasets in text categorization is conducted. The experiments compare traditional weighting with concept-based weighting and demonstrate that concept-based weighting substantially enhances the categorization quality of sets of documents.