A concept-based model for enhancing text categorization

Authors:
Shady Shehata;Fakhri Karray;Mohamed Kamel
Affiliations:
University of Waterloo;University of Waterloo;University of Waterloo
Venue:
Proceedings of the 13th ACM SIGKDD international conference on Knowledge discovery and data mining
Year:
2007

Abstract

Most text categorization techniques are based on word and/or phrase analysis of the text. Statistical analysis of term frequency captures the importance of a term within a document only. However, two terms can have the same frequency in their documents while one contributes more to the meaning of its sentences than the other. Thus, the underlying model should identify terms that capture the semantics of the text. In this case, the model can capture terms that represent the concepts of the sentence, which leads to discovering the topic of the document. A new concept-based model is introduced that analyzes terms on both the sentence and document levels, rather than on the document level only as in traditional analysis. The concept-based model can effectively discriminate between terms that are unimportant with respect to sentence semantics and terms that hold the concepts representing the sentence meaning. The proposed model consists of a concept-based statistical analyzer, a conceptual ontological graph representation, and a concept extractor. A term that contributes to the sentence semantics is assigned two different weights, one by the concept-based statistical analyzer and one by the conceptual ontological graph representation. These two weights are combined into a new weight. The concepts with the maximum combined weights are selected by the concept extractor. A set of text categorization experiments using the proposed concept-based model is conducted on different datasets. The experiments compare traditional weighting with the concept-based weighting obtained by the combined approach of the concept-based statistical analyzer and the conceptual ontological graph. The evaluation of results relies on two quality measures, the Macro-averaged F1 and the Error rate. Both quality measures improve when the newly developed concept-based model is used to enhance the quality of text categorization.
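
The two quality measures named in the abstract can be illustrated with a minimal sketch. It uses the standard textbook definitions for a single-label setting: macro-averaged F1 is the unweighted mean of per-class F1 scores, and the error rate is the fraction of misclassified documents. The abstract does not give the exact formulation used in the experiments, so the single-label assumption, the function names `macro_f1` and `error_rate`, and the toy labels are all hypothetical.

```python
from collections import Counter


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over every class seen in y_true or y_pred.

    Assumes a single-label setting (one class per document); this is an
    illustrative assumption, not a detail taken from the abstract.
    """
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1          # correct prediction for class t
        else:
            fp[p] += 1          # p was predicted but is wrong
            fn[t] += 1          # t was the true class but was missed
    scores = []
    for c in labels:
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0 when the denominator is 0
        denom = 2 * tp[c] + fp[c] + fn[c]
        scores.append(2 * tp[c] / denom if denom else 0.0)
    return sum(scores) / len(scores)


def error_rate(y_true, y_pred):
    """Fraction of documents whose predicted class differs from the true class."""
    wrong = sum(1 for t, p in zip(y_true, y_pred) if t != p)
    return wrong / len(y_true)


if __name__ == "__main__":
    # Hypothetical toy labels, for illustration only.
    y_true = ["a", "a", "b", "b"]
    y_pred = ["a", "b", "b", "b"]
    print(macro_f1(y_true, y_pred))
    print(error_rate(y_true, y_pred))
```

Because macro-averaging gives every class equal weight regardless of its size, it is sensitive to performance on rare categories, whereas the error rate is dominated by the frequent ones; reporting both gives a more complete picture than either alone.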