Term-weighting approaches in automatic text retrieval
Information Processing and Management: an International Journal
Models for retrieval with probabilistic indexing
Information Processing and Management: an International Journal - Modeling data, information and knowledge
A probabilistic learning approach for document indexing
ACM Transactions on Information Systems (TOIS) - Special issue on research and development in information retrieval
C4.5: programs for machine learning
C4.5: programs for machine learning
Mining association rules between sets of items in large databases
SIGMOD '93 Proceedings of the 1993 ACM SIGMOD international conference on Management of data
Feature selection, perceptron learning, and a usability case study for text categorization
Proceedings of the 20th annual international ACM SIGIR conference on Research and development in information retrieval
A vector space model for automatic indexing
Communications of the ACM
Machine learning in automated text categorization
ACM Computing Surveys (CSUR)
Text-Learning and Related Intelligent Agents: A Survey
IEEE Intelligent Systems
A Comparative Study on Feature Selection in Text Categorization
ICML '97 Proceedings of the Fourteenth International Conference on Machine Learning
CMAR: Accurate and Efficient Classification Based on Multiple Class-Association Rules
ICDM '01 Proceedings of the 2001 IEEE International Conference on Data Mining
Experiments on the Use of Feature Selection and Negative Evidence in Automated Text Categorization
ECDL '00 Proceedings of the 4th European Conference on Research and Advanced Technology for Digital Libraries
Two Odds-Radio-Based Text Classification Algorithms
WISEW '02 Proceedings of the Third International Conference on Web Information Systems Engineering (Workshops) - (WISEw'02)
Text Document Categorization by Term Association
ICDM '02 Proceedings of the 2002 IEEE International Conference on Data Mining
Word association norms, mutual information, and lexicography
ACL '89 Proceedings of the 27th annual meeting on Association for Computational Linguistics
Best terms: an efficient feature-selection algorithm for text categorization
Knowledge and Information Systems
CCIC: Consistent Common Itemsets Classifier
MLDM '07 Proceedings of the 5th international conference on Machine Learning and Data Mining in Pattern Recognition
Statistical Identification of Key Phrases for Text Classification
MLDM '07 Proceedings of the 5th international conference on Machine Learning and Data Mining in Pattern Recognition
Document-Base Extraction for Single-Label Text Classification
DaWaK '08 Proceedings of the 10th international conference on Data Warehousing and Knowledge Discovery
Practical application of associative classifier for document classification
AIRS'05 Proceedings of the Second Asia conference on Asia Information Retrieval Technology
Threshold tuning for improved classification association rule mining
PAKDD'05 Proceedings of the 9th Pacific-Asia conference on Advances in Knowledge Discovery and Data Mining
Hi-index | 0.00 |
Data pre-processing is an important topic in Text Classification (TC). It aims to convert the original textual data in a data-mining-ready structure, where the most significant text-features that serve to differentiate between text-categories are identified. Broadly speaking, textual data pre-processing techniques can be divided into three groups: (i) linguistic, (ii) statistical, and (iii) hybrid (i) & (ii). With regard to language-independent TC, our study relates to the statistical aspect only. The nature of textual data pre-processing includes: Document-base Representation (DR) and Feature Selection (FS). In this paper, we propose a hybrid statistical FS approach that integrates two existing (statistical FS) techniques, DIAAF (Darmstadt Indexing Approach Association Factor) and GSSC (Galavotti(Sebastiani(Simi Coefficient). Our proposed approach is presented under a statistical "bag of phrases" DR setting. The experimental results, based on the well-established associative text classification approach, demonstrate that our proposed technique outperforms existing mechanisms with respect to the accuracy of classification.