An evaluation of phrasal and clustered representations on a text categorization task
SIGIR '92 Proceedings of the 15th annual international ACM SIGIR conference on Research and development in information retrieval
Automated learning of decision rules for text categorization
ACM Transactions on Information Systems (TOIS)
Evaluating and optimizing autonomous text classification systems
SIGIR '95 Proceedings of the 18th annual international ACM SIGIR conference on Research and development in information retrieval
Inductive learning algorithms and representations for text categorization
Proceedings of the seventh international conference on Information and knowledge management
Making large-scale support vector machine learning practical
Advances in kernel methods
A re-examination of text categorization methods
Proceedings of the 22nd annual international ACM SIGIR conference on Research and development in information retrieval
Text databases & document management
Machine learning in automated text categorization
ACM Computing Surveys (CSUR)
Machine Learning
Classifying text documents by associating terms with text categories
ADC '02 Proceedings of the 13th Australasian database conference - Volume 5
The use of bigrams to enhance text categorization
Information Processing and Management: an International Journal
A Comparative Study on Feature Selection in Text Categorization
ICML '97 Proceedings of the Fourteenth International Conference on Machine Learning
Feature Engineering for Text Classification
ICML '99 Proceedings of the Sixteenth International Conference on Machine Learning
Automatic Web Rating: Filtering Obscene Content on the Web
ECDL '00 Proceedings of the 4th European Conference on Research and Advanced Technology for Digital Libraries
An extensive empirical study of feature selection metrics for text classification
The Journal of Machine Learning Research
An evaluation of statistical spam filtering techniques
ACM Transactions on Asian Language Information Processing (TALIP)
SAT-MOD: moderate itemset fittest for text classification
WWW '05 Special interest tracks and posters of the 14th international conference on World Wide Web
Introduction to Data Mining, (First Edition)
Introduction to Data Mining, (First Edition)
A comparative study of citations and links in document classification
Proceedings of the 6th ACM/IEEE-CS joint conference on Digital libraries
Training linear SVMs in linear time
Proceedings of the 12th ACM SIGKDD international conference on Knowledge discovery and data mining
Extending the single words-based document model: a comparison of bigrams and 2-itemsets
Proceedings of the 2006 ACM symposium on Document engineering
Relaxed online SVMs for spam filtering
SIGIR '07 Proceedings of the 30th annual international ACM SIGIR conference on Research and development in information retrieval
Considering re-occurring features in associative classifiers
PAKDD'05 Proceedings of the 9th Pacific-Asia conference on Advances in Knowledge Discovery and Data Mining
Clustering a very large number of textual unstructured customers' reviews in english
AIMSA'12 Proceedings of the 15th international conference on Artificial Intelligence: methodology, systems, and applications
Class-indexing-based term weighting for automatic text classification
Information Sciences: an International Journal
Discovering health-related knowledge in social media using ensembles of heterogeneous features
Proceedings of the 22nd ACM international conference on Conference on information & knowledge management
Coupled attribute analysis on numerical data
IJCAI'13 Proceedings of the Twenty-Third international joint conference on Artificial Intelligence
Hi-index | 0.00 |
In this article we propose a data treatment strategy to generate new discriminative features, called compound-features (or c-features), for the sake of text classification. These c-features are composed by terms that co-occur in documents without any restrictions on order or distance between terms within a document. This strategy precedes the classification task, in order to enhance documents with discriminative c-features. The idea is that, when c-features are used in conjunction with single-features, the ambiguity and noise inherent to their bag-of-words representation are reduced. We use c-features composed of two terms in order to make their usage computationally feasible while improving the classifier effectiveness. We test this approach with several classification algorithms and single-label multi-class text collections. Experimental results demonstrated gains in almost all evaluated scenarios, from the simplest algorithms such as kNN (13% gain in micro-average F"1 in the 20 Newsgroups collection) to the most complex one, the state-of-the-art SVM (10% gain in macro-average F"1 in the collection OHSUMED).