Term-weighting approaches in automatic text retrieval
Information Processing and Management: an International Journal
Automated learning of decision rules for text categorization
ACM Transactions on Information Systems (TOIS)
A sequential algorithm for training text classifiers
SIGIR '94 Proceedings of the 17th annual international ACM SIGIR conference on Research and development in information retrieval
Pivoted document length normalization
SIGIR '96 Proceedings of the 19th annual international ACM SIGIR conference on Research and development in information retrieval
Feature selection, perceptron learning, and a usability case study for text categorization
Proceedings of the 20th annual international ACM SIGIR conference on Research and development in information retrieval
An Evaluation of Statistical Approaches to Text Categorization
Information Retrieval
A vector space model for automatic indexing
Communications of the ACM
Machine learning in automated text categorization
ACM Computing Surveys (CSUR)
Information Retrieval
A new family of online algorithms for category ranking
SIGIR '02 Proceedings of the 25th annual international ACM SIGIR conference on Research and development in information retrieval
Hierarchical Text Categorization Using Neural Networks
Information Retrieval
Text Categorization with Suport Vector Machines: Learning with Many Relevant Features
ECML '98 Proceedings of the 10th European Conference on Machine Learning
Text Categorization: An Experiment Using Phrases
Proceedings of the 24th BCS-IRSG European Colloquium on IR Research: Advances in Information Retrieval
A Comparative Study on Feature Selection in Text Categorization
ICML '97 Proceedings of the Fourteenth International Conference on Machine Learning
Text categorization by boosting automatically extracted concepts
Proceedings of the 26th annual international ACM SIGIR conference on Research and development in informaion retrieval
A family of additive online algorithms for category ranking
The Journal of Machine Learning Research
An extensive empirical study of feature selection metrics for text classification
The Journal of Machine Learning Research
Accuracy improvement of automatic text classification based on feature transformation
Proceedings of the 2003 ACM symposium on Document engineering
Feature selection for text categorization on imbalanced data
ACM SIGKDD Explorations Newsletter - Special issue on learning from imbalanced datasets
Multilingual assistant for medical diagnosing and drug prescription based on category ranking
COLING '08 22nd International Conference on on Computational Linguistics: Demonstration Papers
A multilingual and multiplatform application for medicinal plants prescription from medical symptoms
Proceedings of the 36th international ACM SIGIR conference on Research and development in information retrieval
A portable multilingual medical directory by automatic categorization of Wikipedia articles
Proceedings of the 36th international ACM SIGIR conference on Research and development in information retrieval
Hi-index | 0.00 |
Category ranking provides a way to classify plain text documents into a pre-determined set of categories. This work proposes to have a look at typical document collections and analyze which measures and peculiarities can help us to represent documents so that the resulting features are as much discriminative and representative as possible. Considerations such as selecting only nouns and adjectives, taking expressions rather than words, and using measures like term length, are combined into a simple feature selection and weighting method to extract, select and weight especial n-grams. Several experiments are performed to prove the usefulness of the new schema with different data sets (Reuters and OHSUMED) and two different algorithms (SVM and a simple sum of weights). After evaluation, the new approach outperforms some of the best known and most widely used categorization methods.