NEWPAR: an automatic feature selection and weighting schema for category ranking

Authors:
Fernando Ruiz-Rico;Jose Luis Vicedo;María-Consuelo Rubio-Sánchez
Affiliations:
University of Alicante;University of Alicante;University of Alicante
Venue:
Proceedings of the 2006 ACM symposium on Document engineering
Year:
2006

Abstract

Category ranking provides a way to classify plain-text documents into a predetermined set of categories. This work examines typical document collections and analyzes which measures and peculiarities help represent documents so that the resulting features are as discriminative and representative as possible. Considerations such as selecting only nouns and adjectives, taking expressions rather than single words, and using measures such as term length are combined into a simple feature selection and weighting method that extracts, selects, and weights special n-grams. Several experiments are performed to demonstrate the usefulness of the new schema with different data sets (Reuters and OHSUMED) and two different algorithms (SVM and a simple sum of weights). In the evaluation, the new approach outperforms some of the best-known and most widely used categorization methods.
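
The abstract does not give the actual NEWPAR formulas, so the following is only a minimal sketch of the general idea it describes: keep n-grams made of consecutive nouns and adjectives, weight them using term length, and rank categories by a simple sum of weights. The function names, the coarse tag set, the `max_n` parameter, and the weight (relative frequency within the category multiplied by the expression's length in characters) are all assumptions made for illustration, not the authors' method. Part-of-speech tags are assumed to come from an external tagger.

```python
from collections import Counter, defaultdict

# Assumption: coarse POS tags supplied by some external tagger.
KEEP_TAGS = {"NOUN", "ADJ"}


def extract_expressions(tagged_tokens, max_n=3):
    """Return all n-grams (n <= max_n) inside runs of consecutive nouns/adjectives."""
    expressions = []
    run = []  # current run of consecutive nouns/adjectives
    # A sentinel token with a non-matching tag flushes the final run.
    for word, tag in list(tagged_tokens) + [("", "END")]:
        if tag in KEEP_TAGS:
            run.append(word.lower())
            continue
        for n in range(1, max_n + 1):
            for i in range(len(run) - n + 1):
                expressions.append(" ".join(run[i:i + n]))
        run = []
    return expressions


def train_weights(labeled_docs, max_n=3):
    """Build per-category expression weights from (tagged_tokens, category) pairs.

    Illustrative weight only: relative frequency in the category times
    the expression's length in characters.
    """
    counts = defaultdict(Counter)
    for tagged_tokens, category in labeled_docs:
        counts[category].update(extract_expressions(tagged_tokens, max_n))
    weights = {}
    for category, counter in counts.items():
        total = sum(counter.values())
        weights[category] = {
            expr: (freq / total) * len(expr) for expr, freq in counter.items()
        }
    return weights


def rank_categories(tagged_tokens, weights, max_n=3):
    """Rank categories by the sum of weights of the expressions found in a document."""
    exprs = extract_expressions(tagged_tokens, max_n)
    scores = {
        category: sum(w.get(e, 0.0) for e in exprs)
        for category, w in weights.items()
    }
    return sorted(scores, key=scores.get, reverse=True)


# Toy usage with hand-tagged tokens (hypothetical data).
train = [
    ([("crude", "ADJ"), ("oil", "NOUN"), ("prices", "NOUN"), ("rose", "VERB")], "energy"),
    ([("heart", "NOUN"), ("disease", "NOUN"), ("is", "VERB"), ("common", "ADJ")], "medicine"),
]
weights = train_weights(train)
ranking = rank_categories(
    [("oil", "NOUN"), ("prices", "NOUN"), ("fell", "VERB")], weights
)
```

In this toy run the query document shares expressions only with the "energy" training document, so that category is ranked first. A real experiment along the lines of the abstract would replace the hand-tagged tokens with tagger output over Reuters or OHSUMED documents.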