Imbalanced training sets, in which one class is heavily underrepresented relative to the others, degrade the classification of rare-class instances. We apply One-sided Sampling for the first time to a lexical acquisition task, learning verb complements from Modern Greek corpora, to remove redundant and misleading training examples of verb non-dependents and thereby balance the training set. We then classify new examples with several well-known learning algorithms. After balancing the dataset, performance improves by up to 22% in recall and 15% in precision.
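The abstract does not describe how the sampling step is implemented. As a rough illustration only, the sketch below follows the commonly described one-sided selection procedure (Kubat and Matwin): keep every minority example, keep only the majority examples that a 1-nearest-neighbour rule needs in order to classify the rest (dropping redundant ones), then drop majority examples that form Tomek links with minority examples (dropping borderline or misleading ones). The function name `one_sided_selection`, the Euclidean distance, the toy 2-D features, and the labels `"complement"` and `"non-dependent"` are all assumptions made for the example, not details taken from the paper.

```python
import math
import random


def one_sided_selection(data, minority_label, seed=0):
    """Reduce the majority class of `data`, a list of (features, label) pairs.

    Illustrative sketch: Euclidean distance and the two-step pruning are
    assumptions, not the authors' documented implementation.
    """
    rng = random.Random(seed)
    minority = [ex for ex in data if ex[1] == minority_label]
    majority = [ex for ex in data if ex[1] != minority_label]
    if not majority:
        return list(minority)

    # Step 1: start from all minority examples plus one random majority example.
    seed_idx = rng.randrange(len(majority))
    initial = minority + [majority[seed_idx]]
    subset = list(initial)

    # Step 2: add only the majority examples that 1-NN over the initial
    # subset misclassifies; correctly classified ones are treated as redundant.
    for i, (features, label) in enumerate(majority):
        if i == seed_idx:
            continue
        nearest = min(initial, key=lambda ex: math.dist(features, ex[0]))
        if nearest[1] != label:
            subset.append((features, label))

    # Step 3: remove majority examples in Tomek links, i.e. a majority and a
    # minority example that are each other's nearest neighbour.
    def nn_index(i):
        return min((j for j in range(len(subset)) if j != i),
                   key=lambda j: math.dist(subset[i][0], subset[j][0]))

    kept = []
    for i, ex in enumerate(subset):
        if ex[1] != minority_label and len(subset) > 1:
            j = nn_index(i)
            if subset[j][1] == minority_label and nn_index(j) == i:
                continue  # borderline majority example: drop it
        kept.append(ex)
    return kept


# Toy data: two rare "complement" examples, a redundant cluster of
# "non-dependent" examples, and one borderline "non-dependent" example.
toy = [
    ((0.0, 0.0), "complement"),
    ((0.0, 5.0), "complement"),
    ((10.0, 10.0), "non-dependent"),
    ((10.0, 11.0), "non-dependent"),
    ((11.0, 10.0), "non-dependent"),
    ((11.0, 11.0), "non-dependent"),
    ((0.2, 0.0), "non-dependent"),
]
balanced = one_sided_selection(toy, minority_label="complement")
```

Only majority examples are ever discarded, which is what makes the sampling one-sided. Which of the redundant majority examples survive depends on the random starting example, so the size of the reduced set can vary with `seed`.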