Learning Greek verb complements: addressing the class imbalance

Authors:
Katia Kermanidis; Manolis Maragoudakis; Nikos Fakotakis; George Kokkinakis
Affiliations:
University of Patras, Greece; University of Patras, Greece; University of Patras, Greece; University of Patras, Greece
Venue:
COLING '04 Proceedings of the 20th international conference on Computational Linguistics
Year:
2004

Abstract

Imbalanced training sets, in which one class is heavily underrepresented compared to the others, degrade the classification of rare-class instances. We apply One-sided Sampling, for the first time, to a lexical acquisition task (learning verb complements from Modern Greek corpora): it removes redundant and misleading training examples of verb non-dependents, the majority class, and thereby balances our training set. We then use several well-known learning algorithms to classify new examples. After balancing the dataset, recall improves by up to 22% and precision by up to 15%.
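The abstract names One-sided Sampling without describing it. As a rough illustration, the following is a minimal Python sketch assuming it follows the one-sided selection procedure of Kubat and Matwin: every minority-class example is kept, majority-class examples that a 1-nearest-neighbour rule already classifies correctly are discarded as redundant, and majority examples that take part in Tomek links are discarded as borderline or noisy. The numeric feature vectors, the squared Euclidean distance, binary labels, and the helper names `one_sided_selection`, `nearest` and `sq_dist` are all assumptions made for this sketch; the paper's features, distance measure and exact variant may differ.

```python
import random
from typing import List, Sequence


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance (an assumed metric for this sketch)."""
    return sum((p - q) ** 2 for p, q in zip(a, b))


def nearest(pool: List[int], x: Sequence[float], X: List[Sequence[float]], skip: int = -1) -> int:
    """Index in `pool` of the example closest to x, ignoring `skip`; -1 if none."""
    best, best_d = -1, float("inf")
    for j in pool:
        if j == skip:
            continue
        d = sq_dist(x, X[j])
        if d < best_d:
            best, best_d = j, d
    return best


def one_sided_selection(X: List[Sequence[float]], y: List[int], minority: int, seed: int = 0) -> List[int]:
    """Return sorted indices of the retained examples (hypothetical helper).

    Assumes binary labels with both classes present in y.
    """
    rng = random.Random(seed)
    majority = [i for i, lab in enumerate(y) if lab != minority]
    # Step 1: start C with every minority example plus one random majority example.
    initial = [i for i, lab in enumerate(y) if lab == minority] + [rng.choice(majority)]
    in_initial = set(initial)
    C = list(initial)
    # Step 2: label every remaining example by 1-NN against the initial C and
    # move the misclassified ones into C; correctly labelled ones are redundant.
    for i in range(len(X)):
        if i not in in_initial and y[nearest(initial, X[i], X)] != y[i]:
            C.append(i)
    # Step 3: drop majority examples of C that form a Tomek link, i.e. a pair of
    # opposite-class examples that are each other's nearest neighbour within C.
    drop = set()
    for i in C:
        j = nearest(C, X[i], X, skip=i)
        if j != -1 and y[i] != y[j] and nearest(C, X[j], X, skip=j) == i:
            drop.add(i if y[i] != minority else j)
    # Minority examples are never removed, hence "one-sided".
    return sorted(i for i in C if i not in drop)
```

On a one-dimensional toy set with distant majority examples at 0.0, 0.1, 0.2 and 0.3, a borderline majority example at 2.9, and minority examples at 3.0, 3.5 and 4.0, the borderline example is always removed as part of a Tomek link, all minority examples are kept, and at most one of the distant majority examples survives, depending on which majority example the random seed picks. The sketch uses a brute-force neighbour search for clarity; a real corpus would call for an indexed nearest-neighbour structure.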