Single-classifier memory-based phrase chunking

  • Authors:
  • Jorn Veenstra;Antal van den Bosch

  • Affiliations:
  • Tilburg University;Tilburg University

  • Venue:
  • CoNLL '00: Proceedings of the 2nd Workshop on Learning Language in Logic and the 4th Conference on Computational Natural Language Learning - Volume 7
  • Year:
  • 2000

Abstract

In the shared task for CoNLL-2000, words and tags form the basic multi-valued features for predicting a rich phrase segmentation code. While the tag features, containing WSJ part-of-speech tags (Marcus et al., 1993), have about 45 values, the word features have more than 10,000 values. In our study we have looked at how memory-based learning, as implemented in the TiMBL software system (Daelemans et al., 2000), handles such features. We have limited our search to single classifiers, thereby explicitly ignoring the possibility of building a meta-learning classifier architecture that could be expected to improve accuracy. Given this restriction, we have explored the following:

1. The generalization accuracy of TiMBL with default settings (multi-valued features, overlap metric, feature weighting).
2. The use of MVDM (Stanfill and Waltz, 1986; Cost and Salzberg, 1993) (Section 2), which should work well on word values with medium or high frequency, but may work badly on word values with low frequency.
3. The straightforward unpacking of feature values into binary features. On some tasks we have found that splitting multi-valued features into several binary features can enhance classifier performance.
4. A heuristic search for complex features on the basis of all unpacked feature values, and the use of these complex features for the classification task.
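
The two distance metrics contrasted in points 1 and 2 can be sketched roughly as follows. This is an illustrative reimplementation, not TiMBL's actual code: the function names, the toy word/chunk-tag pairs, and the weight values are mine. The weighted overlap metric sums feature weights wherever two instances disagree, while MVDM measures how far apart the class-conditional distributions of two feature values are, so that low-frequency values yield unreliable estimates, as noted above.

```python
from collections import Counter, defaultdict

def overlap_distance(x, y, weights):
    # Weighted overlap metric: add a feature's weight whenever the two
    # instances carry different values for it (TiMBL's default pairs this
    # with information-gain feature weighting).
    return sum(w for xi, yi, w in zip(x, y, weights) if xi != yi)

def mvdm_table(values, classes):
    # Estimate P(class | value) for each feature value from training
    # co-occurrence counts; MVDM compares values via these distributions.
    value_class = defaultdict(Counter)
    for v, c in zip(values, classes):
        value_class[v][c] += 1
    all_classes = sorted(set(classes))
    table = {}
    for v, counts in value_class.items():
        total = sum(counts.values())
        table[v] = {c: counts[c] / total for c in all_classes}
    return table, all_classes

def mvdm_delta(v1, v2, table, all_classes):
    # MVDM: two values are close when they predict the same class
    # distribution; unseen values default to zero probabilities.
    p1, p2 = table.get(v1, {}), table.get(v2, {})
    return sum(abs(p1.get(c, 0.0) - p2.get(c, 0.0)) for c in all_classes)

# Toy data (hypothetical): words paired with chunk tags.
words = ["the", "a", "dog", "cat"]
tags = ["B-NP", "B-NP", "I-NP", "I-NP"]
table, cls = mvdm_table(words, tags)
print(overlap_distance(("the", "DT"), ("a", "DT"), [1.0, 0.5]))  # 1.0
print(mvdm_delta("the", "a", table, cls))    # 0.0: same class profile
print(mvdm_delta("the", "dog", table, cls))  # 2.0: disjoint class profiles
```

On this toy data, "the" and "a" are identical under MVDM because both occur only with B-NP, whereas the overlap metric would treat them as maximally different, which is exactly the contrast the study exploits for medium- and high-frequency words.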
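
The unpacking step in point 3 amounts to one-hot encoding of symbolic features. A minimal sketch, assuming each training instance is a tuple of symbolic values and each (position, value) pair seen in training becomes one binary feature (the helper name `unpack_features` is mine, not from the paper):

```python
def unpack_features(instances):
    # Collect every (feature position, value) pair seen in training;
    # each pair becomes one binary feature in the unpacked representation.
    feature_values = sorted({(i, v)
                             for inst in instances
                             for i, v in enumerate(inst)})
    index = {fv: j for j, fv in enumerate(feature_values)}

    def encode(inst):
        # Map a multi-valued instance to a 0/1 vector; values unseen in
        # training simply contribute no active bit.
        vec = [0] * len(index)
        for i, v in enumerate(inst):
            j = index.get((i, v))
            if j is not None:
                vec[j] = 1
        return vec

    return encode, index

# Toy instances (hypothetical): (word, part-of-speech tag) pairs.
encode, index = unpack_features([("the", "DT"), ("dog", "NN")])
print(encode(("the", "DT")))  # [0, 1, 1, 0]
print(encode(("cat", "NN")))  # [0, 0, 0, 1]: unseen word, known tag
```

The heuristic search in point 4 would then operate over these binary features, combining selected ones into complex features; since the paper does not spell out that procedure here, it is not sketched.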