Communications of the ACM, special issue on parallelism.
Instance-Based Learning Algorithms. Machine Learning.
C4.5: Programs for Machine Learning.
IGTree: Using Trees for Compression and Classification in Lazy Learning Algorithms. Artificial Intelligence Review, special issue on lazy learning.
Unpacking Multi-valued Symbolic Features and Classes in Memory-Based Language Learning. ICML '00: Proceedings of the Seventeenth International Conference on Machine Learning.
Building a large annotated corpus of English: the Penn Treebank. Computational Linguistics, special issue on using large corpora: II.
Shallow parsing using specialized HMMs. The Journal of Machine Learning Research.
Shallow parsing using noisy and non-stationary training material. The Journal of Machine Learning Research.
Using grammatical relations to compare parsers. EACL '03: Proceedings of the Tenth Conference of the European Chapter of the Association for Computational Linguistics, Volume 1.
Shallow parsing on the basis of words only: a case study. ACL '02: Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics.
Introduction to the CoNLL-2000 shared task: chunking. CoNLL '00: Proceedings of the 2nd Workshop on Learning Language in Logic and the 4th Conference on Computational Natural Language Learning, Volume 7.
Extracting the unextractable: a case study on verb-particles. COLING-02: Proceedings of the 6th Conference on Natural Language Learning, Volume 20.
A classifier-based parser with linear run-time complexity. Parsing '05: Proceedings of the Ninth International Workshop on Parsing Technology.
Discovering text patterns by a new graphic model. MLDM '11: Proceedings of the 7th International Conference on Machine Learning and Data Mining in Pattern Recognition.
CICLing '06: Proceedings of the 7th International Conference on Computational Linguistics and Intelligent Text Processing.
Developing an algorithm for mining semantics in texts. CICLing '12: Proceedings of the 13th International Conference on Computational Linguistics and Intelligent Text Processing, Volume Part II.
In the CoNLL-2000 shared task, words and part-of-speech tags form the basic multi-valued features for predicting a rich phrase segmentation code. While the tag features, containing WSJ part-of-speech tags (Marcus et al., 1993), have about 45 values, the word features have more than 10,000 values. In our study we have examined how memory-based learning, as implemented in the TiMBL software system (Daelemans et al., 2000), handles such features. We have limited our search to single classifiers, thereby explicitly ignoring the possibility of building a meta-learning classifier architecture that could be expected to improve accuracy. Given this restriction, we have explored the following:

1. The generalization accuracy of TiMBL with default settings (multi-valued features, overlap metric, feature weighting).
2. The use of MVDM (Stanfill and Waltz, 1986; Cost and Salzberg, 1993) (Section 2), which should work well on pairs of word values with medium or high frequency, but may work badly on word values with low frequency.
3. The straightforward unpacking of feature values into binary features. On some tasks we have found that splitting multi-valued features into several binary features can enhance the performance of the classifier.
4. A heuristic search for complex features on the basis of all unpacked feature values, and the use of these complex features for the classification task.
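To make the first three options concrete, here is a minimal Python sketch of a 1-nearest-neighbour classifier under the overlap metric and under MVDM, plus a helper that unpacks multi-valued features into binary indicators. This is an illustrative toy, not TiMBL's actual implementation: the function names and the tiny (word, tag) → chunk-label data set are invented for this example, and feature weighting is omitted.

```python
from collections import Counter, defaultdict

def overlap(_f, a, b):
    """Overlap metric: 0 if the two symbolic values match, 1 otherwise."""
    return 0.0 if a == b else 1.0

def mvdm_tables(X, y):
    """Per-feature estimates of P(class | value) from the training data."""
    tables = []
    for f in range(len(X[0])):
        counts = defaultdict(Counter)
        for xs, c in zip(X, y):
            counts[xs[f]][c] += 1
        tables.append({v: {c: n / sum(cnt.values()) for c, n in cnt.items()}
                       for v, cnt in counts.items()})
    return tables

def make_mvdm(tables, classes):
    """MVDM: two values are close when they predict similar class
    distributions.  An unseen (zero-frequency) value gets an empty
    distribution and so is maximally distant from every seen value,
    which illustrates why MVDM can behave badly on rare words."""
    def dist(f, a, b):
        pa, pb = tables[f].get(a, {}), tables[f].get(b, {})
        return sum(abs(pa.get(c, 0.0) - pb.get(c, 0.0)) for c in classes)
    return dist

def nn_classify(x, X, y, dist):
    """Unweighted 1-nearest-neighbour classification."""
    best = min(range(len(X)),
               key=lambda i: sum(dist(f, x[f], X[i][f])
                                 for f in range(len(x))))
    return y[best]

def unpack(xs, value_sets):
    """Unpack each multi-valued feature into one binary indicator per value."""
    return [1 if xs[f] == v else 0
            for f in range(len(xs)) for v in sorted(value_sets[f])]

# Toy chunking-style data: (word, tag) -> chunk label.
X = [("the", "DT"), ("dog", "NN"), ("runs", "VBZ"), ("a", "DT")]
y = ["B-NP", "I-NP", "B-VP", "B-NP"]
classes = sorted(set(y))

print(nn_classify(("an", "DT"), X, y, overlap))            # overlap metric
print(nn_classify(("an", "DT"), X, y,
                  make_mvdm(mvdm_tables(X, y), classes)))  # MVDM variant
print(unpack(("the", "DT"),
             [set(w for w, _ in X), set(t for _, t in X)]))
```

The query word "an" is unseen, so under MVDM it is equidistant from every training word and the decision rests on the tag feature alone; this is exactly the low-frequency weakness noted in point 2 above.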