This paper discusses sampling strategies for building a dependency-analyzed corpus and evaluates them on different kinds of corpora. We used the Kyoto Text Corpus, a dependency-analyzed corpus of newspaper articles. As a new and different kind of corpus, we also prepared the IPAL corpus, a dependency-analyzed corpus of example sentences from dictionaries. The experimental results revealed that the sentence length of the test set governed the accuracy. They also showed that the longest-first strategy, which selects the longest sentences for annotation first, worked well when expanding an existing corpus but not when constructing a corpus from scratch.
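The longest-first strategy can be sketched as a simple ranking over an unannotated pool. This is a minimal illustration, not the authors' implementation, and it rests on assumptions the abstract does not state: each sentence is a pre-tokenized list of strings, "length" means token count (the paper may count words or bunsetsu chunks instead), and a random-sampling baseline is included only for comparison. The function names `longest_first` and `random_sample` are hypothetical.

```python
import random
from typing import List, Sequence


def longest_first(pool: Sequence[List[str]], k: int) -> List[int]:
    """Return indices of the k longest sentences in the pool.

    Length is assumed to be the number of tokens. Python's sort is stable
    even with reverse=True, so equally long sentences keep pool order.
    """
    order = sorted(range(len(pool)), key=lambda i: len(pool[i]), reverse=True)
    return order[:k]


def random_sample(pool: Sequence[List[str]], k: int, seed: int = 0) -> List[int]:
    """Baseline: return k distinct indices drawn uniformly at random."""
    k = min(k, len(pool))  # random.sample raises ValueError if k > population
    return random.Random(seed).sample(range(len(pool)), k)


if __name__ == "__main__":
    pool = [["a"], ["b", "c", "d"], ["e", "f"], ["g", "h", "i", "j"]]
    # The selected indices would be sent to annotators, then removed from the pool.
    print(longest_first(pool, 2))  # [3, 1]
    print(random_sample(pool, 2))
```

In an expanding-corpus setting, the selected sentences would be added to an already annotated training set; when building from scratch, they would form the entire initial training set, which is the case where the abstract reports that longest-first did not help.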