An empirical study of active learning with support vector machines for Japanese word segmentation

Authors:
Manabu Sassano
Affiliations:
Fujitsu Laboratories Ltd., Kamikodanaka, Nakahara-ku, Kawasaki, Japan
Venue:
ACL '02 Proceedings of the 40th Annual Meeting on Association for Computational Linguistics
Year:
2002

Abstract

We explore how well active learning with Support Vector Machines works for a non-trivial task in natural language processing, using Japanese word segmentation as a test case. In particular, we discuss how the size of the pool affects the learning curve. We find that, in the early stage of training, a larger pool requires more labeled examples than a smaller pool to achieve a given level of accuracy. In addition, we propose a novel technique that uses a large number of unlabeled examples effectively by adding them to the pool gradually. The experimental results show that our technique requires fewer labeled examples than the technique used in previous research. To achieve 97.0% accuracy, the proposed technique needs only 59.3% of the labeled examples required by the previous technique and only 17.4% of those required by random sampling.
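The abstract describes pool-based active learning with SVMs in which unlabeled examples are fed into the pool gradually rather than all at once. The following is a minimal sketch of such a loop under several assumptions that do not come from the paper. A linear SVM trained with a Pegasos-style subgradient method stands in for a real SVM learner. The next example to label is the pool example closest to the current hyperplane, a common margin-based criterion assumed here. The pool grows by a fixed, hypothetical increment `step` after every round. The names `train_linear_svm`, `active_learning`, `initial_pool`, and `step` are illustrative only.

```python
import random
from typing import Callable, Dict, List, Sequence, Tuple

Vector = Sequence[float]


def dot(w: Sequence[float], x: Sequence[float]) -> float:
    return sum(wi * xi for wi, xi in zip(w, x))


def train_linear_svm(xs: List[Vector], ys: List[int],
                     lam: float = 0.01, epochs: int = 100,
                     seed: int = 0) -> List[float]:
    """Pegasos-style stochastic subgradient training of a linear SVM.

    A stand-in for a real SVM solver (assumption). The bias is modelled as an
    extra constant feature, so it is regularized along with the weights.
    """
    rng = random.Random(seed)
    data = [(list(x) + [1.0], y) for x, y in zip(xs, ys)]
    w = [0.0] * len(data[0][0])
    t = 0
    for _ in range(epochs):
        rng.shuffle(data)
        for x, y in data:
            t += 1
            eta = 1.0 / (lam * t)
            violates = y * dot(w, x) < 1.0            # hinge loss is active
            w = [(1.0 - eta * lam) * wi for wi in w]  # regularization shrink
            if violates:
                w = [wi + eta * y * xi for wi, xi in zip(w, x)]
    return w


def decision(w: List[float], x: Vector) -> float:
    """Signed score w.x + b (the bias is the last weight)."""
    return dot(w, list(x) + [1.0])


def active_learning(unlabeled: List[Vector], oracle: Callable[[int], int],
                    seed_labeled: List[int], initial_pool: int, step: int,
                    rounds: int, batch: int = 1
                    ) -> Tuple[List[float], Dict[int, int]]:
    """Pool-based active learning with a pool that grows over time.

    The pool holds the first `pool_size` examples (in arrival order) that are
    not yet labeled; `pool_size` starts at `initial_pool` and grows by `step`
    after each round (a hypothetical schedule).
    """
    labeled: Dict[int, int] = {i: oracle(i) for i in seed_labeled}
    pool_size = initial_pool
    for _ in range(rounds):
        ids = list(labeled)
        w = train_linear_svm([unlabeled[i] for i in ids],
                             [labeled[i] for i in ids])
        limit = min(pool_size, len(unlabeled))
        pool = [i for i in range(limit) if i not in labeled]
        if not pool:
            break
        # Margin-based selection: query the examples closest to the hyperplane.
        pool.sort(key=lambda i: abs(decision(w, unlabeled[i])))
        for i in pool[:batch]:
            labeled[i] = oracle(i)  # ask the (simulated) annotator
        pool_size += step           # gradually add unlabeled examples to the pool
    ids = list(labeled)
    w = train_linear_svm([unlabeled[i] for i in ids],
                         [labeled[i] for i in ids])
    return w, labeled
```

Setting `step=0` keeps the pool fixed at `initial_pool` examples, and setting `initial_pool` to the size of the whole unlabeled set selects from all data at once. Running the sketch with such settings on a held-out set is one way to explore how pool size shapes a learning curve. The sketch does not reproduce the paper's experiments.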