Text Categorization with Support Vector Machines: Learning with Many Relevant Features
ECML '98 Proceedings of the 10th European Conference on Machine Learning
The Journal of Machine Learning Research
Shallow parsing using specialized HMMs
The Journal of Machine Learning Research
Text chunking based on a generalization of winnow
The Journal of Machine Learning Research
Chunking with support vector machines
NAACL '01 Proceedings of the second meeting of the North American Chapter of the Association for Computational Linguistics on Language technologies
Chunk-based statistical translation
ACL '03 Proceedings of the 41st Annual Meeting on Association for Computational Linguistics - Volume 1
Filtering-Ranking Perceptron Learning for Partial Parsing
Machine Learning
Introduction to the CoNLL-2000 shared task: chunking
CoNLL '00 Proceedings of the 2nd workshop on Learning language in logic and the 4th conference on Computational natural language learning - Volume 7
A high-performance semi-supervised learning method for text chunking
ACL '05 Proceedings of the 43rd Annual Meeting on Association for Computational Linguistics
Automatic measurement of syntactic development in child language
ACL '05 Proceedings of the 43rd Annual Meeting on Association for Computational Linguistics
A general and multi-lingual phrase chunking model based on masking method
CICLing '06 Proceedings of the 7th international conference on Computational Linguistics and Intelligent Text Processing
Robust and efficient multiclass SVM models for phrase pattern recognition
Pattern Recognition
Automatic text chunking is the task of recognizing phrase structures in natural language text. It is a key technology for knowledge-based systems, in which phrase structures provide important syntactic information for knowledge representation. Support vector machine (SVM)-based phrase chunking systems have been shown to achieve high performance on text chunking, but their inefficiency limits practical use on large datasets, since they handle only several thousand tokens per second. In this paper, we first show that conventional SVM learning achieves state-of-the-art performance (an F-measure of 94.25) on the CoNLL-2000 shared task. However, off-the-shelf SVM classifiers become inefficient as the number of phrase types grows. We therefore present two novel methods that make the system substantially faster in both training and testing, at the cost of only a slight decrease in performance. Experimental results show that our method achieves an F-measure of 94.09 while handling 13,000 tokens per second on the CoNLL-2000 chunking task.
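The CoNLL-2000 scores quoted in the abstract are chunk-level F-measures: a predicted phrase counts as correct only if both its type and its exact span match the gold standard, and F = 2PR/(P + R) over precision P and recall R. The sketch below is a minimal illustration of that scoring, assuming IOB2 tags (`B-X` begins a chunk of type X, `I-X` continues it, `O` is outside). The helper names `extract_chunks` and `chunk_f1` are hypothetical, and it scores a single sentence, whereas the official `conlleval` script accumulates counts over the whole test set.

```python
from typing import List, Set, Tuple

# (phrase type, start token index, end token index exclusive)
Chunk = Tuple[str, int, int]


def extract_chunks(tags: List[str]) -> Set[Chunk]:
    """Convert an IOB2 tag sequence into a set of typed chunk spans."""
    chunks: Set[Chunk] = set()
    start, ctype = 0, None
    # Append a sentinel "O" so a chunk running to the end is closed.
    for i, tag in enumerate(tags + ["O"]):
        prefix, _, label = tag.partition("-")
        # An open chunk ends on O, on any B- tag, or on an I- tag of another type.
        if ctype is not None and (prefix != "I" or label != ctype):
            chunks.add((ctype, start, i))
            ctype = None
        # B- starts a chunk; a stray I- with no open chunk is treated leniently as a start.
        if prefix == "B" or (prefix == "I" and ctype is None):
            start, ctype = i, label
    return chunks


def chunk_f1(gold: List[str], pred: List[str]) -> float:
    """Chunk-level F-measure (beta = 1), scaled to 0-100 as in CoNLL reports."""
    g, p = extract_chunks(gold), extract_chunks(pred)
    correct = len(g & p)  # exact match on type and span
    if correct == 0:
        return 0.0
    precision = correct / len(p)
    recall = correct / len(g)
    return 100 * 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    gold = ["B-NP", "I-NP", "B-VP", "B-NP", "I-NP", "O"]
    pred = ["B-NP", "I-NP", "B-VP", "B-NP", "O", "O"]
    # The truncated second NP is wrong, so P = R = 2/3.
    print(round(chunk_f1(gold, pred), 2))
```

Because a chunk is scored only when its whole span is right, a single mislabeled token (here the final `I-NP`) costs both a false positive and a false negative.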
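One way to see why off-the-shelf SVM classifiers slow down as the number of phrase types grows: a chunker assigns one label per token, so K phrase types under IOB2 tagging give 2K + 1 classes, and the standard reductions of a multiclass problem to binary SVMs train one classifier per class (one-vs-all) or one per pair of classes (one-vs-one). The sketch below only counts those classifiers. It assumes the 11 phrase types of the CoNLL-2000 data and IOB2 labels, and the names `iob2_labels` and `binary_svms_needed` are hypothetical. It is not the paper's two proposed methods, which the abstract does not describe.

```python
from typing import Dict, List

# Assumed phrase inventory of the CoNLL-2000 chunking data.
PHRASE_TYPES = ["ADJP", "ADVP", "CONJP", "INTJ", "LST", "NP",
                "PP", "PRT", "SBAR", "UCP", "VP"]


def iob2_labels(phrase_types: List[str]) -> List[str]:
    """All token labels under IOB2: O plus B-/I- for each phrase type."""
    return ["O"] + [f"{p}-{t}" for t in phrase_types for p in ("B", "I")]


def binary_svms_needed(num_classes: int) -> Dict[str, int]:
    """Binary classifiers trained by the two common multiclass reductions."""
    return {
        "one_vs_all": num_classes,                             # one per class
        "one_vs_one": num_classes * (num_classes - 1) // 2,    # one per pair
    }


if __name__ == "__main__":
    k = len(iob2_labels(PHRASE_TYPES))
    print(k, binary_svms_needed(k))
```

With 23 labels this gives 23 one-vs-all or 253 pairwise classifiers, and the pairwise count grows quadratically as phrase types are added, which is consistent with the abstract's concern about scaling.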