In this paper we analyse the importance of data generalisation and the use of local context in proper name recognition. We present an extended set of features that provide a generalised description of the data and encode linguistic information. To exploit this rich feature set we apply Conditional Random Fields (CRF), a modern approach to sequence labelling. We evaluate on a single domain using cross-validation, and across domains by training and testing on different corpora. We show that the extended feature set improves the CRF results and that the CRF approach outperforms Hidden Markov Models (HMM). On the single domain, CRF obtained an F-measure of 92.53% for 5 categories of proper names; in the cross-domain evaluation it obtained F-measures of 67.72% and 72.62% on the two other corpora.
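To make the idea of generalised features over a local context more concrete, the following is a minimal sketch of how one token's feature dictionary might be built for a CRF-style sequence labeller. The helper names `word_shape` and `token_features`, the feature keys, and the window size are all assumptions made for illustration; they are not the feature set evaluated in the paper.

```python
def word_shape(token):
    """Collapse a token into a coarse orthographic pattern (hypothetical helper).

    Uppercase letters map to 'X', lowercase to 'x', digits to 'd'; other
    characters are kept. Runs of the same symbol are collapsed to one.
    """
    shape = []
    for ch in token:
        if ch.isupper():
            symbol = "X"
        elif ch.islower():
            symbol = "x"
        elif ch.isdigit():
            symbol = "d"
        else:
            symbol = ch
        # Append only when the symbol changes, so repeated symbols collapse.
        if not shape or shape[-1] != symbol:
            shape.append(symbol)
    return "".join(shape)


def token_features(tokens, i, window=1):
    """Build features for tokens[i] from itself and its neighbours.

    The window size and feature names are illustrative assumptions.
    """
    feats = {}
    for offset in range(-window, window + 1):
        j = i + offset
        if 0 <= j < len(tokens):
            feats[f"word[{offset}]"] = tokens[j].lower()
            feats[f"shape[{offset}]"] = word_shape(tokens[j])
        else:
            # Mark positions that fall outside the sentence.
            feats[f"pad[{offset}]"] = True
    return feats


sentence = ["Jan", "Kowalski", "mieszka", "w", "Warszawie"]
features = [token_features(sentence, i) for i in range(len(sentence))]
```

The shape feature is one simple form of generalisation: distinct capitalised names share the same pattern, so a model can learn from the pattern even when a specific name was not seen in training.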