Combining proper name-coreference with conditional random fields for semi-supervised named entity recognition in Vietnamese text

Authors:
Rathany Chan Sam;Huong Thanh Le;Thuy Thanh Nguyen;Thien Huu Nguyen
Affiliations:
Hanoi University of Science and Technology, Hanoi, Vietnam;Hanoi University of Science and Technology, Hanoi, Vietnam;Hanoi University of Science and Technology, Hanoi, Vietnam;Hanoi University of Science and Technology, Hanoi, Vietnam
Venue:
PAKDD'11 Proceedings of the 15th Pacific-Asia conference on Advances in knowledge discovery and data mining - Volume Part I
Year:
2011

Citing 14
Cited 3

Combining labeled and unlabeled data with co-training

COLT' 98 Proceedings of the eleventh annual conference on Computational learning theory
An Algorithm that Learns What‘s in a Name

Machine Learning - Special issue on natural language learning
Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data

ICML '01 Proceedings of the Eighteenth International Conference on Machine Learning
A maximum entropy approach to named entity recognition

A maximum entropy approach to named entity recognition
Unsupervised word sense disambiguation rivaling supervised methods

ACL '95 Proceedings of the 33rd annual meeting on Association for Computational Linguistics
A bootstrapping approach to named entity classification using successive learners

ACL '03 Proceedings of the 41st Annual Meeting on Association for Computational Linguistics - Volume 1
A comparison of algorithms for maximum entropy parameter estimation

COLING-02 proceedings of the 6th conference on Natural language learning - Volume 20
Early results for named entity recognition with conditional random fields, feature induction and web-enhanced lexicons

CONLL '03 Proceedings of the seventh conference on Natural language learning at HLT-NAACL 2003 - Volume 4
Tagging of name records for genealogical data browsing

Proceedings of the 6th ACM/IEEE-CS joint conference on Digital libraries
Syntax-based semi-supervised named entity tagging

ACLdemo '05 Proceedings of the ACL 2005 on Interactive poster and demonstration sessions
Named entity recognition in Vietnamese using classifier voting

ACM Transactions on Asian Language Information Processing (TALIP)
Confidence estimation for information extraction

HLT-NAACL-Short '04 Proceedings of HLT-NAACL 2004: Short Papers
A simple semi-supervised algorithm for named entity recognition

SemiSupLearn '09 Proceedings of the NAACL HLT 2009 Workshop on Semi-Supervised Learning for Natural Language Processing
One class per named entity: exploiting unlabeled text for named entity recognition

IJCAI'07 Proceedings of the 20th international joint conference on Artifical intelligence

Semi-supervised learning for relation extraction in Vietnamese text

Proceedings of the Second Symposium on Information and Communication Technology
A hybrid approach of pattern extraction and semi-supervised learning for vietnamese named entity recognition

ICCCI'12 Proceedings of the 4th international conference on Computational Collective Intelligence: technologies and applications - Volume Part I
VNLP: an open source framework for Vietnamese natural language processing

Proceedings of the Fourth Symposium on Information and Communication Technology

Quantified Score

Hi-index	0.00

Visualization

Abstract

Named entity recognition (NER) is the process of seeking to locate atomic elements in text into predefined categories such as the names of persons, organizations and locations.Most existingNERsystems are based on supervised learning. This method often requires a large amount of labelled training data, which is very time-consuming to build. To solve this problem, we introduce a semi-supervised learning method for recognizing named entities in Vietnamese text by combining proper name coreference, named-ambiguityheuristicswithapowerful sequential learningmodel,Conditional RandomFields. Our approach inherits the idea of Liao and Veeramachaneni [6] and expands it by using proper name coreference. Starting by training the model using a small data set that is annotated manually, the learning model extracts high confident named entities and finds low confident ones by using proper name coreference rules. The low confident named entities are put in the training set to learn new context features. The F-scores of the systemfor extracting "Person", "Location" and "Organization" entities are 83.36%, 69.53% and 65.71%when applying heuristics proposed by Liao andVeeramachaneni.Those valueswhen using our proposed heuristics are 93.13%, 88.15% and 79.35%, respectively. It shows that our method is good in increasing the system accuracy.