Think globally, apply locally: using distributional characteristics for Hindi named entity identification

Authors:
Shalini Gupta;Pushpak Bhattacharyya
Affiliations:
IIT Bombay, Mumbai, India;IIT Bombay, Mumbai, India
Venue:
NEWS '10 Proceedings of the 2010 Named Entities Workshop
Year:
2010

Citing 9
Cited 1

Learning dictionaries for information extraction by multi-level bootstrapping

AAAI '99/IAAI '99 Proceedings of the sixteenth national conference on Artificial intelligence and the eleventh Innovative applications of artificial intelligence conference innovative applications of artificial intelligence
Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data

ICML '01 Proceedings of the Eighteenth International Conference on Machine Learning
Rapid development of Hindi named entity recognition using conditional random fields and feature induction

ACM Transactions on Asian Language Information Processing (TALIP)
Why inverse document frequency?

NAACL '01 Proceedings of the second meeting of the North American Chapter of the Association for Computational Linguistics on Language technologies
Using term informativeness for named entity detection

Proceedings of the 28th annual international ACM SIGIR conference on Research and development in information retrieval
Named entity transliteration and discovery from multilingual comparable corpora

HLT-NAACL '06 Proceedings of the main conference on Human Language Technology Conference of the North American Chapter of the Association of Computational Linguistics
Organizing and searching the world wide web of facts -- step two: harnessing the wisdom of the crowds

Proceedings of the 16th international conference on World Wide Web
Semi-supervised sequence modeling with syntactic topic models

AAAI'05 Proceedings of the 20th national conference on Artificial intelligence - Volume 2
Domain adaptation with latent semantic association for named entity recognition

NAACL '09 Proceedings of Human Language Technologies: The 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics

Topic-Oriented words as features for named entity recognition

CICLing'13 Proceedings of the 14th international conference on Computational Linguistics and Intelligent Text Processing - Volume Part I

Quantified Score

Hi-index	0.02

Visualization

Abstract

In this paper, we present a novel approach for Hindi Named Entity Identification (NEI) in a large corpus. The key idea is to harness the global distributional characteristics of the words in the corpus. We show that combining the global distributional characteristics along with the local context information improves the NEI performance over statistical baseline systems that employ only local context. The improvement is very significant (about 10%) in scenarios where the test and train corpus belong to different genres. We also propose a novel measure for NEI based on term informativeness and show that it is competitive with the best measure and better than other well known information measures.