Named entity recognition an aid to improve multilingual entity filling in language-independent approach

Authors:
Mahathi Bhagavatula;Santosh GSK;Vasudeva Varma
Affiliations:
International Institute of Information Technology, Hyderabad, India;International Institute of Information Technology, Hyderabad, India;International Institute of Information Technology, Hyderabad, India
Venue:
Proceedings of the first workshop on Information and knowledge management for developing region
Year:
2012

Abstract

This paper describes an approach for identifying Named Entities (NEs) in a large non-English corpus and associating them with appropriate tags, requiring minimal human intervention and no linguistic expertise. The focus is on Indian languages such as Telugu, Hindi, Tamil, and Marathi, which are resource-poor compared to English. We exploit the inherent structure of Wikipedia to develop an efficient co-occurrence-frequency-based NE identification algorithm for Indian languages. We describe methods by which English Wikipedia data can be used to bootstrap the identification of NEs in other languages, producing a list of NEs. We then use this NE list to improve multilingual Entity Filling, with promising results. On a dataset of 2,622 Marathi Wikipedia articles, with around 10,000 NEs manually tagged, our system achieved an F-measure of 81.25% without relying on language expertise. Similarly, it achieved an F-measure of 80.42% on around 12,000 NEs tagged within 2,935 Hindi Wikipedia articles.
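
For readers unfamiliar with the metric, the F-measure figures reported above combine precision and recall into a single score. The sketch below shows the standard calculation from counts of correct, spurious, and missed entities. It assumes the balanced (beta = 1) form, which the abstract does not state explicitly; the function name `f_measure` and the counts in the usage line are hypothetical illustrations, not data from the paper.

```python
def f_measure(true_positives: int, false_positives: int,
              false_negatives: int, beta: float = 1.0) -> float:
    """Return the F-measure for a set of tagged entities.

    beta = 1.0 gives the balanced F1 score (an assumption here;
    the abstract only says "F-measure").
    """
    predicted = true_positives + false_positives   # entities the system tagged
    actual = true_positives + false_negatives      # entities in the gold annotation
    if true_positives == 0 or predicted == 0 or actual == 0:
        return 0.0
    precision = true_positives / predicted
    recall = true_positives / actual
    beta_sq = beta * beta
    return (1 + beta_sq) * precision * recall / (beta_sq * precision + recall)


# Hypothetical counts for illustration only (not taken from the paper).
score = f_measure(true_positives=80, false_positives=20, false_negatives=20)
print(f"{score:.4f}")  # prints 0.8000
```

With equal precision and recall (0.8 each in this made-up case), the F-measure equals that shared value. When the two diverge, the harmonic mean pulls the score toward the lower of the two.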