Recall-oriented learning of named entities in Arabic Wikipedia

Authors:
Behrang Mohit (Carnegie Mellon University, Doha, Qatar)
Nathan Schneider (Carnegie Mellon University, Pittsburgh, PA)
Rishav Bhowmick (Carnegie Mellon University, Doha, Qatar)
Kemal Oflazer (Carnegie Mellon University, Doha, Qatar)
Noah A. Smith (Carnegie Mellon University, Pittsburgh, PA)
Venue:
EACL '12 Proceedings of the 13th Conference of the European Chapter of the Association for Computational Linguistics
Year:
2012

Abstract

We consider the problem of named entity recognition (NER) in Arabic Wikipedia, a semi-supervised domain adaptation setting in which we have no labeled training data in the target domain. To facilitate evaluation, we obtain annotations for articles in four topical groups, allowing annotators to identify domain-specific entity types in addition to standard categories. Standard supervised learning on newswire text leads to poor target-domain recall. We train a sequence model and show that a simple modification to the online learner (a loss function encouraging it to "arrogantly" favor recall over precision) substantially improves recall and F1. We then adapt our model with self-training on unlabeled target-domain data; enforcing the same recall-oriented bias in the self-training stage yields marginal gains.
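
The recall-oriented loss mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the learner, so the sketch assumes a structured perceptron with cost-augmented Viterbi decoding. The toy tag set, the feature templates, the helper names (`recall_cost`, `features`, `decode`, `train`), and the penalty weight `beta` are all hypothetical choices made for illustration. The idea shown is only that a missed entity token (a false negative) is charged more than any other error, which pushes the learner toward higher recall.

```python
from collections import defaultdict

LABELS = ["O", "B", "I"]  # toy tag set (assumption, not from the paper)

def recall_cost(gold, pred, beta=3.0):
    """Per-token cost: a missed entity token (gold is an entity tag, prediction
    is O) costs beta; any other mismatch costs 1; a match costs 0."""
    if gold == pred:
        return 0.0
    if gold != "O" and pred == "O":
        return beta
    return 1.0

def features(tokens, i, prev, label):
    """Hypothetical minimal features: word identity and tag bigram."""
    return [("w", tokens[i].lower(), label), ("t", prev, label)]

def score(w, tokens, i, prev, label):
    return sum(w[f] for f in features(tokens, i, prev, label))

def decode(w, tokens, gold=None, beta=3.0):
    """Viterbi decoding. If gold tags are given, the per-token cost is added to
    each local score (cost-augmented decoding, used only during training)."""
    n = len(tokens)
    if n == 0:
        return []
    best = [{} for _ in range(n)]
    back = [{} for _ in range(n)]
    for i in range(n):
        prevs = ["<s>"] if i == 0 else LABELS
        for y in LABELS:
            aug = recall_cost(gold[i], y, beta) if gold is not None else 0.0
            cands = [((best[i - 1][p] if i > 0 else 0.0)
                      + score(w, tokens, i, p, y) + aug, p) for p in prevs]
            best[i][y], back[i][y] = max(cands, key=lambda c: c[0])
    y = max(LABELS, key=lambda l: best[n - 1][l])
    path = [y]
    for i in range(n - 1, 0, -1):
        y = back[i][y]
        path.append(y)
    return path[::-1]

def train(data, epochs=5, beta=3.0):
    """Perceptron updates against the cost-augmented prediction."""
    w = defaultdict(float)
    for _ in range(epochs):
        for tokens, gold in data:
            pred = decode(w, tokens, gold, beta)
            if pred == gold:
                continue
            prev_g = prev_p = "<s>"
            for i in range(len(tokens)):
                for f in features(tokens, i, prev_g, gold[i]):
                    w[f] += 1.0  # reward features of the gold sequence
                for f in features(tokens, i, prev_p, pred[i]):
                    w[f] -= 1.0  # penalize features of the wrong sequence
                prev_g, prev_p = gold[i], pred[i]
    return w

# Usage on a single toy sentence (illustrative data only).
toy = [(["Doha", "is", "nice"], ["B", "O", "O"])]
weights = train(toy)
print(decode(weights, ["Doha", "is", "nice"]))
```

Setting `beta` to 1 makes the cost symmetric (plain Hamming cost); values above 1 bias training toward recall. Under the same assumptions, a self-training stage would tag unlabeled target-domain sentences with `decode` and pass those predicted tags back to `train` as if they were gold.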