We consider the problem of NER in Arabic Wikipedia, a semi-supervised domain adaptation setting in which we have no labeled training data in the target domain. To facilitate evaluation, we obtain annotations for articles in four topical groups, allowing annotators to identify domain-specific entity types in addition to standard categories. Standard supervised learning on newswire text leads to poor target-domain recall. We train a sequence model and show that a simple modification to the online learner, a loss function encouraging it to "arrogantly" favor recall over precision, substantially improves recall and F1. We then adapt our model with self-training on unlabeled target-domain data; enforcing the same recall-oriented bias in the self-training stage yields marginal further gains.
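The recall-oriented bias described above can be illustrated with a minimal sketch. The idea is an asymmetric cost over a BIO tag sequence in which false negatives (a gold entity token tagged "O") are penalized more heavily than false positives, so that cost-augmented training pushes the learner toward recall. This is a hypothetical illustration of the general technique, not the paper's exact loss or implementation; the function name and the cost values are assumptions chosen for the example.

```python
def recall_oriented_cost(gold_tags, pred_tags, fn_cost=2.0, fp_cost=0.5):
    """Asymmetric Hamming-style cost over a BIO tag sequence.

    Setting fn_cost > fp_cost biases training toward recall: missing
    an entity token (gold != 'O', predicted 'O') costs more than
    spuriously predicting one. Illustrative values, not the paper's.
    """
    cost = 0.0
    for g, p in zip(gold_tags, pred_tags):
        if g == p:
            continue
        if g != "O" and p == "O":      # missed entity token (false negative)
            cost += fn_cost
        elif g == "O" and p != "O":    # spurious entity token (false positive)
            cost += fp_cost
        else:                          # wrong entity type or boundary
            cost += 1.0
    return cost

gold = ["B-PER", "I-PER", "O", "B-LOC"]
pred = ["O",     "I-PER", "O", "B-ORG"]
print(recall_oriented_cost(gold, pred))  # 2.0 (missed B-PER) + 1.0 (LOC vs ORG) = 3.0
```

In a cost-augmented online learner, this cost would be added to the model score during decoding, making errors that hurt recall look more attractive to the decoder and therefore more strongly corrected during updates.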