A machine learning approach to coreference resolution of noun phrases
Computational Linguistics - Special issue on computational anaphora resolution
UAI '04 Proceedings of the 20th conference on Uncertainty in artificial intelligence
Improving machine learning approaches to coreference resolution
ACL '02 Proceedings of the 40th Annual Meeting on Association for Computational Linguistics
Machine Learning
ODIN: A Model for Adapting and Enriching Legacy Infrastructure
E-SCIENCE '06 Proceedings of the Second IEEE International Conference on e-Science and Grid Computing
Prototype-driven grammar induction
ACL-44 Proceedings of the 21st International Conference on Computational Linguistics and the 44th annual meeting of the Association for Computational Linguistics
Predicting Structured Data (Neural Information Processing)
Predicting Structured Data (Neural Information Processing)
Sound and efficient inference with probabilistic and deterministic dependencies
AAAI'06 Proceedings of the 21st national conference on Artificial intelligence - Volume 1
Joint unsupervised coreference resolution with Markov logic
EMNLP '08 Proceedings of the Conference on Empirical Methods in Natural Language Processing
Joint inference in information extraction
AAAI'07 Proceedings of the 22nd national conference on Artificial intelligence - Volume 1
Parsing, projecting & prototypes: repurposing linguistic data on the web
EACL '09 Proceedings of the 12th Conference of the European Chapter of the Association for Computational Linguistics: Demonstrations Session
LaTeCH-SHELT&R '09 Proceedings of the EACL 2009 Workshop on Language Technology and Resources for Cultural Heritage, Social Sciences, Humanities, and Education
Language identification: the long and the short of the matter
HLT '10 Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics
Language identification for creating language-specific Twitter collections
LSM '12 Proceedings of the Second Workshop on Language in Social Media
ACL '12 Proceedings of the ACL-2012 Special Workshop on Rediscovering 50 Years of Discoveries
Hi-index | 0.01 |
As the arm of NLP technologies extends beyond a small core of languages, techniques for working with instances of language data across hundreds to thousands of languages may require revisiting and recalibrating the tried and true methods that are used. Of the NLP techniques that has been treated as "solved" is language identification (language ID) of written text. However, we argue that language ID is far from solved when one considers input spanning not dozens of languages, but rather hundreds to thousands, a number that one approaches when harvesting language data found on the Web. We formulate language ID as a coreference resolution problem and apply it to a Web harvesting task for a specific linguistic data type and achieve a much higher accuracy than long accepted language ID approaches.