An Algorithm that Learns What's in a Name
Machine Learning - Special issue on natural language learning
Building a large annotated corpus of English: the Penn Treebank
Computational Linguistics - Special issue on using large corpora: II
Named Entity recognition without gazetteers
EACL '99 Proceedings of the ninth conference on European chapter of the Association for Computational Linguistics
Detecting errors in part-of-speech annotation
EACL '03 Proceedings of the tenth conference on European chapter of the Association for Computational Linguistics - Volume 1
Inducing multilingual text analysis tools via robust projection across aligned corpora
HLT '01 Proceedings of the first international conference on Human language technology research
Automatic acquisition of named entity tagged corpus from world wide web
ACL '03 Proceedings of the 41st Annual Meeting on Association for Computational Linguistics - Volume 2
NLTK: the Natural Language Toolkit
ETMTNLP '02 Proceedings of the ACL-02 Workshop on Effective tools and methodologies for teaching natural language processing and computational linguistics - Volume 1
Introduction to the CoNLL-2002 shared task: language-independent named entity recognition
COLING-02 proceedings of the 6th conference on Natural language learning - Volume 20
The multilingual entity task (MET) overview
TIPSTER '96 Proceedings of a workshop held at Vienna, Virginia: May 6-8, 1996
Introduction to the CoNLL-2003 shared task: language-independent named entity recognition
CONLL '03 Proceedings of the seventh conference on Natural language learning at HLT-NAACL 2003 - Volume 4
Language independent NER using a maximum entropy tagger
CONLL '03 Proceedings of the seventh conference on Natural language learning at HLT-NAACL 2003 - Volume 4
A high-performance semi-supervised learning method for text chunking
ACL '05 Proceedings of the 43rd Annual Meeting on Association for Computational Linguistics
Yago: a core of semantic knowledge
Proceedings of the 16th international conference on World Wide Web
Unsupervised Multilingual Sentence Boundary Detection
Computational Linguistics
Identifying Document Topics Using the Wikipedia Category Network
WI '06 Proceedings of the 2006 IEEE/WIC/ACM International Conference on Web Intelligence
Autonomously semantifying wikipedia
Proceedings of the sixteenth ACM conference on information and knowledge management
Wikify!: linking documents to encyclopedic knowledge
Proceedings of the sixteenth ACM conference on information and knowledge management
Measuring article quality in wikipedia: models and evaluation
Proceedings of the sixteenth ACM conference on information and knowledge management
YAGO: A Large Ontology from Wikipedia and WordNet
Web Semantics: Science, Services and Agents on the World Wide Web
LIBLINEAR: A Library for Large Linear Classification
The Journal of Machine Learning Research
Learning to Tag and Tagging to Learn: A Case Study on Wikipedia
IEEE Intelligent Systems
Information arbitrage across multi-lingual Wikipedia
Proceedings of the Second ACM International Conference on Web Search and Data Mining
Cross-lingual alignment and completion of Wikipedia templates
CLIAWS3 '09 Proceedings of the Third International Workshop on Cross Lingual Information Access: Addressing the Information Need of Multilingual Societies
Design challenges and misconceptions in named entity recognition
CoNLL '09 Proceedings of the Thirteenth Conference on Computational Natural Language Learning
AAAI'06 proceedings of the 21st national conference on Artificial intelligence - Volume 2
WikiRelate! computing semantic relatedness using wikipedia
AAAI'06 proceedings of the 21st national conference on Artificial intelligence - Volume 2
Comparison between tagged corpora for the named entity task
CompareCorpora '00 Proceedings of the Workshop on Comparing Corpora
Analysing Wikipedia and gold-standard corpora for NER training
EACL '09 Proceedings of the 12th Conference of the European Chapter of the Association for Computational Linguistics
A simple semi-supervised algorithm for named entity recognition
SemiSupLearn '09 Proceedings of the NAACL HLT 2009 Workshop on Semi-Supervised Learning for Natural Language Processing
One class per named entity: exploiting unlabeled text for named entity recognition
IJCAI'07 Proceedings of the 20th international joint conference on Artificial intelligence
Unsupervised named-entity extraction from the Web: An experimental study
Artificial Intelligence
Large-scale taxonomy mapping for restructuring and integrating wikipedia
IJCAI'09 Proceedings of the 21st international joint conference on Artificial intelligence
Domain adaptive bootstrapping for named entity recognition
EMNLP '09 Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing: Volume 3
Named entity recognition in Wikipedia
People's Web '09 Proceedings of the 2009 Workshop on The People's Web Meets NLP: Collaboratively Constructed Semantic Resources
A Wikipedia-based multilingual retrieval model
ECIR'08 Proceedings of the IR research, 30th European conference on Advances in information retrieval
BabelNet: building a very large multilingual semantic network
ACL '10 Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics
Annotating large email datasets for named entity recognition with Mechanical Turk
CSLDAMT '10 Proceedings of the NAACL HLT 2010 Workshop on Creating Speech and Language Data with Amazon's Mechanical Turk
A hybrid model for annotating named entity training corpora
LAW IV '10 Proceedings of the Fourth Linguistic Annotation Workshop
MENTA: inducing multilingual taxonomies from wikipedia
CIKM '10 Proceedings of the 19th ACM international conference on Information and knowledge management
Classifying Wikipedia entities into fine-grained classes
ICDEW '11 Proceedings of the 2011 IEEE 27th International Conference on Data Engineering Workshops
Learning from partially annotated sequences
ECML PKDD'11 Proceedings of the 2011 European conference on Machine learning and knowledge discovery in databases - Volume Part I
Training a named entity recognizer on the web
WISE'11 Proceedings of the 12th international conference on Web information system engineering
Automatic assignment of wikipedia encyclopedic entries to wordnet synsets
AWIC'05 Proceedings of the Third international conference on Advances in Web Intelligence
Active learning with Amazon Mechanical Turk
EMNLP '11 Proceedings of the Conference on Empirical Methods in Natural Language Processing
Unsupervised named-entity recognition: generating gazetteers and resolving ambiguity
AI'06 Proceedings of the 19th international conference on Advances in Artificial Intelligence: Canadian Society for Computational Studies of Intelligence
Applying wikipedia's multilingual knowledge to cross-lingual question answering
NLDB'07 Proceedings of the 12th international conference on Applications of Natural Language to Information Systems
Web 2.0, Language Resources and standards to automatically build a multilingual Named Entity Lexicon
Language Resources and Evaluation
Collective information extraction using first-order probabilistic models
Proceedings of the Fifth Balkan Conference in Informatics
Collaboratively built semi-structured content and Artificial Intelligence: The story so far
Artificial Intelligence
We automatically create enormous, free and multilingual silver-standard training annotations for named entity recognition (NER) by exploiting the text and structure of Wikipedia. Most NER systems rely on statistical models of annotated data to identify and classify names of people, locations and organisations in text. This dependence on expensive annotation is the knowledge bottleneck our work overcomes. We first classify each Wikipedia article into named entity (NE) types, training and evaluating on 7200 manually-labelled Wikipedia articles across nine languages. Our cross-lingual approach achieves up to 95% accuracy. We transform the links between articles into NE annotations by projecting the target article's classifications onto the anchor text. This approach yields reasonable annotations, but does not immediately compete with existing gold-standard data. By inferring additional links and heuristically tweaking the Wikipedia corpora, we better align our automatic annotations to gold standards. We annotate millions of words in nine languages, evaluating English, German, Spanish, Dutch and Russian Wikipedia-trained models against CoNLL shared task data and other gold-standard corpora. Our approach outperforms other approaches to automatic NE annotation (Richman and Schone, 2008 [61], Mika et al., 2008 [46]); competes with gold-standard training when tested on an evaluation corpus from a different source; and performs 10% better than newswire-trained models on manually-annotated Wikipedia text.
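
The link-projection step described above can be illustrated with a minimal sketch. It is not the authors' implementation: the `ARTICLE_TYPES` table, the `project_links` helper, the token-span representation of links and the B-/I- tagging scheme are all assumptions made for illustration. Links whose target article is unclassified or classified as a non-entity are simply left untagged here, which is a simplification.

```python
from typing import Dict, List, Tuple

# Hypothetical output of the article-classification step:
# article title -> NE type, with "O" meaning "not a named entity".
ARTICLE_TYPES: Dict[str, str] = {
    "Barack Obama": "PER",
    "Hawaii": "LOC",
    "Politician": "O",
}

# A link is (start_token, end_token_exclusive, target_article_title).
Link = Tuple[int, int, str]


def project_links(
    tokens: List[str],
    links: List[Link],
    article_types: Dict[str, str],
) -> List[Tuple[str, str]]:
    """Project each link target's NE type onto the tokens of its anchor text."""
    tags = ["O"] * len(tokens)
    for start, end, target in links:
        ne_type = article_types.get(target, "O")
        if ne_type == "O":
            # Assumption: unknown or non-entity targets leave tokens untagged.
            continue
        tags[start] = "B-" + ne_type
        for i in range(start + 1, end):
            tags[i] = "I-" + ne_type
    return list(zip(tokens, tags))


tokens = ["Obama", "was", "a", "politician", "born", "in", "Hawaii", "."]
links = [(0, 1, "Barack Obama"), (3, 4, "Politician"), (6, 7, "Hawaii")]
annotated = project_links(tokens, links, ARTICLE_TYPES)
for token, tag in annotated:
    print(f"{token}\t{tag}")
```

In this toy sentence the anchor "Obama" inherits the person type of its target article, "Hawaii" inherits the location type, and the link to the non-entity article "Politician" contributes no annotation.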