Improving a state-of-the-art named entity recognition system using the world wide web

Authors:
Richárd Farkas; György Szarvas; Róbert Ormándi
Affiliations:
University of Szeged, Department of Informatics, Szeged, Hungary;
University of Szeged, Department of Informatics, Szeged, Hungary and Research Group on Artificial Intelligence of the Hungarian Academy of Sciences and University of Szeged, Szeged, Hungary;
Research Group on Artificial Intelligence of the Hungarian Academy of Sciences and University of Szeged, Szeged, Hungary
Venue:
ICDM'07 Proceedings of the 7th industrial conference on Advances in data mining: theoretical aspects and applications
Year:
2007

Abstract

Highly accurate Named Entity Recognition (NER) systems can benefit a wide range of Human Language Technology applications. In this paper we introduce three heuristics that exploit a variety of knowledge sources (the World Wide Web, Wikipedia and WordNet) and can further improve a state-of-the-art multilingual and domain-independent NER system. We also describe our investigation of entity recognition in simulated speech-to-text output. Our web-based heuristics attained a slight improvement over the best results published on a standard NER task and proved particularly effective in the speech-to-text scenario.