Coupling knowledge-based and data-driven systems for named entity recognition

Authors:
Damien Nouvel;Jean-Yves Antoine;Nathalie Friburger;Arnaud Soulet
Affiliations:
Université François Rabelais Tours, Blois, France;Université François Rabelais Tours, Blois, France;Université François Rabelais Tours, Blois, France;Université François Rabelais Tours, Blois, France
Venue:
HYBRID '12 Proceedings of the Workshop on Innovative Hybrid Approaches to the Processing of Textual Data
Year:
2012

Abstract

Within Information Extraction tasks, Named Entity Recognition has received much attention over recent decades. From symbolic / knowledge-based to data-driven / machine-learning systems, many approaches have been tried. Our work may be viewed as an attempt to bridge the gap from the data-driven perspective back to the knowledge-based one. We use a knowledge-based system, based on manually implemented transducers, that achieves satisfactory performance. It has the indisputable advantage of being modular. However, such a hand-crafted system requires substantial effort to adapt to dedicated tasks. In this context, we implemented a pattern extractor that extracts symbolic knowledge using hierarchical sequential pattern mining over annotated corpora. To assess the accuracy of mined patterns, we designed a module that recognizes Named Entities in texts by determining their most probable boundaries. Instead of treating Named Entity Recognition as a labeling task, it relies on complex context-aware features provided by lower-level systems and treats the tagging task as a Markovian process. Using these systems, coupling the knowledge-based system with extracted patterns is straightforward and leads to a competitive hybrid NE tagger. We report experiments using this system and compare it to other hybridization strategies along with a baseline CRF model.
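The abstract frames tagging as a Markovian process. As a generic illustration of that idea only, the sketch below recovers the most probable label sequence with the Viterbi algorithm under a first-order Markov model. It is not the authors' boundary-detection module, which per the abstract does not treat NER as plain labeling: the B/I/O label set, all probabilities, and the names `viterbi`, `to_spans` and `LABELS` are hypothetical, and the emission scores merely stand in for the context-aware features that lower-level systems would supply.

```python
import math
from typing import Dict, List, Sequence, Tuple

# Hypothetical label set (assumption): B = entity begins, I = inside, O = outside.
LABELS = ("B", "I", "O")


def _log(p: float) -> float:
    # Zero probability means "impossible": map it to -inf in log space.
    return math.log(p) if p > 0 else float("-inf")


def viterbi(emissions: Sequence[Dict[str, float]],
            transitions: Dict[Tuple[str, str], float],
            start: Dict[str, float]) -> List[str]:
    """Most probable label sequence under a first-order Markov model.

    emissions[t][y]: assumed score P(observation at t | y);
    transitions[(y_prev, y)]: P(y | y_prev); start[y]: P(y at t = 0).
    """
    # score[y]: best log-probability of any path ending in label y.
    score = {y: _log(start.get(y, 0.0)) + _log(emissions[0].get(y, 0.0))
             for y in LABELS}
    backpointers: List[Dict[str, str]] = []
    for t in range(1, len(emissions)):
        new_score, back = {}, {}
        for y in LABELS:
            # Best previous label for reaching y at time t.
            prev = max(LABELS,
                       key=lambda yp: score[yp] + _log(transitions.get((yp, y), 0.0)))
            new_score[y] = (score[prev] + _log(transitions.get((prev, y), 0.0))
                            + _log(emissions[t].get(y, 0.0)))
            back[y] = prev
        score = new_score
        backpointers.append(back)
    # Trace back from the best final label.
    path = [max(LABELS, key=lambda y: score[y])]
    for back in reversed(backpointers):
        path.append(back[path[-1]])
    return path[::-1]


def to_spans(tokens: Sequence[str], labels: Sequence[str]) -> List[str]:
    """Read decoded B/I/O labels back as entity spans (entity boundaries)."""
    spans: List[str] = []
    current: List[str] = []
    for tok, lab in zip(tokens, labels):
        if lab == "B":
            if current:
                spans.append(" ".join(current))
            current = [tok]
        elif lab == "I" and current:
            current.append(tok)
        else:
            if current:
                spans.append(" ".join(current))
            current = []
    if current:
        spans.append(" ".join(current))
    return spans


# Made-up example probabilities, for illustration only.
tokens = ["Jean-Yves", "Antoine", "works", "in", "Blois"]
emissions = [
    {"B": 0.8, "I": 0.1, "O": 0.1},
    {"B": 0.2, "I": 0.7, "O": 0.1},
    {"B": 0.05, "I": 0.05, "O": 0.9},
    {"B": 0.05, "I": 0.05, "O": 0.9},
    {"B": 0.6, "I": 0.3, "O": 0.1},
]
transitions = {
    ("B", "B"): 0.1, ("B", "I"): 0.6, ("B", "O"): 0.3,
    ("I", "B"): 0.1, ("I", "I"): 0.4, ("I", "O"): 0.5,
    ("O", "B"): 0.2, ("O", "I"): 0.0, ("O", "O"): 0.8,
}
start = {"B": 0.3, "I": 0.0, "O": 0.7}

labels = viterbi(emissions, transitions, start)
print(labels)                    # → ['B', 'I', 'O', 'O', 'B']
print(to_spans(tokens, labels))  # → ['Jean-Yves Antoine', 'Blois']
```

Zero-probability transitions (here O to I, and starting in I) act as hard constraints, so decoding can never continue an entity that was not opened; the decoded labels are then read back as entity spans.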