Self organizing maps in NLP: exploration of coreference feature space

Authors:
Andre Burkovski;Wiltrud Kessler;Gunther Heidemann;Hamidreza Kobdani;Hinrich Schütze
Affiliations:
University of Stuttgart, Stuttgart, Germany;University of Stuttgart, Stuttgart, Germany;University of Stuttgart, Stuttgart, Germany;Institute for Natural Language Processing, University of Stuttgart, Stuttgart, Germany;Institute for Natural Language Processing, University of Stuttgart, Stuttgart, Germany
Venue:
WSOM'11 Proceedings of the 8th international conference on Advances in self-organizing maps
Year:
2011

Abstract

In natural language processing, large annotated data sets are needed to train language models with supervised machine learning methods. Obtaining such labeled data sets requires time-consuming manual annotation. To facilitate this process, we propose an approach based on self-organizing maps (SOMs): the SOM sorts the data through unsupervised training, mapping the space of linguistic features onto a 2D grid. The grid visualization is then used for efficient interactive labeling of the data clusters. In addition, the interactive SOM visualization allows computational linguists to explore the topology of the feature space and design new features.
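
The idea in the abstract can be illustrated with a minimal sketch, which is not the authors' implementation. It assumes a plain online SOM with a Gaussian neighborhood and linearly decaying learning rate, written with NumPy. The function names (`train_som`, `best_matching_unit`, `label_by_cell`), the hyperparameters, and the toy 3-dimensional vectors standing in for linguistic feature vectors are all hypothetical. Interactive labeling is reduced here to a dictionary that assigns a label to a grid cell, which is then propagated to every data point mapped to that cell.

```python
import numpy as np


def best_matching_unit(weights, x):
    """Return the (row, col) of the grid unit whose weight vector is closest to x."""
    dist = np.linalg.norm(weights - x, axis=-1)
    return tuple(int(i) for i in np.unravel_index(np.argmin(dist), dist.shape))


def train_som(data, rows=4, cols=4, epochs=30, lr0=0.5, sigma0=2.0, seed=0):
    """Train a small SOM; returns weights of shape (rows, cols, n_features).

    All hyperparameters are illustrative assumptions, not values from the paper.
    """
    rng = np.random.default_rng(seed)
    weights = rng.random((rows, cols, data.shape[1]))
    # Grid coordinates of every unit, used for neighborhood distances.
    grid = np.stack(
        np.meshgrid(np.arange(rows), np.arange(cols), indexing="ij"), axis=-1
    )
    n_steps = epochs * len(data)
    step = 0
    for _ in range(epochs):
        for x in data[rng.permutation(len(data))]:
            frac = step / n_steps
            lr = lr0 * (1.0 - frac)                # learning rate decays towards 0
            sigma = sigma0 * (1.0 - frac) + 0.5    # neighborhood radius shrinks
            bmu = best_matching_unit(weights, x)
            d2 = np.sum((grid - np.array(bmu)) ** 2, axis=-1)
            h = np.exp(-d2 / (2.0 * sigma ** 2))   # Gaussian neighborhood on the grid
            weights += lr * h[..., None] * (x - weights)
            step += 1
    return weights


def label_by_cell(weights, data, cell_labels):
    """Give each data point the label of the grid cell it maps to (None if unlabeled)."""
    return [cell_labels.get(best_matching_unit(weights, x)) for x in data]


# Toy data: two well-separated clusters standing in for feature vectors.
rng = np.random.default_rng(1)
cluster_a = 0.1 + 0.02 * rng.standard_normal((20, 3))
cluster_b = 0.9 + 0.02 * rng.standard_normal((20, 3))
data = np.vstack([cluster_a, cluster_b])

weights = train_som(data)
cells_a = {best_matching_unit(weights, x) for x in cluster_a}
cells_b = {best_matching_unit(weights, x) for x in cluster_b}

# An annotator inspects the grid and labels the cells occupied by cluster A once;
# the label then applies to every point mapped to those cells.
cell_labels = {cell: "A" for cell in cells_a}
labels = label_by_cell(weights, data, cell_labels)
```

In this sketch a single labeling action per grid cell covers all points in that cell, which is the source of the annotation savings the abstract describes. A real system would render the grid and let the annotator select regions interactively.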