Bridging languages by SuperSense entity tagging

Authors:
Davide Picca;Alfio Massimiliano Gliozzo;Simone Campora
Affiliations:
University of Lausanne, Lausanne, Switzerland;Semantic Technology Lab (STLab - ISTC - CNR), Rome, Italy;Ecole Polytechnique Federale de Lausanne (EPFL)
Venue:
NEWS '09 Proceedings of the 2009 Named Entities Workshop: Shared Task on Transliteration
Year:
2009

Abstract

This paper explores a basic linguistic phenomenon in multilingual settings: entities are very often lexicalized identically across languages, whereas concepts are usually lexicalized differently. Since entities are commonly referred to by proper names in natural language, we measured their distribution in the lexical overlap of the terminologies extracted from comparable corpora. Results show that the lexical overlap consists mostly of unambiguous words, which can be regarded as anchors for bridging languages: most terms with the same spelling refer to exactly the same entities. Building on this property of Named Entities, we developed a multilingual supersense tagging system capable of distinguishing between concepts and individuals. The individuals used for training were extracted both from YAGO and by a heuristic procedure. The overall F1 of the English tagger is over 76%, which is in line with the state of the art in supersense tagging while increasing the number of classes. Performance for Italian is slightly lower but still reaches a reasonable level of accuracy, sufficient to yield effective results for knowledge acquisition.
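The lexical-overlap measurement mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' procedure. The function names, the capitalization heuristic for spotting proper names, and the toy term lists are all assumptions made purely for illustration.

```python
def lexical_overlap(terms_a, terms_b):
    """Return the terms spelled identically in both terminologies."""
    return set(terms_a) & set(terms_b)


def proper_name_share(overlap):
    """Fraction of overlapping terms that start with an uppercase letter.

    Capitalization is only a rough stand-in (an assumption) for
    'this term is a proper name referring to an entity'.
    """
    if not overlap:
        return 0.0
    proper = [term for term in overlap if term[:1].isupper()]
    return len(proper) / len(overlap)


# Toy terminologies, invented for this example (not data from the paper).
english_terms = {"Einstein", "Lausanne", "house", "radio", "dog"}
italian_terms = {"Einstein", "Lausanne", "casa", "radio", "cane"}

overlap = lexical_overlap(english_terms, italian_terms)
share = proper_name_share(overlap)
print(sorted(overlap), round(share, 2))
```

In this toy case the overlap contains two entity names and one shared common noun, which mirrors the abstract's observation that identically spelled terms are mostly, though not exclusively, entity names.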