This paper explores a very basic linguistic phenomenon in multilingualism: the lexicalizations of entities are very often identical across languages, while concepts are usually lexicalized differently. Since entities are commonly referred to by proper names in natural language, we measured their distribution in the lexical overlap of the terminologies extracted from comparable corpora. Results show that the lexical overlap is mostly composed of unambiguous words, which can be regarded as anchors to bridge languages: most terms with the same spelling refer to exactly the same entities. Thanks to this important feature of Named Entities, we developed a multilingual supersense tagging system capable of distinguishing between concepts and individuals. The individuals used for training were extracted both from YAGO and by a heuristic procedure. The overall F1 of the English tagger is over 76%, which is in line with the state of the art in supersense tagging while increasing the number of classes. Performance for Italian is slightly lower, but still reaches a reasonable accuracy level, sufficient to yield effective results for knowledge acquisition.
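The lexical-overlap measurement described in the abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the function names `lexical_overlap` and `proper_name_share` are hypothetical, the toy term sets are invented rather than taken from the paper's corpora, and the capitalization test is only a crude stand-in for whatever proper-name detection the authors actually used.

```python
def lexical_overlap(terms_a, terms_b):
    """Return the terms spelled identically in both terminologies."""
    return set(terms_a) & set(terms_b)


def looks_like_proper_name(term):
    """Crude heuristic (an assumption, not the paper's method):
    every whitespace-separated token starts with an uppercase letter."""
    tokens = term.split()
    return bool(tokens) and all(tok[0].isupper() for tok in tokens)


def proper_name_share(overlap):
    """Fraction of the overlapping terms that look like proper names."""
    if not overlap:
        return 0.0
    proper = [t for t in overlap if looks_like_proper_name(t)]
    return len(proper) / len(overlap)


# Toy terminologies, invented for illustration only.
english_terms = {"Giuseppe Verdi", "Fiat", "piano", "dog", "house"}
italian_terms = {"Giuseppe Verdi", "Fiat", "piano", "cane", "casa"}

overlap = lexical_overlap(english_terms, italian_terms)
share = proper_name_share(overlap)

print(sorted(overlap))
print(f"share of proper names in the overlap: {share:.2f}")
```

In this toy case the shared spellings are mostly entity names, while "piano" shows the kind of common-noun overlap that can differ in meaning between languages and therefore cannot serve as a reliable anchor.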