Ontology-Supported Text Classification Based on Cross-Lingual Word Sense Disambiguation

Authors:
Dan Tufiş;Svetla Koeva
Affiliations:
Research Institute for Artificial Intelligence, Romanian Academy, 13, "13 Septembrie", 050711, Bucharest, Romania;Institute for Bulgarian Language, Bulgarian Academy of Sciences, 52 Shipchenski prohod, 1113 Sofia, Bulgaria
Venue:
WILF '07 Proceedings of the 7th international workshop on Fuzzy Logic and Applications: Applications of Fuzzy Sets Theory
Year:
2007

Citing 3
Cited 1

EuroWordNet: a multilingual database with lexical semantic networks

Towards a standard upper ontology

Proceedings of the international conference on Formal Ontology in Information Systems - Volume 2001
An approach to clustering abstracts

NLDB'05 Proceedings of the 10th international conference on Natural Language Processing and Information Systems

Cross-lingual word sense disambiguation for languages with scarce resources

Canadian AI'11 Proceedings of the 24th Canadian conference on Advances in artificial intelligence

Abstract

The paper reports on recent experiments in cross-lingual document processing, with a case study on the Bulgarian-English-Romanian language pairs, and provides evidence that linguistic ontologies help achieve high accuracy on difficult NLP tasks such as word alignment, word sense disambiguation, document classification, and cross-language information retrieval. We briefly describe the parallel corpus we used, the multilingual lexical ontology that supports our research, and the word alignment and word sense disambiguation systems we developed, and we give a preliminary report on the ongoing development of a cross-lingual text classification system that takes advantage of these multilingual technologies. Unlike keyword-based methods in document processing, concept-based methods are expected to better exploit the semantic information contained in a document and thus to provide more accurate results.
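
The contrast between keyword-based and concept-based document representations can be illustrated with a minimal sketch. It is not the authors' system: the lexicon, the concept identifiers, and the function names below are hypothetical, and the sketch assumes each word has already been disambiguated to a single sense, which is the step a word sense disambiguation system would perform in practice.

```python
from collections import Counter

# Hypothetical toy lexicon (an assumption, not data from the paper):
# maps (language, lemma) to a language-independent concept identifier,
# in the spirit of a multilingual lexical ontology.
TOY_LEXICON = {
    ("en", "bank"): "C_FINANCIAL_INSTITUTION",
    ("ro", "bancă"): "C_FINANCIAL_INSTITUTION",
    ("bg", "банка"): "C_FINANCIAL_INSTITUTION",
    ("en", "loan"): "C_LOAN",
    ("ro", "împrumut"): "C_LOAN",
    ("bg", "заем"): "C_LOAN",
}


def keyword_vector(tokens):
    """Bag of surface words: language-specific."""
    return Counter(tokens)


def concept_vector(lang, tokens):
    """Bag of concept identifiers: shared across languages.

    Tokens missing from the toy lexicon are skipped.
    """
    return Counter(
        TOY_LEXICON[(lang, tok)] for tok in tokens if (lang, tok) in TOY_LEXICON
    )


def overlap(a, b):
    """Number of items the two bags share, counted with multiplicity."""
    return sum((a & b).values())


en_doc = ["bank", "loan"]
ro_doc = ["bancă", "împrumut"]

keyword_overlap = overlap(keyword_vector(en_doc), keyword_vector(ro_doc))
concept_overlap = overlap(
    concept_vector("en", en_doc), concept_vector("ro", ro_doc)
)

print(keyword_overlap, concept_overlap)
```

In this toy setting the English and Romanian documents share no surface words but map to the same concepts, which is the property a cross-lingual classifier built on concept features would rely on.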