Cross-language information filtering: word sense disambiguation vs. distributional models

Authors:
Cataldo Musto;Fedelucio Narducci;Pierpaolo Basile;Pasquale Lops;Marco de Gemmis;Giovanni Semeraro
Affiliations:
Department of Computer Science, University of Bari "Aldo Moro", Italy;Department of Computer Science, University of Bari "Aldo Moro", Italy;Department of Computer Science, University of Bari "Aldo Moro", Italy;Department of Computer Science, University of Bari "Aldo Moro", Italy;Department of Computer Science, University of Bari "Aldo Moro", Italy;Department of Computer Science, University of Bari "Aldo Moro", Italy
Venue:
AI*IA'11 Proceedings of the 12th international conference on Artificial intelligence around man and beyond
Year:
2011

Abstract

The exponential growth of the Web is a major factor behind the increasing importance of text retrieval and filtering systems. However, since information exists in many languages, users may also consider as relevant documents written in a language different from the one in which the query is formulated. In this context, an emerging requirement is to sift through the increasing flood of multilingual text, which poses a renewed challenge for the design of effective multilingual Information Filtering systems. How can user information needs or user preferences be represented in a language-independent way? In this paper, we compare two content-based techniques able to provide users with cross-language recommendations: the first relies on a knowledge-based word sense disambiguation technique that uses MultiWordNet as sense inventory, while the second is based on a dimensionality reduction technique called Random Indexing and exploits the so-called distributional hypothesis in order to build language-independent user profiles. Since the experiments conducted in a movie recommendation scenario show the effectiveness of both approaches, we also try to underline the strengths and weaknesses of each approach in order to identify the scenarios in which a specific technique fits better.
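
As a rough illustration of the second technique named in the abstract, the following is a minimal sketch of Random Indexing, not the authors' implementation. Each context receives a sparse random vector with a few +1/-1 entries, and a term's reduced vector is the sum of the vectors of the contexts in which it occurs. The function names, the toy data, the parameters (`dim=300`, `nonzero=10`), and the idea of sharing one context vector between the English and Italian descriptions of the same item are all assumptions made for this example.

```python
import math
import random
from collections import defaultdict


def index_vector(dim, nonzero, rng):
    """Sparse ternary random vector: `nonzero` entries are +1 or -1, the rest 0."""
    vec = [0.0] * dim
    for pos in rng.sample(range(dim), nonzero):
        vec[pos] = rng.choice((-1.0, 1.0))
    return vec


def build_term_vectors(docs, dim=300, nonzero=10, seed=0):
    """Map each term to the sum of the random vectors of the contexts it occurs in."""
    rng = random.Random(seed)
    term_vectors = defaultdict(lambda: [0.0] * dim)
    for doc_id in sorted(docs):
        ctx = index_vector(dim, nonzero, rng)  # one random vector per context
        for token in docs[doc_id]:
            tv = term_vectors[token]
            for i, x in enumerate(ctx):
                tv[i] += x
    return dict(term_vectors)


def profile(tokens, term_vectors, dim=300):
    """Represent a bag of tokens (user profile or item) as the sum of its term vectors."""
    vec = [0.0] * dim
    for token in tokens:
        for i, x in enumerate(term_vectors.get(token, ())):  # unknown tokens are skipped
            vec[i] += x
    return vec


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


# Hypothetical toy data: each context holds the English and Italian
# tokens describing the same movie, so translations share contexts.
docs = {
    "m1": ["space", "alien", "spazio", "alieno"],
    "m2": ["love", "wedding", "amore", "matrimonio"],
}
tv = build_term_vectors(docs)

english_profile = profile(["space", "alien"], tv)      # built from English text
italian_scifi = profile(["spazio", "alieno"], tv)      # Italian candidate item
italian_romance = profile(["amore", "matrimonio"], tv)  # Italian candidate item

scifi_score = cosine(english_profile, italian_scifi)
romance_score = cosine(english_profile, italian_romance)
```

In this toy setting, terms that occur in the same contexts end up with the same reduced vector regardless of language, so a profile built from English tokens can be compared directly with items described in Italian.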