A Wikipedia-based multilingual retrieval model

Authors:
Martin Potthast;Benno Stein;Maik Anderka
Affiliations:
Bauhaus University Weimar, Faculty of Media, Weimar, Germany;Bauhaus University Weimar, Faculty of Media, Weimar, Germany;Bauhaus University Weimar, Faculty of Media, Weimar, Germany
Venue:
ECIR'08 Proceedings of the IR research, 30th European conference on Advances in information retrieval
Year:
2008


Abstract

This paper introduces CL-ESA, a new multilingual retrieval model for the analysis of cross-language similarity. The retrieval model exploits the multilingual alignment of Wikipedia: given a document $d$ written in language $L$, we construct a concept vector $\mathbf{d}$ for $d$, where each dimension $i$ of $\mathbf{d}$ quantifies the similarity of $d$ to a document $d_i^*$ chosen from the "$L$-subset" of Wikipedia. Likewise, for a second document $d'$ written in language $L'$, $L \neq L'$, we construct a concept vector $\mathbf{d}'$, using the topic-aligned counterparts ${d'}_i^*$ of the previously chosen documents from the $L'$-subset of Wikipedia. Since the two concept vectors $\mathbf{d}$ and $\mathbf{d}'$ are collection-relative representations of $d$ and $d'$, they are language-independent; that is, their similarity can be computed directly, for instance with the cosine similarity measure. We present the results of an extensive analysis that demonstrates the power of this new retrieval model: for a query document $d$, the topically most similar documents from a corpus in another language are properly ranked. A salient property of the new retrieval model is its robustness with respect to both the size and the quality of the index document collection.
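
The construction described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The raw term-frequency weighting, the whitespace tokenizer, the function names, and the three-article toy index (`INDEX_EN`, `INDEX_DE`) are all assumptions made for illustration; the toy index stands in for the topic-aligned Wikipedia subsets. Only the overall scheme follows the abstract: each document is compared with the index documents of its own language, and the resulting concept vectors are then compared with cosine similarity.

```python
import math
from collections import Counter

def tf_vector(text):
    """Bag-of-words term frequencies (an assumed, deliberately simple weighting)."""
    return Counter(text.lower().split())

def sparse_cosine(u, v):
    """Cosine similarity of two sparse term vectors stored as dict-like counters."""
    dot = sum(w * v.get(t, 0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def concept_vector(text, index_docs):
    """Dimension i holds the similarity of `text` to the i-th index document."""
    doc = tf_vector(text)
    return [sparse_cosine(doc, tf_vector(d)) for d in index_docs]

def dense_cosine(a, b):
    """Cosine similarity of two equally long concept vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def cl_esa_similarity(doc_a, index_a, doc_b, index_b):
    """Compare two documents in different languages via their concept vectors.

    index_a[i] and index_b[i] must describe the same topic (topic alignment).
    """
    return dense_cosine(concept_vector(doc_a, index_a),
                        concept_vector(doc_b, index_b))

# Hypothetical topic-aligned index: entry i covers the same topic in both languages.
INDEX_EN = ["dog cat animal pet", "car engine road wheel", "music song guitar band"]
INDEX_DE = ["hund katze tier haustier", "auto motor strasse rad", "musik lied gitarre band"]

query_en = "my dog is a friendly pet"
corpus_de = {
    "pets": "der hund ist ein tier",
    "cars": "das auto hat einen motor",
}

# Rank the German documents by their cross-language similarity to the English query.
ranking = sorted(
    corpus_de,
    key=lambda name: cl_esa_similarity(query_en, INDEX_EN, corpus_de[name], INDEX_DE),
    reverse=True,
)
print(ranking)
```

The query and the German documents share no vocabulary, so a direct term-based comparison would score every pair as zero. The ranking comes only from the aligned index, which is why the concept vectors can be treated as language-independent.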