The field of information retrieval and text manipulation (classification, clustering) still strives for models that allow semantic information to be folded in, so as to improve performance over standard bag-of-words models. Many approaches aim at concept-based retrieval but differ in the nature of the concepts, which range from linguistic concepts as defined in lexical resources such as WordNet, through latent topics derived from the data itself, as in Latent Semantic Indexing (LSI) or Latent Dirichlet Allocation (LDA), to Wikipedia articles as proxies for concepts, as in the recently proposed Explicit Semantic Analysis (ESA) model. A crucial question that has not been answered so far is whether models based on explicitly given concepts (as in the ESA model, for instance) perform inherently better than retrieval models based on "latent" concepts (as in LSI and/or LDA). In this paper we investigate this question more closely in a cross-language setting, which inherently requires concept-based retrieval to bridge between different languages. In particular, we compare the recently proposed ESA model with two latent models (LSI and LDA), showing that the former is clearly superior to both. From a general perspective, our results contribute to clarifying the role of explicit vs. implicitly derived or latent concepts in (cross-language) information retrieval research.
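
The idea of using explicitly given concepts as a language-independent bridge can be illustrated with a small sketch. It is not the implementation evaluated in the paper: the three "articles" per language are hypothetical toy stand-ins for aligned Wikipedia articles, the helper names (`build_index`, `concept_vector`, `cosine`) are invented for illustration, and the weighting is a simplified tf-idf variant chosen as an assumption. Each text is mapped to a vector with one weight per concept, and texts in different languages are compared in that shared concept space.

```python
import math
from collections import Counter

# Hypothetical toy data standing in for aligned Wikipedia articles.
# Index i in both lists denotes the same concept in each language.
CONCEPTS_EN = [
    "dog animal pet bark",
    "car engine road wheel",
    "music song guitar concert",
]
CONCEPTS_DE = [
    "hund tier haustier bellen",
    "auto motor strasse rad",
    "musik lied gitarre konzert",
]


def tokenize(text):
    return text.lower().split()


def build_index(articles):
    """Return one Counter of term frequencies per concept article."""
    return [Counter(tokenize(article)) for article in articles]


def concept_vector(text, index):
    """Map a text to a vector with one weight per concept.

    Assumed weighting: for each term of the text, its count in the text
    times its frequency in the concept article times a smoothed inverse
    document frequency computed over the concept articles.
    """
    n_concepts = len(index)
    terms = Counter(tokenize(text))
    vector = []
    for article in index:
        score = 0.0
        for term, count in terms.items():
            df = sum(1 for other in index if term in other)
            if df:  # terms unknown to every article contribute nothing
                score += count * article[term] * math.log(1 + n_concepts / df)
        vector.append(score)
    return vector


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


EN_INDEX = build_index(CONCEPTS_EN)
DE_INDEX = build_index(CONCEPTS_DE)

# An English query and two German documents, compared in concept space.
query_vec = concept_vector("guitar concert", EN_INDEX)
doc_music = concept_vector("konzert mit gitarre", DE_INDEX)
doc_car = concept_vector("motor und rad", DE_INDEX)

print(cosine(query_vec, doc_music), cosine(query_vec, doc_car))
```

In this toy setting the English query and the German music document activate the same concept and therefore score higher than the unrelated document, although they share no surface terms. A latent model such as LSI or LDA would instead have to learn its concept dimensions from the data.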