Since its debut, Explicit Semantic Analysis (ESA) has received much attention in the IR community. ESA has been shown to perform surprisingly well in several tasks and in different contexts. However, recent work has observed behavior that is unexpected given the conceptual motivation for ESA. In this paper we look at the foundations of ESA from a theoretical point of view and employ a general probabilistic model for term weights, which reveals how ESA actually works. Based on this model, we explain some of the phenomena observed in previous work and support our findings with new experiments. Moreover, we provide a theoretical grounding for how the size and composition of the index collection affect the ESA-based computation of similarity values for texts.