Cross lingual semantic search by improving semantic similarity and relatedness measures

Authors:
Nitish Aggarwal
Affiliations:
Unit for Natural Language Processing, Digital Enterprise Research Institute, National University of Ireland, Galway, Ireland
Venue:
ISWC'12 Proceedings of the 11th international conference on The Semantic Web - Volume Part II
Year:
2012

Abstract

Since 2001, the semantic web community has been working towards standards that increase the accessibility of information available on the web. Yahoo Research recently reported that 30% of all HTML pages contain structured data such as microdata, RDFa, or microformats. Although the multilinguality of the web is a hurdle to information access, the rapid growth of the semantic web makes it possible to retrieve fine-grained information across the language barrier. In this thesis, we first develop a methodology for cross-lingual semantic search over structured data (a knowledge base) by transforming natural language queries into SPARQL. Second, we improve semantic similarity and relatedness measures in order to bridge the semantic gap between the vocabulary of the knowledge base and the terms appearing in the query. In a preliminary evaluation on the QALD-2 test dataset, the approach achieved an F1 score of 0.46, an average precision of 0.44, and an average recall of 0.48.
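As a quick check on how the three reported figures relate, the sketch below computes F1 as the harmonic mean of precision and recall. It assumes the reported F1 is derived from the averaged precision and recall; the abstract does not say whether F1 was instead averaged per question, so this is an illustration and not the evaluation procedure used in the thesis. The function name `f1_score` is a hypothetical helper.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Figures reported in the abstract for the QALD-2 test dataset.
avg_precision = 0.44
avg_recall = 0.48

# Assumption: F1 is computed from the averaged precision and recall.
print(f"{f1_score(avg_precision, avg_recall):.2f}")  # prints 0.46
```

Under this assumption the harmonic mean of 0.44 and 0.48 rounds to 0.46, which matches the reported F1 score.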