Ontology-Based word sense disambiguation for scientific literature

Authors:
Roman Prokofyev;Gianluca Demartini;Alexey Boyarsky;Oleg Ruchayskiy;Philippe Cudré-Mauroux
Affiliations:
eXascale Infolab, University of Fribourg, Switzerland;eXascale Infolab, University of Fribourg, Switzerland;Ecole Polytechnique Fédérale de Lausanne, Switzerland,Instituut-Lorentz for Theoretical Physics, U. Leiden, The Netherlands,Bogolyubov Institute for Theoretical Physics, Kiev, Ukraine;CERN TH-Division, PH-TH, Geneva, Switzerland;eXascale Infolab, University of Fribourg, Switzerland
Venue:
ECIR'13 Proceedings of the 35th European conference on Advances in Information Retrieval
Year:
2013

Abstract

Scientific documents often adopt a well-defined vocabulary and avoid ambiguous terms. However, as soon as documents from different research sub-communities are considered together, many scientific terms become ambiguous, because the same term can refer to different concepts in different sub-communities. Correctly identifying the sense of a given term can considerably improve the effectiveness of retrieval models and can also support additional features such as search diversification. This is even more critical for exploratory search systems in the scientific domain. In this paper, we propose novel semi-supervised methods for term disambiguation that leverage the structure of a community-based ontology of scientific concepts. Our approach exploits the graph structure connecting terms and their definitions to automatically identify the sense originally intended by the authors of a scientific publication. Experimental evidence over two test collections from the physics and biomedical domains shows that the proposed method is effective and outperforms state-of-the-art approaches based on feature vectors constructed from term co-occurrences, as well as standard supervised approaches.
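
The abstract gives no algorithmic detail, so the following is only an illustrative sketch of the general idea of graph-based sense selection, not the authors' method. It assumes a hypothetical toy ontology stored as an adjacency dictionary, and scores each candidate sense by its proximity (breadth-first shortest-path distance) to concepts that appear unambiguously in the same document. The names `ONTOLOGY`, `distances_from`, and `disambiguate`, and the inverse-distance score, are all assumptions made for illustration.

```python
from collections import deque

# Hypothetical toy ontology: concept -> neighbouring concepts (undirected links).
ONTOLOGY = {
    "plasma (physics)": ["ionized gas", "magnetic confinement"],
    "plasma (biology)": ["blood", "protein"],
    "ionized gas": ["plasma (physics)", "electron"],
    "magnetic confinement": ["plasma (physics)", "tokamak"],
    "tokamak": ["magnetic confinement"],
    "electron": ["ionized gas"],
    "blood": ["plasma (biology)", "red blood cell"],
    "protein": ["plasma (biology)"],
    "red blood cell": ["blood"],
}


def distances_from(graph, source):
    """Breadth-first search: hop distance from `source` to every reachable concept."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbour in graph.get(node, []):
            if neighbour not in dist:
                dist[neighbour] = dist[node] + 1
                queue.append(neighbour)
    return dist


def disambiguate(graph, candidates, context):
    """Return the candidate sense closest, in the graph, to the context concepts.

    Assumed scoring rule: sum of 1 / (1 + distance) over reachable context
    concepts; unreachable concepts contribute nothing.
    """
    def score(sense):
        dist = distances_from(graph, sense)
        return sum(1.0 / (1 + dist[c]) for c in context if c in dist)

    return max(candidates, key=score)


senses = ["plasma (physics)", "plasma (biology)"]
physics_choice = disambiguate(ONTOLOGY, senses, ["tokamak", "electron"])
biology_choice = disambiguate(ONTOLOGY, senses, ["blood", "protein"])
```

In this toy setting, a document mentioning "tokamak" and "electron" pulls the ambiguous term toward the physics sense, while one mentioning "blood" and "protein" pulls it toward the biology sense. A realistic system would need a much richer ontology and a more careful scoring function than this inverse-distance heuristic.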