Computing text semantic relatedness using the contents and links of a hypertext encyclopedia

Authors:
Majid Yazdani; Andrei Popescu-Belis
Affiliations:
Idiap Research Institute, 1920 Martigny, Switzerland and EPFL, École Polytechnique Fédérale de Lausanne, 1015 Lausanne, Switzerland; Idiap Research Institute, 1920 Martigny, Switzerland
Venue:
Artificial Intelligence
Year:
2013

Abstract

We propose a method for computing semantic relatedness between words or texts by using knowledge from hypertext encyclopedias such as Wikipedia. A network of concepts is built by filtering the encyclopedia's articles, each concept corresponding to an article. Two types of weighted links between concepts are considered: one based on hyperlinks between the texts of the articles, and another based on the lexical similarity between them. We propose and implement an efficient random walk algorithm that computes the distance between nodes, and then between sets of nodes, using the visiting probability from one (set of) node(s) to another. To make the algorithm tractable, we propose two truncation methods and validate them empirically, and then use an embedding space to learn an approximation of the visiting probability. To evaluate the proposed distance, we apply our method to four important tasks in natural language processing: word similarity, document similarity, document clustering and classification, and ranking in information retrieval. The performance of the method is state-of-the-art or close to it for each task, thus demonstrating the generality of the knowledge resource. Finally, combining hyperlinks and lexical similarity links improves the scores over a method using only one of them, because hyperlinks bring additional real-world knowledge not captured by lexical similarity.
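The idea of a visiting probability between sets of concepts, over a graph that mixes two link types, can be illustrated with a minimal sketch. This is not the authors' algorithm. The mixing weight `lam`, the continuation probability `alpha`, the step limit `max_steps`, the pruning threshold `eps`, and the symmetric averaging in `relatedness` are all illustrative assumptions. The two truncations shown here (a step limit and dropping low-probability states) only convey the general idea of bounding the walk. The lexical-similarity graph is assumed to be precomputed and is taken as input.

```python
from collections import defaultdict

# Sparse weighted graph: concept -> {neighbouring concept: non-negative weight}
Graph = dict[str, dict[str, float]]


def normalize_rows(graph: Graph) -> Graph:
    """Turn each node's outgoing weights into transition probabilities."""
    out: Graph = {}
    for node, nbrs in graph.items():
        total = sum(nbrs.values())
        out[node] = {v: w / total for v, w in nbrs.items()} if total > 0 else {}
    return out


def combine_links(hyperlinks: Graph, lexical: Graph, lam: float = 0.5) -> Graph:
    """Mix hyperlinks and lexical-similarity links into one transition graph.

    ASSUMPTION (illustrative): from each node the walker follows a hyperlink
    with weight `lam` and a lexical link with weight 1 - lam; if a node lacks
    one link type, all of its probability goes to the other type.
    """
    p_hyp, p_lex = normalize_rows(hyperlinks), normalize_rows(lexical)
    mixed: Graph = {}
    for u in set(p_hyp) | set(p_lex):
        hyp_row, lex_row = p_hyp.get(u, {}), p_lex.get(u, {})
        w_hyp = lam if hyp_row else 0.0
        w_lex = (1.0 - lam) if lex_row else 0.0
        row = defaultdict(float)
        if w_hyp + w_lex > 0:
            for v, p in hyp_row.items():
                row[v] += p * w_hyp / (w_hyp + w_lex)
            for v, p in lex_row.items():
                row[v] += p * w_lex / (w_hyp + w_lex)
        mixed[u] = dict(row)
    return mixed


def visiting_probability(P: Graph, source: dict[str, float], targets: set[str],
                         alpha: float = 0.8, max_steps: int = 10,
                         eps: float = 0.0) -> float:
    """Truncated probability that a walk started from `source` reaches `targets`.

    Target nodes are absorbing, so mass is counted only the first time it
    enters them. At each step the walker continues with probability `alpha`
    (otherwise it stops). Truncation 1: the walk is cut after `max_steps`.
    Truncation 2: states whose mass falls below `eps` are dropped.
    """
    total = sum(source.values())
    if total <= 0:
        raise ValueError("source weights must sum to a positive value")
    mass = {u: w / total for u, w in source.items()}
    vp = sum(p for u, p in mass.items() if u in targets)  # overlap counts at step 0
    mass = {u: p for u, p in mass.items() if u not in targets}
    for _ in range(max_steps):
        nxt = defaultdict(float)
        for u, p in mass.items():
            for v, t in P.get(u, {}).items():
                nxt[v] += alpha * p * t
        vp += sum(p for v, p in nxt.items() if v in targets)
        mass = {v: p for v, p in nxt.items() if v not in targets and p >= eps}
        if not mass:
            break
    return vp


def relatedness(P: Graph, a: dict[str, float], b: dict[str, float], **kw) -> float:
    """ASSUMPTION: symmetric score = mean of the visiting probabilities both ways."""
    return 0.5 * (visiting_probability(P, a, set(b), **kw)
                  + visiting_probability(P, b, set(a), **kw))


if __name__ == "__main__":
    # Toy, hypothetical concept graph (not data from the paper).
    hyper = {"Cat": {"Mammal": 1.0}, "Dog": {"Mammal": 1.0},
             "Mammal": {"Cat": 1.0, "Dog": 1.0},
             "Car": {"Engine": 1.0}, "Engine": {"Car": 1.0}}
    lex = {"Cat": {"Dog": 2.0}, "Dog": {"Cat": 2.0},
           "Car": {"Engine": 1.0}, "Engine": {"Car": 1.0}}
    P = combine_links(hyper, lex, lam=0.5)
    print(relatedness(P, {"Cat": 1.0}, {"Dog": 1.0}))  # about 0.667
    print(relatedness(P, {"Cat": 1.0}, {"Car": 1.0}))  # 0.0 (no path)
```

Texts would enter such a sketch as weighted sets of concepts, the `source` and target dictionaries. The pruning threshold trades a small loss of probability mass for fewer states to track. In this toy graph, `eps=0.1` stops the Cat-to-Dog walk after three steps with a visiting probability of 0.624 instead of roughly 0.667.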