Ontology-driven web-based semantic similarity

Authors:
David Sánchez; Montserrat Batet; Aida Valls; Karina Gibert
Affiliations:
Department of Computer Science and Mathematics, Universitat Rovira i Virgili (URV), Tarragona, Spain 43007 (David Sánchez, Montserrat Batet, Aida Valls); Department of Statistics and Operations Research, Universitat Politècnica de Catalunya, Barcelona, Spain 08034 (Karina Gibert)
Venue:
Journal of Intelligent Information Systems
Year:
2010

Abstract

Estimating the degree of semantic similarity (or distance) between concepts is a very common problem in research areas such as natural language processing, knowledge acquisition, information retrieval and data mining. Many similarity measures have been proposed in the past, exploiting either explicit knowledge (such as the structure of a taxonomy) or implicit knowledge (such as information distribution). In the former case, taxonomies and/or ontologies are used to introduce additional semantics; in the latter case, the frequencies of term appearances in a corpus are considered. Classical measures based on those premises suffer from some problems: in the first case, an excessive dependency on the taxonomical/ontological structure; in the second case, the lack of semantics of a purely statistical analysis of occurrences and/or the ambiguity of estimating the statistical distribution of concepts from term appearances. Measures based on the Information Content (IC) of taxonomical concepts combine both approaches. However, they depend heavily on a corpus that has been properly pre-tagged and disambiguated according to the ontological entities in order to compute accurate concept appearance probabilities. This limits the applicability of those measures to other ontologies (such as specific domain ontologies) and to massive corpora (such as the Web). In this paper, several of these issues are analyzed. Modifications of classical similarity measures are also proposed. They are based on a contextualized and scalable version of IC computation from the Web that exploits taxonomical knowledge. The goal is to remove the measures' dependency on corpus pre-processing while still achieving reliable results and minimizing language ambiguity. Our proposals are able to outperform classical approaches when the Web is used to estimate concept probabilities.
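Classical IC-based measures, such as those of Resnik and Lin, define IC(c) = -log p(c) and compare two concepts through their least common subsumer (LCS) in the taxonomy: Resnik's similarity is IC(LCS), and Lin's is 2·IC(LCS) / (IC(a) + IC(b)). The following is a minimal sketch of how p(c) might be approximated from Web hit counts instead of a tagged corpus. The taxonomy, the hit counts and the corpus size are hypothetical toy values, and the estimate shown is the plain (non-contextualized) one, not the authors' contextualized IC computation.

```python
import math

# Hypothetical toy taxonomy (child -> parent is-a links); not taken from the paper.
PARENT = {
    "dog": "mammal",
    "cat": "mammal",
    "mammal": "animal",
    "bird": "animal",
}

# Hypothetical Web hit counts and corpus size: illustrative numbers only,
# not real search-engine data.
HITS = {"animal": 800_000, "mammal": 200_000, "bird": 300_000,
        "dog": 120_000, "cat": 100_000}
TOTAL_PAGES = 1_000_000


def ancestors(concept):
    """Return the concept followed by all its taxonomic subsumers, bottom-up."""
    chain = [concept]
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    return chain


def lcs(a, b):
    """Least common subsumer: the most specific ancestor shared by a and b."""
    b_ancestors = set(ancestors(b))
    for c in ancestors(a):
        if c in b_ancestors:
            return c
    raise ValueError(f"{a!r} and {b!r} share no subsumer")


def ic(concept, hits=HITS, total=TOTAL_PAGES):
    """IC(c) = -log p(c), with p(c) estimated as hits(c) / total pages."""
    return -math.log(hits[concept] / total)


def resnik(a, b, hits=HITS, total=TOTAL_PAGES):
    """Resnik similarity: the IC of the least common subsumer."""
    return ic(lcs(a, b), hits, total)


def lin(a, b, hits=HITS, total=TOTAL_PAGES):
    """Lin similarity: shared IC normalized by the IC of both concepts."""
    return 2 * ic(lcs(a, b), hits, total) / (ic(a, hits, total) + ic(b, hits, total))


if __name__ == "__main__":
    print(lcs("dog", "cat"))              # mammal
    print(round(lin("dog", "cat"), 3))    # 0.728
    print(round(resnik("dog", "bird"), 3))
```

Because every function accepts the `hits` table as a parameter, a different probability estimate (for example, counts of pages where a concept co-occurs with one of its subsumers, which is one possible reading of "contextualized" and is only an assumption here) can be tried without touching the similarity formulas. Note that raw hit counts do not guarantee p(subsumer) ≥ p(concept), which IC-based measures implicitly expect.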