Algorithmic detection of semantic similarity

Authors:
Ana G. Maguitman;Filippo Menczer;Heather Roinestad;Alessandro Vespignani
Affiliations:
Indiana University, Bloomington, IN;Indiana University, Bloomington, IN;Indiana University, Bloomington, IN;Indiana University, Bloomington, IN
Venue:
WWW '05 Proceedings of the 14th international conference on World Wide Web
Year:
2005


Abstract

Automatic extraction of semantic information from text and links in Web pages is key to improving the quality of search results. However, the assessment of automatic semantic measures is limited by the coverage of user studies, which do not scale with the size, heterogeneity, and growth of the Web. Here we propose to leverage human-generated metadata, namely topical directories, to measure semantic relationships among massive numbers of pairs of Web pages or topics. The Open Directory Project classifies millions of URLs in a topical ontology, providing a rich source from which semantic relationships between Web pages can be derived. While semantic similarity measures based on taxonomies (trees) are well studied, the design of well-founded similarity measures for objects stored in the nodes of arbitrary ontologies (graphs) is an open problem. This paper defines an information-theoretic measure of semantic similarity that exploits both the hierarchical and non-hierarchical structure of an ontology. An experimental study shows that this measure improves significantly on the traditional taxonomy-based approach. This novel measure allows us to address the general question of how text and link analyses can be combined to derive measures of relevance that are in good agreement with semantic similarity. Surprisingly, the traditional use of text similarity turns out to be ineffective for relevance ranking.
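
The abstract gives no formula, so the following is only an illustrative sketch of what a taxonomy-based (tree) information-theoretic similarity can look like. It assumes a Lin-style definition, sim(t1, t2) = 2 log P(t0) / (log P(t1) + log P(t2)), where t0 is the lowest common ancestor of the two topics and P(t) is the fraction of pages filed under t or its descendants. The toy taxonomy, the page counts, and all function names are hypothetical, and the sketch does not attempt the graph-based measure that the paper itself defines.

```python
import math

# Hypothetical toy taxonomy: child -> parent (the root has parent None).
PARENT = {
    "Top": None,
    "Science": "Top",
    "Arts": "Top",
    "Physics": "Science",
    "Biology": "Science",
    "Music": "Arts",
}

# Hypothetical number of pages filed directly under each topic.
DIRECT_COUNT = {
    "Top": 0, "Science": 2, "Arts": 1,
    "Physics": 4, "Biology": 3, "Music": 6,
}


def ancestors(topic):
    """Return the path from a topic up to the root, topic included."""
    path = []
    while topic is not None:
        path.append(topic)
        topic = PARENT[topic]
    return path


def subtree_probability(topic):
    """Fraction of all pages filed under the topic or any descendant."""
    total = sum(DIRECT_COUNT.values())
    in_subtree = sum(
        count for t, count in DIRECT_COUNT.items() if topic in ancestors(t)
    )
    return in_subtree / total


def lowest_common_ancestor(t1, t2):
    """First ancestor of t1 (walking upward) that is also an ancestor of t2."""
    up2 = set(ancestors(t2))
    return next(a for a in ancestors(t1) if a in up2)


def tree_similarity(t1, t2):
    """Lin-style similarity on a tree (assumed form, see the text above).

    Assumes every topic has a non-empty subtree, so no probability is zero.
    """
    p0 = subtree_probability(lowest_common_ancestor(t1, t2))
    denom = math.log(subtree_probability(t1)) + math.log(subtree_probability(t2))
    if denom == 0:  # both topics are the root, which covers every page
        return 1.0
    return 2 * math.log(p0) / denom


if __name__ == "__main__":
    for pair in [("Physics", "Biology"), ("Physics", "Music"), ("Physics", "Physics")]:
        print(pair, tree_similarity(*pair))
```

In this toy setting, sibling topics under a shared, fairly specific parent score strictly between 0 and 1, topics whose only common ancestor is the root score 0, and a topic compared with itself scores 1. A tree-based measure of this kind only sees the hierarchical links; handling the non-hierarchical links of an ontology would require a different formulation.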