CP/CV: concept similarity mining without frequency information from domain describing taxonomies

Authors:
Jong Wook Kim;K. Selçuk Candan
Affiliations:
Arizona State University, Tempe, AZ;Arizona State University, Tempe, AZ
Venue:
CIKM '06 Proceedings of the 15th ACM international conference on Information and knowledge management
Year:
2006

Abstract

Domain-specific ontologies are heavily used in many applications. For instance, they form the basis on which similarity or dissimilarity between keywords is extracted for various knowledge discovery and retrieval tasks. Existing similarity computation schemes can be categorized as (a) structure-based or (b) information-based approaches. Structure-based approaches compute the dissimilarity between two keywords using a (weighted) count of the edges between them. Information-based approaches, on the other hand, leverage available corpora to extract additional information, such as keyword frequency, to achieve better performance in similarity computation than structure-based approaches. Unfortunately, in many application domains (such as applications that rely on unique keys in a relational database), the frequency information required by information-based approaches does not exist. In this paper, we note that there is a third way of computing similarity: if each node in a given hierarchy can be represented as a vector of related concepts, these vectors can be compared to compute similarities. This requires mapping the concept-nodes in a given hierarchy onto a concept space. We propose a concept propagation (CP) scheme, which relies on the semantic relationships between concepts implied by the structure of the hierarchy to annotate each concept-node with a concept-vector (CV). We refer to this approach as CP/CV. A comparison of keyword similarity results shows that CP/CV provides significantly better (up to 33%) results than existing structure-based schemes. Moreover, even though CP/CV does not assume the availability of an appropriate corpus from which to extract keyword frequency information, it matches (and slightly improves on) the performance of information-based approaches.
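
The contrast the abstract draws between edge counting and vector comparison can be illustrated with a minimal sketch. The abstract does not give the actual CP propagation rule, so the code below substitutes a simple assumed rule (each concept's weight decays geometrically with its hop distance in the hierarchy) and should not be read as the authors' algorithm. The toy taxonomy, the `decay` parameter, and all function names are hypothetical.

```python
import math
from collections import deque

# Hypothetical toy taxonomy, child -> parent (not taken from the paper).
PARENT = {
    "animal": None,
    "mammal": "animal",
    "bird": "animal",
    "dog": "mammal",
    "cat": "mammal",
    "sparrow": "bird",
}

def neighbors(node):
    """Return the children and (if any) the parent of a node."""
    result = [child for child, parent in PARENT.items() if parent == node]
    if PARENT[node] is not None:
        result.append(PARENT[node])
    return result

def hop_distances(source):
    """Breadth-first search: edge count from source to every node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        current = queue.popleft()
        for nxt in neighbors(current):
            if nxt not in dist:
                dist[nxt] = dist[current] + 1
                queue.append(nxt)
    return dist

def edge_count(a, b):
    """Structure-based dissimilarity: unweighted number of edges between a and b."""
    return hop_distances(a)[b]

def concept_vector(node, decay=0.5):
    """ASSUMED propagation rule (not the paper's): weight = decay ** hops."""
    return {concept: decay ** hops for concept, hops in hop_distances(node).items()}

def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * v.get(key, 0.0) for key, weight in u.items())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v)

def cv_similarity(a, b, decay=0.5):
    """Compare two concept-nodes through their concept vectors."""
    return cosine(concept_vector(a, decay), concept_vector(b, decay))

if __name__ == "__main__":
    for pair in [("dog", "cat"), ("dog", "sparrow")]:
        print(pair, "edges:", edge_count(*pair), "cv similarity:", round(cv_similarity(*pair), 3))
```

Neither function needs a corpus or keyword frequencies. Both inputs come from the hierarchy alone, which is the setting the abstract targets.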