Finding similar objects using a taxonomy: a pragmatic approach

  • Authors:
  • Peter Schwarz;Yu Deng;Julia E. Rice

  • Affiliations:
  • IBM Almaden Research Center, San Jose, CA;IBM Thomas J Watson Research Center, Yorktown Heights, NY;IBM Almaden Research Center, San Jose, CA

  • Venue:
  • ODBASE'06/OTM'06 Proceedings of the 2006 Confederated international conference on On the Move to Meaningful Internet Systems: CoopIS, DOA, GADA, and ODBASE - Volume Part I
  • Year:
  • 2006

Abstract

Several authors have suggested similarity measures for objects labeled with terms from a hierarchical taxonomy. We generalize this idea with a definition of information-theoretic similarity for taxonomies that are structured as directed acyclic graphs, from which multiple terms may be used to describe an object. We discuss how our definition should be adapted in the presence of ambiguity, and introduce new similarity measures based on our definitions. We present an implementation of our measures that is integrated with a relational database and scales to large taxonomies and datasets. We evaluate our measures by applying them to an object-matching problem from bioinformatics, and show that, for this task, our new measures outperform those reported in the literature. We also verified the scalability of our approach by applying it to patent similarity search, using patents classified with terms from the taxonomy defined by the United States Patent and Trademark Office.
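
The abstract does not spell out the authors' definitions, so the following is only a minimal sketch of the general idea of information-theoretic similarity over a DAG-structured taxonomy. It assumes a Resnik-style measure (the information content of the most informative common ancestor), which is one of the earlier measures from the literature and not the new measures introduced in the paper. The toy taxonomy, the annotations, and all function names are hypothetical.

```python
import math

# Hypothetical toy taxonomy as a DAG: term -> list of parent terms.
PARENTS = {
    "root": [],
    "a": ["root"],
    "b": ["root"],
    "c": ["a", "b"],  # a term may have several parents in a DAG
    "d": ["a"],
}

# Hypothetical annotations: object -> set of terms describing it.
ANNOTATIONS = {
    "o1": {"c"},
    "o2": {"d"},
    "o3": {"b"},
    "o4": {"c", "d"},
}


def ancestors(term):
    """Return the term itself together with all of its ancestors."""
    seen = set()
    stack = [term]
    while stack:
        t = stack.pop()
        if t not in seen:
            seen.add(t)
            stack.extend(PARENTS[t])
    return seen


def information_content():
    """IC(t) = -log p(t), where p(t) is the fraction of objects
    annotated with t or with any descendant of t."""
    n = len(ANNOTATIONS)
    counts = {t: 0 for t in PARENTS}
    for terms in ANNOTATIONS.values():
        covered = set()
        for t in terms:
            covered |= ancestors(t)
        for t in covered:  # count each object once per term
            counts[t] += 1
    return {t: -math.log(c / n) for t, c in counts.items() if c > 0}


def term_similarity(t1, t2, ic):
    """Resnik-style similarity: IC of the most informative common ancestor."""
    common = ancestors(t1) & ancestors(t2)
    return max(ic.get(t, 0.0) for t in common)


def object_similarity(o1, o2, ic):
    """Assumed aggregation for multi-term objects: best-matching term pair."""
    return max(
        term_similarity(t1, t2, ic)
        for t1 in ANNOTATIONS[o1]
        for t2 in ANNOTATIONS[o2]
    )


ic = information_content()
```

In this sketch, `object_similarity("o1", "o4", ic)` compares two objects that share the term `c`, so their score is the information content of `c`. Taking the maximum over term pairs is only one possible way to handle objects described by multiple terms; the paper's own measures may aggregate differently.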