A semantic similarity approach to predicting Library of Congress subject headings for social tags

Authors:
Kwan Yi
Affiliations:
School of Library and Information Science, University of Kentucky, 331 Little Library Building, Lexington, KY 40506-0224
Venue:
Journal of the American Society for Information Science and Technology
Year:
2010

Abstract

Social tagging, or collaborative tagging, has become a new trend in the organization, management, and discovery of digital information. The rapid growth of shared information controlled mostly by social tags poses a new challenge for social tag-based information organization and retrieval. A plausible approach to this challenge is to link social tags to a controlled vocabulary. As an introductory step toward this approach, this study investigates ways of predicting relevant subject headings for resources from the social tags assigned to those resources. Subject headings were predicted using five similarity measures: tf–idf, cosine-based similarity (CoS), Jaccard similarity (or Jaccard coefficient; JS), mutual information (MI), and information radius (IRad). The predictions were compared with the subject headings assigned by professionals. The results show that a CoS measure based on the top five social tags was the most effective; including more social tags only degraded performance. The performance of JS is comparable to that of CoS, while tf–idf performs up to 70% below the best performance. MI and IRad perform worse than the other methods. This study demonstrates the application of similarity measuring techniques to the prediction of correct Library of Congress subject headings. © 2010 Wiley Periodicals, Inc.
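To make the two best-performing measures concrete, the following is a minimal sketch of cosine-based and Jaccard similarity applied to ranking candidate subject headings against a resource's top five social tags. It is an illustration under stated assumptions, not the procedure used in the study: the function names, the representation of a heading as a plain list of terms, the use of raw tag counts as weights, and the example data are all hypothetical.

```python
import math
from collections import Counter


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine of the angle between two sparse term-weight vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty vector is treated as sharing nothing
    return dot / (norm_a * norm_b)


def jaccard_similarity(a, b) -> float:
    """Size of the intersection divided by size of the union of two term sets."""
    set_a, set_b = set(a), set(b)
    union = set_a | set_b
    if not union:
        return 0.0
    return len(set_a & set_b) / len(union)


def top_k_tags(tag_counts: Counter, k: int = 5) -> Counter:
    """Keep only the k most frequently assigned tags for a resource."""
    return Counter(dict(tag_counts.most_common(k)))


def rank_headings(tag_counts: Counter, headings: dict, k: int = 5) -> list:
    """Rank candidate headings by cosine similarity to the top-k tags.

    `headings` maps a heading label to a list of terms describing it;
    this representation is an assumption made for the sketch.
    """
    tags = top_k_tags(tag_counts, k)
    scored = [
        (label, cosine_similarity(tags, Counter(terms)))
        for label, terms in headings.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical example data: tag frequencies for one bookmarked resource
# and three invented candidate headings with descriptive terms.
example_tags = Counter(
    {"recipes": 12, "cooking": 9, "food": 7, "baking": 3, "blog": 2, "toread": 1}
)
example_headings = {
    "Cooking": ["cooking", "recipes", "food"],
    "Baking": ["baking", "bread"],
    "Gardening": ["gardening", "plants"],
}
ranking = rank_headings(example_tags, example_headings, k=5)
```

In this sketch the cutoff `k` restricts the comparison to the most frequent tags, which mirrors the reported finding that using the top five tags worked best; rarely assigned tags such as `toread` are dropped before scoring.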