Social tagging, or collaborative tagging, has become a new trend in the organization, management, and discovery of digital information. The rapid growth of shared information, most of it organized by social tags, poses a new challenge for tag-based information organization and retrieval. One plausible approach to this challenge is to link social tags to a controlled vocabulary. As an introductory step toward that approach, this study investigates ways of predicting relevant subject headings for resources from the social tags assigned to those resources. Subject headings were predicted using five similarity measures: tf–idf, cosine-based similarity (CoS), Jaccard similarity (or Jaccard coefficient; JS), mutual information (MI), and information radius (IRad). The predictions were compared with the subject headings assigned by professionals. The results show that the CoS measure based on the top five social tags was the most effective; including more social tags only degraded performance. JS performed comparably to CoS, while tf–idf performed up to 70% below the best performance. MI and IRad performed worse than the other methods. This study demonstrates the application of similarity measures to predicting correct Library of Congress subject headings. © 2010 Wiley Periodicals, Inc.
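The abstract names the five measures but does not define how they were applied. The following is a minimal Python sketch under stated assumptions, not the study's implementation. It assumes each candidate subject heading is represented by a hypothetical profile of tag counts (the `PROFILES` data, the headings, and the tags are invented toy values), and that a resource is represented by the counts of its top-k social tags. The tf–idf variant (tag frequency in the heading profile times idf) and the MI variant (pointwise mutual information summed over the resource's tags) are illustrative choices. IRad is a divergence, so lower means more similar, and it is negated for ranking. The helper names `rank_headings` and `MEASURES` are hypothetical.

```python
import math
from collections import Counter

# Hypothetical toy data: each subject heading is profiled by counts of the
# social tags observed on resources already cataloged under that heading.
PROFILES = {
    "Cooking": Counter({"recipes": 8, "food": 6, "cookbook": 4}),
    "Gardening": Counter({"plants": 7, "garden": 5, "food": 1}),
}


def cosine(a, b):
    """Cosine similarity (CoS) between two tag-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def jaccard(a, b):
    """Jaccard coefficient (JS) on tag sets; counts are ignored."""
    sa, sb = set(a), set(b)
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0


def tfidf(a, heading, profiles):
    """Illustrative variant: sum over resource tags of tf(tag, heading) * log(N / df)."""
    b = profiles[heading]
    total_b = sum(b.values())
    n = len(profiles)
    score = 0.0
    for t in a:
        if t in b:  # df >= 1 here, because b itself contains t
            df = sum(1 for p in profiles.values() if t in p)
            score += (b[t] / total_b) * math.log(n / df)
    return score


def mutual_information(a, heading, profiles):
    """Illustrative variant: pointwise MI between each resource tag and the heading, summed."""
    total = sum(sum(p.values()) for p in profiles.values())
    p_h = sum(profiles[heading].values()) / total
    score = 0.0
    for t in a:
        joint = profiles[heading][t] / total
        if joint == 0:
            continue  # PMI is undefined for unseen tag-heading pairs; skip them
        p_t = sum(p[t] for p in profiles.values()) / total
        score += math.log2(joint / (p_t * p_h))
    return score


def _kl(p, m):
    # Kullback-Leibler divergence D(p || m) in bits; terms with p = 0 contribute 0.
    return sum(pv * math.log2(pv / m[t]) for t, pv in p.items() if pv > 0)


def irad(a, b):
    """Information radius D(p || m) + D(q || m), m = (p + q) / 2; 0 means identical.

    a and b must be Counters so that missing tags read as 0.
    """
    tags = set(a) | set(b)
    sa, sb = sum(a.values()), sum(b.values())
    p = {t: a[t] / sa for t in tags}
    q = {t: b[t] / sb for t in tags}
    m = {t: (p[t] + q[t]) / 2 for t in tags}
    return _kl(p, m) + _kl(q, m)


# Every entry returns "higher = better match"; IRad is a divergence, so it is negated.
MEASURES = {
    "CoS": lambda q, h: cosine(q, PROFILES[h]),
    "JS": lambda q, h: jaccard(q, PROFILES[h]),
    "tf-idf": lambda q, h: tfidf(q, h, PROFILES),
    "MI": lambda q, h: mutual_information(q, h, PROFILES),
    "IRad": lambda q, h: -irad(q, PROFILES[h]),
}


def rank_headings(resource_tags, measure, k=5):
    """Rank candidate headings for a resource using only its k most frequent tags."""
    top = Counter(dict(resource_tags.most_common(k)))
    score = MEASURES[measure]
    return sorted(PROFILES, key=lambda h: score(top, h), reverse=True)


if __name__ == "__main__":
    resource = Counter({"recipes": 3, "food": 2, "baking": 1})
    for name in MEASURES:
        print(name, rank_headings(resource, name))
```

The `k` parameter truncates a resource's tags to its most frequent ones before scoring; the default of 5 mirrors the top-five setting the abstract reports as most effective for CoS. Note that JS ignores tag counts entirely, which is one reason it can behave differently from CoS on heavily repeated tags.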