Automatically clustering web pages into semantic groups promises improved search and browsing on the web. In this paper, we demonstrate how user-generated tags from large-scale social bookmarking websites such as del.icio.us can be used as a complementary data source to page text and anchor text for improving automatic clustering of web pages. This paper explores the use of tags in 1) K-means clustering in an extended vector space model that includes tags as well as page text and 2) a novel generative clustering algorithm based on latent Dirichlet allocation that jointly models text and tags. We evaluate the models by comparing their output to an established web directory. We find that the naive inclusion of tagging data improves cluster quality versus page text alone, but a more principled inclusion can substantially improve the quality of all models with a statistically significant absolute F-score increase of 4%. The generative model outperforms K-means with another 8% F-score increase.
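The first approach, K-means in a vector space extended with tags, can be illustrated with a minimal sketch. The abstract does not say how the tag and word dimensions are weighted, so everything here is an assumption made for illustration: the toy pages, the helper names (`unit_vector`, `combined_vector`, `kmeans`), the `tag_weight` parameter, and the choice to normalize the word block and the tag block separately before concatenating them. It is not the paper's implementation.

```python
import math
from collections import Counter


def unit_vector(tokens, vocab):
    """Term-frequency vector over `vocab`, scaled to unit L2 length."""
    counts = Counter(tokens)
    vec = [float(counts[term]) for term in vocab]
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else vec


def combined_vector(words, tags, word_vocab, tag_vocab, tag_weight=1.0):
    """Concatenate a normalized word block and a normalized tag block.

    Normalizing each block separately is an illustrative assumption; it keeps
    long page text from drowning out a handful of tags.
    """
    word_part = unit_vector(words, word_vocab)
    tag_part = [tag_weight * x for x in unit_vector(tags, tag_vocab)]
    return word_part + tag_part


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(vectors, k, iterations=20):
    """Plain K-means; the first k vectors seed the centroids (deterministic)."""
    centroids = [list(v) for v in vectors[:k]]
    assignment = [0] * len(vectors)
    for _ in range(iterations):
        # Assignment step: each vector goes to its nearest centroid.
        assignment = [
            min(range(k), key=lambda c: squared_distance(v, centroids[c]))
            for v in vectors
        ]
        # Update step: each centroid becomes the mean of its members.
        for c in range(k):
            members = [v for v, a in zip(vectors, assignment) if a == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assignment


# Hypothetical toy pages: (page-text tokens, user-assigned tags).
pages = [
    (["python", "code", "tutorial"], ["programming", "python"]),
    (["recipe", "pasta", "sauce"], ["cooking", "food"]),
    (["python", "library", "code"], ["programming"]),
    (["recipe", "bread", "oven"], ["food", "cooking"]),
]

word_vocab = sorted({w for words, _ in pages for w in words})
tag_vocab = sorted({t for _, tags in pages for t in tags})
vectors = [combined_vector(w, t, word_vocab, tag_vocab) for w, t in pages]
labels = kmeans(vectors, k=2)
print(labels)
```

Setting `tag_weight=0.0` in this sketch reduces it to clustering on page text alone, which gives a simple baseline for comparing the two feature sets.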