Interactive and context-aware tag spell check and correction

Authors:
Francesco Bonchi;Ophir Frieder;Franco Maria Nardini;Fabrizio Silvestri;Hossein Vahabi
Affiliations:
Yahoo! Research, Barcelona, Spain;Georgetown University, Washington, DC, USA;ISTI-CNR, Pisa, Italy;ISTI-CNR, Pisa, Italy;ISTI-CNR, Pisa, Italy
Venue:
Proceedings of the 21st ACM international conference on Information and knowledge management
Year:
2012

Abstract

Collaborative content creation and annotation produce vast repositories of all sorts of media, and user-defined tags play a central role as they are a simple yet powerful tool for organizing, searching and exploring the available resources. We observe that when a user annotates a resource with a set of tags, those tags are introduced one at a time. Therefore, when the fourth tag is introduced, the knowledge represented by the previous three tags, i.e., the context in which the fourth tag is produced, is available and can be exploited to generate potential corrections of the current tag. This context, together with the "wisdom of the crowd" represented by the co-occurrences of tags in all the resources of the repository, can be exploited to provide interactive tag spell check and correction. We develop this idea in a framework based on a weighted tag co-occurrence graph and on node relatedness measures defined on weighted neighborhoods. We test our proposal on a dataset coming from YouTube. The results show that our framework is effective, as it outperforms two important baselines. We also show that it is efficient, thus enabling its use in modern tagging services.
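
The idea of combining the tags already entered with repository-wide co-occurrences can be illustrated with a minimal Python sketch. The abstract does not give the paper's actual relatedness measures or candidate-generation step, so everything below is an assumption made for illustration: the function names, the use of Levenshtein distance to pick candidates, the normalized co-occurrence score, and the toy data are all hypothetical and are not the authors' method.

```python
from collections import defaultdict
from itertools import combinations


def build_cooccurrence_graph(resources):
    """graph[a][b] = number of resources annotated with both tag a and tag b."""
    graph = defaultdict(lambda: defaultdict(int))
    for tags in resources:
        for a, b in combinations(sorted(set(tags)), 2):
            graph[a][b] += 1
            graph[b][a] += 1
    return graph


def edit_distance(s, t):
    """Plain Levenshtein distance, computed row by row."""
    prev = list(range(len(t) + 1))
    for i, cs in enumerate(s, 1):
        curr = [i]
        for j, ct in enumerate(t, 1):
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + (cs != ct)))
        prev = curr
    return prev[-1]


def relatedness(graph, candidate, context):
    """Assumed measure: share of the candidate's edge weight that goes to context tags."""
    neighbors = graph.get(candidate, {})
    total = sum(neighbors.values())
    if total == 0:
        return 0.0
    return sum(neighbors.get(c, 0) for c in context) / total


def suggest_correction(graph, typed, context, max_distance=2):
    """Return the known tag close to `typed` that is most related to `context`.

    Falls back to `typed` itself when no candidate has positive relatedness.
    """
    best, best_key = typed, (0.0, 0)
    for candidate in graph:
        d = edit_distance(typed, candidate)
        if d > max_distance:
            continue
        # Prefer higher relatedness; break ties with the smaller edit distance.
        key = (relatedness(graph, candidate, context), -d)
        if key > best_key:
            best, best_key = candidate, key
    return best


# Toy repository (hypothetical data): each inner list is one resource's tag set.
resources = [
    ["apple", "iphone", "review"],
    ["apple", "iphone", "unboxing"],
    ["apple", "pie", "recipe"],
    ["maple", "syrup", "recipe"],
    ["ample", "parking"],
]
graph = build_cooccurrence_graph(resources)

print(suggest_correction(graph, "aple", ["iphone", "review"]))
print(suggest_correction(graph, "aple", ["syrup", "recipe"]))
```

In this toy setting the same misspelled tag resolves to different corrections depending on the tags entered before it, which is the context effect the abstract describes. A tag that never co-occurs with another tag does not enter this sketch's vocabulary, a simplification that a real system would need to address.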