A web content mining approach for tag cloud generation

Authors:
Muhammad Abulaish;Tarique Anwar
Affiliations:
King Saud University, Riyadh, Saudi Arabia;King Saud University, Riyadh, Saudi Arabia
Venue:
Proceedings of the 13th International Conference on Information Integration and Web-based Applications and Services
Year:
2011


Abstract

Tag clouds, also known as word clouds, are useful for quickly perceiving the most prominent terms in a text collection and their relative prominence. The effectiveness of a tag cloud in conceptualizing a text corpus depends directly on the quality of the keyphrases extracted from the corpus. Authors of scientific publications typically provide a list of about five to ten keywords that map each publication to its domain. However, given the exponential growth of non-scientific documents on the World Wide Web, an automatic mechanism is needed to identify the keyphrases embedded within them for tag cloud generation. In this paper, we propose a web content mining technique to extract keyphrases from web documents for tag cloud generation. Instead of using partial or full parsing, the proposed method applies an n-gram technique, followed by various heuristics-based refinements, to identify a set of lexical and semantic features from text documents. We propose a rich set of domain-independent features to model candidate keyphrases and establish their keyphraseness using classification models. We also propose a font-determination function to determine the relative font size of keyphrases for tag cloud generation. The efficacy of the proposed method is established through experimentation, in which it outperforms the popular keyphrase extraction system KEA.