An overview of web data clustering practices

Authors:
Athena Vakali; Jaroslav Pokorný; Theodore Dalamagas
Affiliations:
Department of Informatics, Aristotle University of Thessaloniki, Thessaloniki, Greece; Faculty of Mathematics and Physics, Charles University, Praha 1, Czech Republic; School of Electrical and Computer Engineering, National Technical University of Athens, Zographou, Athens, Greece
Venue:
EDBT'04 Proceedings of the 2004 international conference on Current Trends in Database Technology
Year:
2004

Abstract

Clustering is a challenging topic in the area of Web data management. Various forms of clustering are required in a wide range of applications, including finding mirrored Web pages, detecting copyright violations, and reporting search results in a structured way. Clustering can be performed either once offline (independently of search queries) or online (on the results of search queries). Important efforts have focused on mining Web access logs and on clustering search engine results on the fly. Online methods based on link structure and text have been applied successfully to finding pages on related topics. This paper presents an overview of the most popular methodologies and implementations for clustering either Web users or Web sources, and surveys the current status and future trends of clustering on the Web.