PROBABILISTIC HEURISTICS FOR HIERARCHICAL WEB DATA CLUSTERING

Authors:
Morteza Haghir Chehreghani;Mostafa Haghir Chehreghani;Hassan Abolhassani
Affiliations:
Department of Computer Engineering, Sharif University of Technology, Tehran, Iran;Department of Computer Engineering, Sharif University of Technology, Tehran, Iran;Department of Computer Engineering, Sharif University of Technology, Tehran, Iran
Venue:
Computational Intelligence
Year:
2012

Abstract

Clustering Web data is an important technique for extracting knowledge from the Web. This paper presents a novel method to facilitate such clustering. The method determines the appropriate number of clusters and provides suitable representatives for each cluster by inference from a Bayesian network. The Bayesian network is also used to convert the contents of Web pages into lower-dimensional vectors. The method is further extended to hierarchical clustering, and a useful heuristic is developed to select a good hierarchy. The experimental results show that the produced clusters are of high quality. (The value of this threshold is subjective and depends on human perceptions of relevancy, precision, and recall. It can easily be determined by a small number of human-oriented examinations.) © 2012 Wiley Periodicals, Inc.