The rapid growth of information on the web calls for better information management techniques, and clustering web data is among the most important of them. In this paper, we propose a new three-phase clustering method. The method first finds dense units in a data set using density-based algorithms, and the distances within the dense units are stored in order in a structure such as a min-heap. In the extraction stage, these distances are extracted one by one, and their effect on the clustering process is examined. Finally, in the combination stage, clustering is completed using improved versions of the well-known single-linkage and average-linkage methods. Every step of the methods runs in O(n log n) time. The proposed methods therefore have low complexity, and experimental results show that they generate high-quality clusters. Further experiments show that they offer additional advantages, such as clustering by sampling.
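
The general idea of pulling distances from a min-heap in increasing order and merging clusters as they arrive can be illustrated with a minimal sketch. It is not the paper's algorithm: the function name `heap_single_linkage`, the radius `eps`, and the stopping count `k` are hypothetical, plain single linkage stands in for the improved linkage methods, and the brute-force pair search is quadratic, so it does not reproduce the stated O(n log n) bound (a spatial index would be needed for that).

```python
import heapq
import math
from itertools import combinations


def find(parent, i):
    """Return the root of i in a union-find forest, halving paths on the way."""
    while parent[i] != i:
        parent[i] = parent[parent[i]]
        i = parent[i]
    return i


def heap_single_linkage(points, eps, k):
    """Illustrative only: merge points whose distance is at most eps,
    processing distances in increasing order, until k clusters remain
    or the heap is empty."""
    # Step 1 (assumed stand-in for dense-unit detection): keep only the
    # distances between pairs closer than eps, stored in a min-heap.
    heap = []
    for i, j in combinations(range(len(points)), 2):
        d = math.dist(points[i], points[j])
        if d <= eps:
            heapq.heappush(heap, (d, i, j))

    parent = list(range(len(points)))
    clusters = len(points)

    # Steps 2 and 3: extract distances one by one; each extracted pair
    # joins two clusters if its endpoints are not already together.
    while heap and clusters > k:
        _, i, j = heapq.heappop(heap)
        ri, rj = find(parent, i), find(parent, j)
        if ri != rj:
            parent[rj] = ri
            clusters -= 1

    # Label every point with the root of its cluster.
    return [find(parent, i) for i in range(len(points))]


points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0), (5.1, 5.0), (9.0, 0.0)]
labels = heap_single_linkage(points, eps=1.0, k=1)
print(labels)
```

In this toy input the three points near the origin end up in one cluster, the two points near (5, 5) in another, and the isolated point stays alone, because no pair across those groups lies within `eps`.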