Term-weighting approaches in automatic text retrieval
Information Processing and Management: an International Journal
Recent trends in hierarchic document clustering: a critical review
Information Processing and Management: an International Journal
Efficient implementation of suffix trees
Software—Practice & Experience
Web document clustering: a feasibility demonstration
Proceedings of the 21st annual international ACM SIGIR conference on Research and development in information retrieval
Grouper: a dynamic clustering interface to Web search results
WWW '99 Proceedings of the eighth international conference on World Wide Web
A Space-Economical Suffix Tree Construction Algorithm
Journal of the ACM (JACM)
When information retrieval measures agree about the relative quality of document rankings
Journal of the American Society for Information Science
Machine learning in automated text categorization
ACM Computing Surveys (CSUR)
Introduction to Modern Information Retrieval
Introduction to Modern Information Retrieval
Data Mining: An Overview from a Database Perspective
IEEE Transactions on Knowledge and Data Engineering
CPM '96 Proceedings of the 7th Annual Symposium on Combinatorial Pattern Matching
Frequent term-based text clustering
Proceedings of the eighth ACM SIGKDD international conference on Knowledge discovery and data mining
Phrase-based Document Similarity Based on an Index Graph Model
ICDM '02 Proceedings of the 2002 IEEE International Conference on Data Mining
Computational dialectology in Irish Gaelic
EACL '95 Proceedings of the seventh conference on European chapter of the Association for Computational Linguistics
Automatic retrieval and clustering of similar words
COLING '98 Proceedings of the 17th international conference on Computational linguistics - Volume 2
Efficient Phrase-Based Document Indexing for Web Document Clustering
IEEE Transactions on Knowledge and Data Engineering
A Concept-Driven Algorithm for Clustering Search Results
IEEE Intelligent Systems
Improving Web Clustering by Cluster Selection
WI '05 Proceedings of the 2005 IEEE/WIC/ACM International Conference on Web Intelligence
A new suffix tree similarity measure for document clustering
Proceedings of the 16th international conference on World Wide Web
Query Directed Web Page Clustering
WI '06 Proceedings of the 2006 IEEE/WIC/ACM International Conference on Web Intelligence
The Google Similarity Distance
IEEE Transactions on Knowledge and Data Engineering
Search Results Clustering in Chinese Context Based on a New Suffix Tree
CITWORKSHOPS '08 Proceedings of the 2008 IEEE 8th International Conference on Computer and Information Technology Workshops
Linear pattern matching algorithms
SWAT '73 Proceedings of the 14th Annual Symposium on Switching and Automata Theory (swat 1973)
Efficient Phrase-Based Document Similarity for Clustering
IEEE Transactions on Knowledge and Data Engineering
A survey of Web clustering engines
ACM Computing Surveys (CSUR)
Universal Mobile Information Retrieval
UAHCI '09 Proceedings of the 5th International Conference on Universal Access in Human-Computer Interaction. Part II: Intelligent and Ubiquitous Interaction Environments
A New Suffix Tree Similarity Measure and Labeling for Web Search Results Clustering
ICETET '09 Proceedings of the 2009 Second International Conference on Emerging Trends in Engineering & Technology
Retrieving relevant information from the web, which contains an enormous amount of data, is a highly complex research problem. An important contribution to this area is web clustering, which efficiently organizes a large number of web documents into a small number of meaningful and coherent groups [1,2]. Various techniques aim to categorize web pages into clusters accurately and automatically. Suffix Tree Clustering (STC) is a phrase-based, state-of-the-art algorithm for web clustering that automatically groups semantically related documents based on the phrases they share. Research has shown that it outperforms other clustering algorithms such as K-means and Buckshot because it uses phrases efficiently to identify clusters. Using STC as the baseline, we introduce a new method for ranking base clusters and new similarity measures for comparing clusters. Our STHAC technique combines the Hierarchical Agglomerative Clustering method with phrase-based Suffix Tree Clustering to improve the cluster merging process. Experimental results show that STHAC outperforms both the original STC and ESTC (our previous extended version of STC), with a 16% increase in F-measure. STHAC achieves this increase through better filtering of low-scoring clusters, better similarity measures, and more efficient cluster merging algorithms.
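The baseline idea of grouping documents by shared phrases and then merging overlapping groups can be illustrated with a minimal sketch. This is not the STHAC method, whose ranking and similarity measures are not given in this abstract. It only approximates the original STC under stated assumptions: word n-grams stand in for a real suffix tree, base clusters are not scored or filtered, and two base clusters are linked when each shares more than half of its documents with the other. The function names, the `max_len` and `min_docs` parameters, and the sample documents are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def base_clusters(docs, max_len=3, min_docs=2):
    """Map each shared phrase (word n-gram) to the set of documents containing it.

    Assumption: n-grams replace the suffix tree used by STC.
    """
    phrase_docs = defaultdict(set)
    for doc_id, text in enumerate(docs):
        words = text.lower().split()
        for n in range(1, max_len + 1):
            for i in range(len(words) - n + 1):
                phrase_docs[tuple(words[i:i + n])].add(doc_id)
    # A base cluster needs a phrase shared by at least min_docs documents.
    return {p: d for p, d in phrase_docs.items() if len(d) >= min_docs}


def merge_clusters(clusters, threshold=0.5):
    """Single-link merge of base clusters whose document sets overlap heavily."""
    phrases = list(clusters)
    parent = list(range(len(phrases)))

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(phrases)), 2):
        a, b = clusters[phrases[i]], clusters[phrases[j]]
        shared = len(a & b)
        # Link two base clusters only if the overlap is large relative to both.
        if shared / len(a) > threshold and shared / len(b) > threshold:
            parent[find(i)] = find(j)

    merged = defaultdict(set)
    for i, phrase in enumerate(phrases):
        merged[find(i)] |= clusters[phrase]
    return sorted(sorted(d) for d in merged.values())


if __name__ == "__main__":
    sample = [
        "suffix tree clustering",
        "suffix tree construction",
        "web search results",
        "web search engines",
    ]
    print(merge_clusters(base_clusters(sample)))
```

In this sketch the first two sample documents end up in one cluster and the last two in another, because each pair shares phrases that no document in the other pair contains. A hierarchical agglomerative variant would replace the single-link pass with repeated merging of the most similar pair of clusters.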