In this paper, we propose a new similarity measure for computing the pairwise similarity of text-based documents, based on the suffix tree document model. By applying the new suffix tree similarity measure in the Group-average Agglomerative Hierarchical Clustering (GAHC) algorithm, we developed a new suffix tree document clustering algorithm (NSTC). Experimental results on two standard document clustering benchmark corpora, OHSUMED and RCV1, indicate that the new algorithm is very effective for document clustering. Compared with the results of the traditional word-term tf-idf similarity measure in the same GAHC algorithm, NSTC achieved a 51% improvement in average F-measure score. Furthermore, we apply the new clustering algorithm to the analysis of Web documents in online forum communities. A topic-oriented clustering algorithm is developed to help people in assessing, classifying and searching the Web documents in a large forum community.
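The general shape of such a pipeline can be illustrated with a minimal sketch. It is not the NSTC implementation, and it rests on several assumptions that do not come from the abstract: word n-grams stand in for suffix tree nodes, the tf-idf weighting formula is chosen for illustration, clusters are merged by average inter-cluster cosine similarity (UPGMA-style average linkage), and the names `phrases`, `tfidf_vectors`, `cosine`, `gahc`, `max_len` and `threshold` are all hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def phrases(text, max_len=3):
    """Word n-grams up to max_len: a crude stand-in for suffix tree nodes (assumption)."""
    words = text.lower().split()
    return [" ".join(words[i:i + n])
            for n in range(1, max_len + 1)
            for i in range(len(words) - n + 1)]


def tfidf_vectors(docs, max_len=3):
    """Map each document to a sparse {phrase: weight} vector (illustrative weighting)."""
    counts = [Counter(phrases(d, max_len)) for d in docs]
    df = Counter(p for c in counts for p in c)  # document frequency of each phrase
    n = len(docs)
    return [{p: (1 + math.log(tf)) * math.log(1 + n / df[p]) for p, tf in c.items()}
            for c in counts]


def cosine(u, v):
    """Cosine similarity of two sparse vectors."""
    dot = sum(w * v[p] for p, w in u.items() if p in v)
    norm = (math.sqrt(sum(w * w for w in u.values()))
            * math.sqrt(sum(w * w for w in v.values())))
    return dot / norm if norm else 0.0


def gahc(vectors, threshold=0.1):
    """Average-linkage agglomerative clustering; stops when the best merge falls below threshold."""
    n = len(vectors)
    sim = [[cosine(vectors[i], vectors[j]) for j in range(n)] for i in range(n)]
    clusters = [[i] for i in range(n)]

    def avg(a, b):
        # Mean pairwise similarity between the members of two clusters.
        return sum(sim[i][j] for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > 1:
        (x, y), best = max(
            (((a, b), avg(clusters[a], clusters[b]))
             for a, b in combinations(range(len(clusters)), 2)),
            key=lambda t: t[1])
        if best < threshold:
            break
        clusters[x] = clusters[x] + clusters[y]
        del clusters[y]  # y > x, so index x is unaffected
    return clusters


if __name__ == "__main__":
    docs = [
        "suffix tree clustering of web documents",
        "web documents clustering with suffix tree",
        "stock market prices fell sharply",
        "stock market prices rose sharply today",
    ]
    print(gahc(tfidf_vectors(docs)))
```

Because shared multi-word phrases contribute their own dimensions, two documents that reuse the same word sequences score higher than two that merely share isolated words, which is the intuition behind phrase-based similarity. A real suffix tree would enumerate shared phrases of any length in linear time rather than capping them at `max_len`.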