Document clustering techniques have been applied in several areas, with the web among the most recent and influential. Both general-purpose and text-oriented techniques exist, and they can cluster a collection of documents in many ways. This work proposes a novel heuristic online document clustering model that can be specialized with a variety of text-oriented similarity measures. The proposed model was evaluated experimentally in the e-commerce domain. Performance was measured using a clustering-oriented metric based on the F-measure and compared with that of other well-known approaches. The results confirm the validity of the proposed method both in batch scenarios and in online scenarios, where document collections can grow over time.