A fuzzy document retrieval system using the keyword connection matrix and a learning method
Fuzzy Sets and Systems - Special issue on applications of fuzzy systems theory, Iizuka '88
Fast and effective text mining using linear-time document clustering
KDD '99 Proceedings of the fifth ACM SIGKDD international conference on Knowledge discovery and data mining
ACM Computing Surveys (CSUR)
Concept decompositions for large sparse text data using clustering
Machine Learning
Machine Learning
Unsupervised document classification using sequential information maximization
SIGIR '02 Proceedings of the 25th annual international ACM SIGIR conference on Research and development in information retrieval
Document clustering with cluster refinement and model selection capabilities
SIGIR '02 Proceedings of the 25th annual international ACM SIGIR conference on Research and development in information retrieval
Enhanced word clustering for hierarchical text classification
Proceedings of the eighth ACM SIGKDD international conference on Knowledge discovery and data mining
Phrase-based Document Similarity Based on an Index Graph Model
ICDM '02 Proceedings of the 2002 IEEE International Conference on Data Mining
Document clustering based on non-negative matrix factorization
Proceedings of the 26th annual international ACM SIGIR conference on Research and development in informaion retrieval
Document clustering by concept factorization
Proceedings of the 27th annual international ACM SIGIR conference on Research and development in information retrieval
Generative model-based document clustering: a comparative study
Knowledge and Information Systems
Categorizing and Extracting Information from Multilingual HTML Documents
IDEAS '05 Proceedings of the 9th International Database Engineering & Application Symposium
Enhancing text clustering by leveraging Wikipedia semantics
Proceedings of the 31st annual international ACM SIGIR conference on Research and development in information retrieval
Search Engines: Information Retrieval in Practice
Search Engines: Information Retrieval in Practice
Using word clusters to detect similar web documents
KSEM'06 Proceedings of the First international conference on Knowledge Science, Engineering and Management
A comparative study on text clustering methods
ADMA'06 Proceedings of the Second international conference on Advanced Data Mining and Applications
Hi-index | 0.00 |
RSS news articles that are either partially or completely duplicated in content are easily found on the Internet these days, which require Web users to sort through the articles to identify non-redundant information. This manual-filtering process is time-consuming and tedious. In this paper, we present a new filtering and clustering approach, called FICUS, which starts with identifying and eliminating redundant RSS news articles using a fuzzy set information retrieval approach and then clusters the remaining non-redundant RSS news articles according to their degrees of resemblance. FICUS uses a tree hierarchy to organize clusters of RSS news articles. The contents of the respective clusters are captured by the representative keywords from RSS news articles in the clusters so that searching and retrieval of similar RSS news articles is fast and efficient. FICUS is simple, since it uses the pre-defined word-correlation factors to determine related (words in) RSS news articles and filter redundant ones, and is supported by well-known and yet simple mathematical models, such as the standard deviation, vector space model, and probability theory, to generate clusters of non-redundant RSS news articles. Experiments performed on (test sets of) RSS news articles on various topics, which were downloaded from different online sources, verify the accuracy of FICUS on eliminating redundant RSS news articles, clustering similar RSS news articles together, and segregating different RSS news articles in terms of their contents. In addition, further empirical studies show that FICUS outperforms well-known approaches adopted for clustering RSS news articles.