Document clustering plays an important role in text analytics: it finds natural groupings of documents based on their similarity, as determined by the words appearing in them. Many of the clustering algorithms available in text analytics tools are completely unsupervised. That is, they cannot incorporate any domain knowledge that might be available about the documents to improve clustering accuracy and relevance. The authors present a semi-supervised document clustering algorithm based on graph partitioning. The user provides knowledge about a few of the documents in the form of "must-link" and "cannot-link" constraints between pairs of documents. A "must-link" constraint between two documents expresses the user's judgment that the two documents must be placed in the same cluster, regardless of how dissimilar they are. Similarly, a "cannot-link" constraint signifies that the two documents should never be clustered together, no matter how similar they happen to be. These constraints are then incorporated into a computationally efficient document clustering algorithm based on graph partitioning. The proposed framework is validated through experiments on publicly available text datasets.
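To make the idea of pairwise constraints concrete, the following is a minimal sketch of one way "must-link" and "cannot-link" pairs could be folded into a document similarity graph before partitioning. It is not the authors' algorithm: the abstract does not say how the constraints enter the partitioning step. The sketch assumes cosine similarity as edge weights, encodes constraints by overwriting edge weights, and uses a plain spectral bisection (sign of the Fiedler vector) as a stand-in partitioner. The function names, the `strong` weight, and the toy term counts are all hypothetical.

```python
import numpy as np

def cosine_affinity(X):
    """Pairwise cosine similarity between document rows; diagonal set to 0."""
    X = np.asarray(X, dtype=float)
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    norms[norms == 0] = 1.0          # avoid division by zero for empty documents
    Xn = X / norms
    W = Xn @ Xn.T
    np.fill_diagonal(W, 0.0)
    return W

def apply_constraints(W, must_link=(), cannot_link=(), strong=None):
    """Return a copy of W with constraint pairs encoded as edge weights.

    Assumption of this sketch: a must-link pair gets a very heavy edge,
    a cannot-link pair gets its edge removed.
    """
    W = W.copy()
    if strong is None:
        peak = float(W.max())
        strong = 10.0 * peak if peak > 0 else 1.0
    for i, j in must_link:
        W[i, j] = W[j, i] = strong
    for i, j in cannot_link:
        W[i, j] = W[j, i] = 0.0
    return W

def spectral_bipartition(W):
    """Split the graph in two using the sign of the Fiedler vector."""
    degree = W.sum(axis=1)
    laplacian = np.diag(degree) - W
    _, vecs = np.linalg.eigh(laplacian)   # eigenvalues in ascending order
    fiedler = vecs[:, 1]                  # eigenvector of 2nd-smallest eigenvalue
    return (fiedler > 0).astype(int)

# Toy term-count matrix: documents 0,1 share vocabulary, as do documents 2,3.
X = [[2, 1, 0, 0, 1],
     [1, 2, 0, 0, 1],
     [0, 0, 2, 1, 1],
     [0, 0, 1, 2, 1]]

W = cosine_affinity(X)
unconstrained = spectral_bipartition(W)

# The user insists that 0 and 2 belong together and that 0 and 1 do not.
W_c = apply_constraints(W, must_link=[(0, 2)], cannot_link=[(0, 1)])
constrained = spectral_bipartition(W_c)
```

Which cluster receives label 0 or 1 is arbitrary, because the sign of an eigenvector is not fixed, so only co-membership should be compared. Overwriting edge weights treats the constraints as strong preferences rather than hard guarantees; a method that must strictly honor every constraint would need a different mechanism.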