Abstract: In this paper, we discuss the merits of building text categorization systems using supervised clustering techniques. Traditional approaches that classify documents into a predefined set of classes often fail to provide sufficient accuracy because it is difficult to fit a manually categorized collection of documents to a given classification model. This is especially true of heterogeneous collections of Web documents, which vary in style, vocabulary, and authorship. Hence, this paper investigates the use of clustering to create the set of categories, and the use of those categories for classifying documents. Completely unsupervised clustering has the disadvantage that it has difficulty isolating sufficiently fine-grained classes of documents relating to a coherent subject matter. We therefore use the information from a preexisting taxonomy to supervise the creation of a set of related clusters, while retaining some freedom in defining and creating the classes. We show that the advantage of partially supervised clustering is that it gives some control over the range of subjects the categorization system addresses, while providing a precise mathematical definition of each category. An extremely effective way to categorize documents is then to use this a priori knowledge of the definition of each category. We also discuss a new technique that helps the classifier distinguish better among closely related clusters.
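The general idea of seeding clusters from a taxonomy and then classifying against the resulting cluster definitions can be illustrated with a small sketch. This is not the algorithm from the paper, whose details the abstract does not give; it is a generic seeded nearest-centroid procedure, and the function names, the bag-of-words weighting, the cosine similarity measure, and the toy documents are all assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict

def vectorize(text):
    """Bag-of-words term-frequency vector (an assumed, simplistic representation)."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse term-weight mappings."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def centroid(vectors):
    """Average the term weights of a group of vectors."""
    total = Counter()
    for v in vectors:
        total.update(v)
    return {t: w / len(vectors) for t, w in total.items()}

def classify(vec, centroids):
    """Return the name of the most similar centroid (ties broken alphabetically)."""
    return max(sorted(centroids), key=lambda name: cosine(vec, centroids[name]))

def supervised_clusters(labeled_docs, iterations=3):
    """Seed one cluster per taxonomy label, then let documents move freely.

    labeled_docs is a list of (text, taxonomy_label) pairs. The labels only
    initialize the clusters; each later pass reassigns every document to its
    closest centroid, so the final clusters may differ from the taxonomy.
    """
    vectors = [vectorize(text) for text, _ in labeled_docs]
    assignment = [label for _, label in labeled_docs]
    centroids = {}
    for _ in range(iterations):
        groups = defaultdict(list)
        for vec, name in zip(vectors, assignment):
            groups[name].append(vec)
        centroids = {name: centroid(vs) for name, vs in groups.items()}
        assignment = [classify(vec, centroids) for vec in vectors]
    return centroids

# Hypothetical toy taxonomy with two classes.
docs = [
    ("stock market shares trading", "finance"),
    ("bank interest rates loan", "finance"),
    ("football match goal league", "sports"),
    ("tennis match player serve", "sports"),
]
centroids = supervised_clusters(docs)
print(classify(vectorize("football league goal"), centroids))
print(classify(vectorize("bank loan rates"), centroids))
```

In this sketch each category is defined explicitly by its centroid vector, which loosely mirrors the abstract's point that a category can have a precise mathematical definition that the classifier then uses directly.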