In this paper we present a simple linear-time centroid-based document classification algorithm that, despite its simplicity and robust performance, has not been extensively studied and analyzed. Our experiments show that this centroid-based classifier consistently and substantially outperforms other algorithms such as Naive Bayesian, k-nearest-neighbors, and C4.5 on a wide range of datasets. Our analysis shows that the similarity measure used by the centroid-based scheme allows it to classify a new document based on how closely its behavior matches the behavior of the documents belonging to different classes. This matching allows it to dynamically adjust for classes with different densities and to account for dependencies between the terms in the different classes.
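
To make the idea concrete, the following is a minimal sketch of a centroid-based classifier, not the authors' implementation. The abstract does not state the term weighting or the similarity measure, so this sketch assumes sparse term-weight dictionaries and cosine similarity; the function names (`normalize`, `train_centroids`, `cosine`, `classify`) and the toy documents are hypothetical.

```python
import math
from collections import defaultdict


def normalize(vec):
    """Scale a sparse vector (dict of term -> weight) to unit length."""
    norm = math.sqrt(sum(w * w for w in vec.values()))
    return {t: w / norm for t, w in vec.items()} if norm else dict(vec)


def train_centroids(docs, labels):
    """Build one centroid per class by averaging its unit-length documents.

    A single pass over the training data, so the cost is linear in the
    total number of non-zero term weights.
    """
    sums = defaultdict(lambda: defaultdict(float))
    counts = defaultdict(int)
    for vec, label in zip(docs, labels):
        counts[label] += 1
        for term, weight in normalize(vec).items():
            sums[label][term] += weight
    return {
        label: {t: w / counts[label] for t, w in terms.items()}
        for label, terms in sums.items()
    }


def cosine(a, b):
    """Cosine similarity between two sparse vectors (assumed measure)."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def classify(vec, centroids):
    """Assign the class whose centroid is most similar to the document."""
    return max(centroids, key=lambda label: cosine(vec, centroids[label]))


# Toy training set: raw term counts stand in for real term weights.
docs = [
    {"ball": 2, "goal": 1},
    {"ball": 1, "team": 1},
    {"vote": 2, "party": 1},
    {"vote": 1, "law": 1},
]
labels = ["sports", "sports", "politics", "politics"]

centroids = train_centroids(docs, labels)
print(classify({"goal": 1, "team": 1}, centroids))  # sports
print(classify({"law": 2, "vote": 1}, centroids))   # politics
```

Classification compares a new document against one vector per class rather than against every training document, which is where the contrast with k-nearest-neighbors in cost comes from under these assumptions.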