In this paper we argue that incrementally updating the features that a text classification algorithm considers is very important for real-world textual data streams, because in most applications both the distribution of the data and the description of the classification concept change over time. To address this problem, we propose coupling an incremental feature ranking method with an incremental learning algorithm that can consider different subsets of the feature vector during prediction (what we call a feature-based classifier). Experimental results on a longitudinal database of real spam and legitimate emails show that our approach adapts to the changing nature of streaming data and performs much better than classical incremental learning algorithms.
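The coupling described above can be illustrated with a minimal sketch. The abstract does not name a specific ranking measure or learner, so everything concrete here is an assumption made for illustration: chi-squared as the incremental ranking score, a Bernoulli Naive Bayes model as the learner, and the hypothetical names `FeatureBasedNB`, `update`, `top_features`, `predict`, and the parameter `k`. The key idea it tries to show is that counts are kept for every word seen so far, while the subset of words actually used is re-selected at prediction time.

```python
import math
from collections import Counter, defaultdict


class FeatureBasedNB:
    """Illustrative sketch (not the paper's implementation): keeps counts
    for every word seen so far and selects the top-k words by chi-squared
    only when a prediction is requested."""

    def __init__(self, k=2):
        self.k = k                             # number of features used at prediction time
        self.n_docs = 0
        self.class_docs = Counter()            # label -> number of documents
        self.word_docs = defaultdict(Counter)  # word -> label -> documents containing word

    def update(self, tokens, label):
        """Incrementally absorb one labelled document."""
        self.n_docs += 1
        self.class_docs[label] += 1
        for w in set(tokens):
            self.word_docs[w][label] += 1

    def _chi2(self, word):
        """Maximum over classes of the 2x2 chi-squared statistic for `word`."""
        n = self.n_docs
        df = sum(self.word_docs[word].values())
        best = 0.0
        for label, n_c in self.class_docs.items():
            a = self.word_docs[word][label]  # in class, contains word
            b = df - a                       # other classes, contains word
            c = n_c - a                      # in class, lacks word
            d = n - n_c - b                  # other classes, lacks word
            denom = (a + c) * (b + d) * (a + b) * (c + d)
            if denom:
                best = max(best, n * (a * d - c * b) ** 2 / denom)
        return best

    def top_features(self):
        """Current top-k words; ties are broken alphabetically."""
        ranked = sorted(self.word_docs, key=lambda w: (-self._chi2(w), w))
        return ranked[: self.k]

    def predict(self, tokens):
        """Bernoulli Naive Bayes restricted to the currently selected words."""
        selected = self.top_features()
        present = set(tokens)
        best_label, best_score = None, -math.inf
        for label, n_c in self.class_docs.items():
            score = math.log(n_c / self.n_docs)
            for w in selected:
                # Laplace-smoothed probability that a document of this class contains w
                p = (self.word_docs[w][label] + 1) / (n_c + 2)
                score += math.log(p if w in present else 1 - p)
            if score > best_score:
                best_label, best_score = label, score
        return best_label
```

Because the ranking is recomputed from running counts, a word that only starts to appear later in the stream can enter the selected subset without retraining from scratch, which is the behaviour the abstract argues a fixed feature set cannot provide. Recomputing the full ranking on every prediction is a simplification of this sketch; a practical system would likely cache or update the ranking lazily.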