This paper studies how to reduce the amount of human supervision needed to distinguish splogs from authentic blogs when splog data sets are updated continuously, year by year. Following previous work on active learning, it empirically examines several selective sampling strategies for active learning with Support Vector Machines (SVMs) on the splog / authentic blog detection task. As the confidence measure for SVM learning, we use the distance from the separating hyperplane to each test instance, a measure that has been well studied in active learning for text classification. Unlike the results reported for text classification, in the splog / authentic blog detection task studied here, adding the least confident samples does not perform best.
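The selective sampling step described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the function name `select_for_labeling`, the strategy labels, and the example values are all hypothetical, and the input is assumed to be a list of signed SVM decision values whose absolute value is proportional to the distance from the separating hyperplane.

```python
from typing import List, Sequence


def select_for_labeling(
    decision_values: Sequence[float],
    k: int,
    strategy: str = "least_confident",
) -> List[int]:
    """Return the indices of k unlabeled instances to hand to an annotator.

    decision_values[i] is assumed to be a signed SVM decision value for
    instance i; its absolute value serves as the confidence measure.
    """
    if k < 0:
        raise ValueError("k must be non-negative")

    # Rank instance indices from closest to farthest from the hyperplane.
    ranked = sorted(
        range(len(decision_values)),
        key=lambda i: abs(decision_values[i]),
    )

    if strategy == "least_confident":
        # Instances nearest the hyperplane: the usual uncertainty sampling.
        return ranked[:k]
    if strategy == "most_confident":
        # Instances farthest from the hyperplane.
        return ranked[::-1][:k]
    raise ValueError(f"unknown strategy: {strategy!r}")


# Hypothetical decision values for five unlabeled blog homepages.
example_values = [2.1, -0.2, 0.05, -1.7, 0.9]
least = select_for_labeling(example_values, 2, "least_confident")
most = select_for_labeling(example_values, 2, "most_confident")
```

Keeping the strategy as a parameter makes it easy to compare sampling policies on the same pool, which matters here because the abstract reports that the least confident policy is not the best one for this task.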