Predicting Quality Flaws in User-Generated Content: The Case of Wikipedia
SIGIR '12: Proceedings of the 35th International ACM SIGIR Conference on Research and Development in Information Retrieval
For Web applications that are based on user-generated content, the detection of text quality flaws is a key concern. Our research contributes to automatic quality flaw detection. In particular, we propose to cast the detection of text quality flaws as a one-class classification problem: we are given only positive examples (= texts containing a particular quality flaw) and must decide whether or not an unseen text suffers from this flaw. We argue that common binary or multiclass classification approaches are ineffective here, and we underpin our approach with a real-world application: we employ a dedicated one-class learning approach to determine whether a given Wikipedia article suffers from certain quality flaws. Since in the Wikipedia setting the acquisition of sensible test data is quite intricate, we analyze the effects of a biased sample selection. In addition, we illustrate the classifier effectiveness as a function of the flaw distribution in order to cope with the unknown (real-world) flaw-specific class imbalances. Altogether, given test data with little noise, four out of ten important quality flaws in Wikipedia can be detected with a precision close to 1.
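The one-class setting described above can be illustrated with a minimal sketch: train on positive examples only (texts exhibiting a flaw) and accept or reject unseen items by how close they fall to the learned region. The centroid-plus-radius model, the 90th-percentile threshold, and the function names below are illustrative choices, not the dedicated one-class learner used in the paper.

```python
import math

def fit_one_class(positives):
    """Fit a toy one-class model from positive-only feature vectors:
    store their centroid and a radius covering most training points."""
    dim = len(positives[0])
    n = len(positives)
    centroid = [sum(v[i] for v in positives) / n for i in range(dim)]
    dists = sorted(math.dist(v, centroid) for v in positives)
    # Radius = 90th-percentile training distance (an arbitrary, illustrative cutoff).
    radius = dists[int(0.9 * (n - 1))]
    return centroid, radius

def predict(model, x):
    """Return True if x falls inside the learned region, i.e. is
    classified as exhibiting the flaw."""
    centroid, radius = model
    return math.dist(x, centroid) <= radius

# Usage: flaw-bearing training texts mapped to 2-D feature vectors (toy data).
model = fit_one_class([(1.0, 1.0), (1.1, 0.9), (0.9, 1.0), (1.0, 1.1), (1.2, 1.0)])
print(predict(model, (1.0, 1.0)))  # near the training cluster
print(predict(model, (5.0, 5.0)))  # far away -> rejected
```

Note that, unlike a binary classifier, nothing here models a "clean" class; the decision boundary is induced from the target class alone, which is exactly why biased or noisy sampling of the positives matters so much in the Wikipedia setting.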