Modelling bias is an important consideration when dealing with inexpert annotations. We are concerned with training a classifier to perform sentiment analysis on news media articles, some of which have been manually annotated by volunteers. The classifier is trained on the words in the articles and then applied to non-annotated articles. In previous work we found that jointly estimating the annotator biases and the classifier parameters performed better than estimating the biases first and then training the classifier. An important question follows from this result: can the annotators be usefully grouped, into either predetermined or data-driven clusters, based on their biases? If so, such a clustering could be used to select, drop or otherwise categorise the annotators in a crowdsourcing task. This paper presents work on fitting a finite mixture model to the annotators' biases. We develop a model and an algorithm and demonstrate their properties on simulated data. We then demonstrate the clustering that exists in our motivating dataset, namely the analysis of potentially economically relevant news articles from Irish online news sources.
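The paper's own model and algorithm are not reproduced here. As a minimal sketch of the general idea only, the following assumes each annotator's bias has been summarised as a single scalar and fits a finite (here, Gaussian) mixture to those scalars with a hand-rolled EM loop; the component assignments then give the data-driven annotator clusters. All function and variable names are illustrative, not from the paper.

```python
import numpy as np

def em_mixture_1d(x, k, n_iter=200):
    """Fit a k-component 1-D Gaussian mixture to bias estimates x via EM.

    Returns mixture weights pi, component means mu, and variances var.
    """
    x = np.asarray(x, dtype=float)
    n = len(x)
    # Deterministic initialisation: spread the means over the data quantiles,
    # start with a shared variance and uniform mixture weights.
    mu = np.quantile(x, (np.arange(k) + 0.5) / k)
    var = np.full(k, np.var(x) + 1e-9)
    pi = np.full(k, 1.0 / k)
    for _ in range(n_iter):
        # E-step: responsibilities r[i, j] = P(component j | x_i),
        # computed in log space for numerical stability.
        log_p = (-0.5 * (x[:, None] - mu) ** 2 / var
                 - 0.5 * np.log(2.0 * np.pi * var)
                 + np.log(pi))
        log_p -= log_p.max(axis=1, keepdims=True)
        r = np.exp(log_p)
        r /= r.sum(axis=1, keepdims=True)
        # M-step: re-estimate weights, means, and variances.
        nk = r.sum(axis=0)
        pi = nk / n
        mu = (r * x[:, None]).sum(axis=0) / nk
        var = (r * (x[:, None] - mu) ** 2).sum(axis=0) / nk + 1e-9
    return pi, mu, var

# Simulated data in the spirit of the paper's simulation study:
# two annotator groups with clearly different bias levels.
rng = np.random.default_rng(1)
x = np.concatenate([rng.normal(-0.5, 0.1, 100), rng.normal(0.4, 0.1, 100)])
pi, mu, var = em_mixture_1d(x, k=2)
clusters = np.argmin(np.abs(x[:, None] - mu), axis=1)  # hard assignment
```

The paper works with a block/Bernoulli-style mixture over annotation behaviour rather than a plain Gaussian mixture over scalars; the EM structure (alternating responsibilities and parameter updates) carries over, with the component likelihood swapped accordingly.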