Detection of news feeds items appropriate for children

Authors:
Tamara Polajnar; Richard Glassey; Leif Azzopardi
Affiliations:
School of Computing Science, University of Glasgow, Glasgow, UK; School of Computing Science, University of Glasgow, Glasgow, UK; School of Computing Science, University of Glasgow, Glasgow, UK
Venue:
ECIR'12 Proceedings of the 34th European conference on Advances in Information Retrieval
Year:
2012

Abstract

Identifying child-appropriate web content is an important yet difficult classification task. The task is novel in that it attempts to determine age or child appropriateness, which is not necessarily topic-based, despite unbalanced class sizes and a lack of quality training data with human judgements of appropriateness. Classifying feeds, a subset of web content, presents further challenges because of their temporal nature and short document format. In this paper, we discuss these challenges and present baseline results for the task through an empirical study that classifies incoming news stories as appropriate (or not) for children. We show that while the naïve Bayes approach produces a higher AUC, it is vulnerable to the imbalanced data problem, and that a support vector machine provides a more robust overall solution. Our research shows that classifying children's content is a non-trivial task with greater complexities than standard text-based classification. While the F-score values are consistent with other research examining age-appropriate text classification, we introduce a new problem with a new dataset.
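
The abstract contrasts two evaluation measures, AUC and F-score, on imbalanced data. The following is a minimal sketch, not the authors' code or results, of how a classifier can rank items well (high AUC) and still score poorly on F-score when few positives exist and a fixed decision threshold is used. The data, the two score lists, the 0.5 threshold, and the function names `auc` and `f1` are all invented for illustration; "classifier A" and "classifier B" are hypothetical and do not stand for naïve Bayes or the support vector machine.

```python
def auc(labels, scores):
    """Area under the ROC curve via pairwise comparison (ties count half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def f1(labels, scores, threshold=0.5):
    """F1 of the positive class, predicting 1 when score >= threshold."""
    preds = [1 if s >= threshold else 0 for s in scores]
    tp = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 1)
    fp = sum(1 for y, p in zip(labels, preds) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 0)
    if tp == 0:
        return 0.0  # no true positives: precision and recall are both zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Invented toy data: 2 child-appropriate items (1) among 8 others (0).
labels = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]

# Hypothetical classifier A: perfect ranking, but every score is below 0.5.
scores_a = [0.40, 0.35, 0.30, 0.25, 0.20, 0.15, 0.10, 0.10, 0.05, 0.05]

# Hypothetical classifier B: one ranking mistake, scores straddle 0.5.
scores_b = [0.90, 0.45, 0.60, 0.30, 0.20, 0.20, 0.10, 0.10, 0.10, 0.05]

print(auc(labels, scores_a), f1(labels, scores_a))  # 1.0 0.0
print(auc(labels, scores_b), f1(labels, scores_b))  # 0.9375 0.5
```

In this toy case classifier A has the higher AUC but never predicts the minority class at the fixed threshold, so its F-score is zero, while classifier B trades a little ranking quality for usable decisions. This only illustrates why the two measures can disagree under class imbalance; it says nothing about the magnitude of the effect reported in the paper.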