Identifying child-appropriate web content is an important yet difficult classification task. The task is novel in that it seeks to determine age or child appropriateness, which is not necessarily topic-based, and it must do so despite unbalanced class sizes and a lack of quality training data with human judgements of appropriateness. Classifying feeds, a subset of web content, presents further challenges because of their temporal nature and short document format. In this paper, we discuss these challenges and present baseline results for the task through an empirical study that classifies incoming news stories as appropriate (or not) for children. We show that while the naïve Bayes approach produces a higher AUC, it is vulnerable to the imbalanced data problem, and that a support vector machine provides a more robust overall solution. Our research shows that classifying children's content is a non-trivial task with greater complexities than standard text-based classification. Although the F-score values are consistent with other research examining age-appropriate text classification, the problem and the dataset we introduce are new.
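To make the distinction between AUC and F-score under class imbalance concrete, the following is a minimal sketch in plain Python. The labels and scores are invented for illustration and are not taken from the paper's dataset; the helper names `auc` and `f_score`, the 0.5 decision threshold, and the two hypothetical models are all assumptions. It only shows how a model can rank documents well (high AUC) and still label almost nothing as the minority class at a fixed threshold.

```python
def auc(labels, scores):
    """Area under the ROC curve, computed as the probability that a randomly
    chosen positive scores higher than a randomly chosen negative
    (ties count as half a win)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def f_score(labels, scores, threshold=0.5):
    """F1 for the positive class after thresholding the scores."""
    preds = [1 if s >= threshold else 0 for s in scores]
    tp = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 1)
    fp = sum(1 for y, p in zip(labels, preds) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical imbalanced sample: 2 child-appropriate stories (1), 8 others (0).
labels = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]

# Hypothetical model A: ranks both positives on top, but every score is
# pulled below the 0.5 threshold, so nothing is predicted positive.
scores_a = [0.40, 0.35, 0.30, 0.25, 0.20, 0.20, 0.15, 0.10, 0.10, 0.05]

# Hypothetical model B: slightly worse ranking (one negative outranks a
# positive), but one positive clears the threshold.
scores_b = [0.90, 0.45, 0.48, 0.30, 0.20, 0.20, 0.10, 0.10, 0.05, 0.05]

for name, scores in (("A", scores_a), ("B", scores_b)):
    print(name, "AUC =", round(auc(labels, scores), 4),
          "F1 =", round(f_score(labels, scores), 4))
```

In this toy sample, model A has the higher AUC yet the lower F-score, which is why the two measures are worth reporting side by side when one class is rare.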