Automatic classification of documents in cold-start scenarios

Authors:
Ricardo Kawase; Marco Fisichella; Bernardo Pereira Nunes; Kyung-Hun Ha; Markus Bick
Affiliations:
Leibniz University of Hanover, Hannover, Germany (Kawase, Fisichella, Nunes); ESCP Europe, Berlin, Germany (Ha, Bick)
Venue:
Proceedings of the 3rd International Conference on Web Intelligence, Mining and Semantics
Year:
2013

Abstract

Document classification is key to ensuring the quality of any digital library. However, classifying documents is a very time-consuming task. In addition, few or none of the documents in a newly created repository are classified. Unclassified documents not only prevent users from finding information but also hinder the system's ability to recommend relevant items. Moreover, the lack of classified documents leaves supervised machine learning algorithms without the labeled examples they need to annotate these items automatically. In this work, we propose a novel approach to automatically classifying documents that differs from previous work in that it exploits the wisdom of the crowd available on the Web. Our strategy adapts an automatic tagging approach and combines it with a straightforward matching algorithm to classify documents according to a given domain classification. To validate our approach, we compared our methods against existing ones and performed a user evaluation with 61 participants to estimate the quality of the classifications. Results show that, in 72% of the cases, the automatic classification is relevant and well accepted by participants. In conclusion, automatic classification can facilitate access to relevant documents.
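The abstract does not spell out the matching step, so the following is only a minimal sketch of what "automatic tagging combined with a straightforward matching algorithm" could look like. It assumes that a tagger has already produced a list of tags for each document and that every category of the domain classification is described by a set of terms. The functions `normalize` and `classify`, the exact-match overlap score, and `EXAMPLE_SCHEME` are all hypothetical illustrations, not the authors' algorithm.

```python
from collections import Counter
from typing import Dict, Iterable, List, Set


def normalize(term: str) -> str:
    """Lower-case and collapse whitespace (hypothetical normalization rule)."""
    return " ".join(term.lower().split())


def classify(doc_tags: Iterable[str], scheme: Dict[str, Set[str]], top_k: int = 3) -> List[str]:
    """Rank categories by how many document tags exactly match the category's terms.

    Assumption: matching is exact after normalization; categories with no
    matching tag are left out. Ties keep the scheme's insertion order,
    because Counter.most_common orders equal counts by first insertion.
    """
    tags = {normalize(t) for t in doc_tags}
    scores: Counter = Counter()
    for category, terms in scheme.items():
        overlap = tags & {normalize(t) for t in terms}
        if overlap:
            scores[category] = len(overlap)
    return [category for category, _ in scores.most_common(top_k)]


# Hypothetical domain classification, for illustration only.
EXAMPLE_SCHEME = {
    "Finance": {"accounting", "budgeting", "investment"},
    "Marketing": {"branding", "advertising", "market research"},
    "Human Resources": {"recruiting", "training", "leadership"},
}

if __name__ == "__main__":
    # Tags as an automatic tagger might return them (hypothetical).
    print(classify(["Budgeting", "leadership", "Accounting", "social media"], EXAMPLE_SCHEME))
    # prints ['Finance', 'Human Resources']
```

Counting overlapping terms keeps the matcher transparent and free of training data, which suits the cold-start setting the abstract describes. A real system would probably need fuzzier matching (stemming, synonyms) than this exact-match stand-in.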