Knowledge-based metadata extraction from PostScript files
DL '00 Proceedings of the fifth ACM conference on Digital libraries
Integrating automatic genre analysis into digital libraries
Proceedings of the 1st ACM/IEEE-CS joint conference on Digital libraries
Machine Learning
Automatic document metadata extraction using support vector machines
Proceedings of the 3rd ACM/IEEE-CS joint conference on Digital libraries
Fine-Grained Document Genre Classification Using First Order Random Graphs
ICDAR '01 Proceedings of the Sixth International Conference on Document Analysis and Recognition
Building a large annotated corpus of English: the penn treebank
Computational Linguistics - Special issue on using large corpora: II
Automatic detection of text genre
ACL '98 Proceedings of the 35th Annual Meeting of the Association for Computational Linguistics and Eighth Conference of the European Chapter of the Association for Computational Linguistics
Recognizing text genres with simple metrics using discriminant analysis
COLING '94 Proceedings of the 15th conference on Computational linguistics - Volume 2
Investigating GIS and smoothing for maximum entropy taggers
EACL '03 Proceedings of the tenth conference on European chapter of the Association for Computational Linguistics - Volume 1
Clustering document images using a bag of symbols representation
ICDAR '05 Proceedings of the Eighth International Conference on Document Analysis and Recognition
Journal of the American Society for Information Science and Technology
Data Mining: Practical Machine Learning Tools and Techniques, Second Edition (Morgan Kaufmann Series in Data Management Systems)
Genre classification in automated ingest and appraisal metadata
ECDL'06 Proceedings of the 10th European conference on Research and Advanced Technology for Digital Libraries
PERC: a personal email classifier
ECIR'06 Proceedings of the 28th European conference on Advances in Information Retrieval
Hi-index | 0.00 |
This paper examines genre classification of documents and its role in enabling the effective automated management of digital documents by digital libraries and other repositories. We have previously presented genre classification as a valuable step toward achieving automated extraction of descriptive metadata for digital material. Here, we present results from experiments using human labellers, conducted to assist in genre characterisation and the prediction of obstacles which need to be overcome by an automated system, and to contribute to the process of creating a solid testbed corpus for extending automated genre classification and testing metadata extraction tools across genres. We also describe the performance of two classifiers based on image and stylistic modeling features in labelling the data resulting from the agreement of three human labellers across fifteen genre classes.