Building a document genre corpus: a profile of the KRYS I corpus

Authors:
V. F. Berninger;Yunhyong Kim;Seamus Ross
Affiliations:
Digital Curation Centre & Humanities Advanced Technology and Information Institute, University of Glasgow, Glasgow, UK;Digital Curation Centre & Humanities Advanced Technology and Information Institute, University of Glasgow, Glasgow, UK;Digital Curation Centre & Humanities Advanced Technology and Information Institute, University of Glasgow, Glasgow, UK
Venue:
IRSG'08 Proceedings of the 2008 BCS-IRSG conference on Corpus Profiling
Year:
2008

Abstract

This paper describes the KRYS I corpus, which consists of documents classified into 70 genre classes. It was constructed as part of an effort to automate document genre classification, as distinct from topic detection. Very little previous work has built corpora of texts classified using a non-topical genre palette. This is partly because genre, as a concept, is rooted in philosophy, rhetoric and literature, and is highly complex and domain-dependent in its interpretation ([11]). The usefulness of genre in everyday information search is only now starting to be recognised, and there is as yet no consolidated genre classification schema with applicable value in this direction. By presenting our experiences in constructing the KRYS I corpus, we hope to shed light on information gathering and seeking behaviour and on the role of genre in these activities, and to suggest a way forward for creating a better corpus for testing automated genre classification tasks and for applying these tasks to other domains.