Automatic genre detection of web documents

Authors:
Chul Su Lim;Kong Joo Lee;Gil Chang Kim
Affiliations:
Division of Computer Science, Department of EECS, KAIST, Taejon;School of Computer & Information Technology, KyungIn Women’s College, Incheon;Division of Computer Science, Department of EECS, KAIST, Taejon
Venue:
IJCNLP'04 Proceedings of the First international joint conference on Natural Language Processing
Year:
2004

Citing 6

The Importance of Prior Probabilities for Entry Page Search
SIGIR '02 Proceedings of the 25th annual international ACM SIGIR conference on Research and development in information retrieval

Text genre classification with genre-revealing and subject-revealing features
SIGIR '02 Proceedings of the 25th annual international ACM SIGIR conference on Research and development in information retrieval

An Empirical Text Categorizing Computational Model Based on Stylistic Aspects
ICTAI '96 Proceedings of the 8th International Conference on Tools with Artificial Intelligence

Automatic text categorization in terms of genre and author
Computational Linguistics

Recognizing text genres with simple metrics using discriminant analysis
COLING '94 Proceedings of the 15th conference on Computational linguistics - Volume 2

Text genre detection using common word frequencies
COLING '00 Proceedings of the 18th conference on Computational linguistics - Volume 2

Cited 5

Implementing a characterization of genre for automatic genre identification of web pages
COLING-ACL '06 Proceedings of the COLING/ACL on Main conference poster sessions

Zero, single, or multi? Genre of web pages through the users' perspective
Information Processing and Management: an International Journal

Classifying factored genres with part-of-speech histograms
NAACL-Short '09 Proceedings of Human Language Technologies: The 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics, Companion Volume: Short Papers

Stylometric features for emotion level classification in news related blogs
RIAO '10 Adaptivity, Personalization and Fusion of Heterogeneous Information

Automatic genre identification: towards a flexible classification scheme
FDIA'07 Proceedings of the 1st BCS IRSG conference on Future Directions in Information Access

Abstract

Genre, or style, offers a view of documents that is distinct from subject or topic, and it can also serve as a criterion for classifying them. Several studies have addressed genre detection for textual documents, but only a few have dealt with web documents. In this paper, we propose sets of features for detecting the genres of web documents. Web documents differ from textual documents in that they carry a URL and contain HTML tags. We introduce features specific to web documents, extracted from URLs and HTML tags. Experimental results allow us to evaluate the characteristics and performance of these feature sets.
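
To make the idea of URL-based and HTML-tag-based features concrete, here is a minimal sketch in Python. It is not the feature set used in the paper, which the abstract does not list. The feature names (url_depth, url_extension, url_has_query, tag_total, tag_links, tag_images, tag_forms) and the helper functions are hypothetical, chosen only to illustrate the kind of signal a URL and a page's markup can provide.

```python
from collections import Counter
from html.parser import HTMLParser
from urllib.parse import urlparse


class TagCounter(HTMLParser):
    """Counts how often each HTML start tag occurs in a page."""

    def __init__(self):
        super().__init__()
        self.tags = Counter()

    def handle_starttag(self, tag, attrs):
        # The parser lowercases tag names before calling this method.
        self.tags[tag] += 1


def url_features(url):
    """Illustrative (assumed) URL features: path depth, extension, query."""
    parsed = urlparse(url)
    segments = [s for s in parsed.path.split("/") if s]
    last = segments[-1] if segments else ""
    extension = last.rsplit(".", 1)[1].lower() if "." in last else ""
    return {
        "url_depth": len(segments),
        "url_extension": extension,
        "url_has_query": bool(parsed.query),
    }


def html_features(html):
    """Illustrative (assumed) HTML features: counts of selected tags."""
    parser = TagCounter()
    parser.feed(html)
    parser.close()
    return {
        "tag_total": sum(parser.tags.values()),
        "tag_links": parser.tags["a"],
        "tag_images": parser.tags["img"],
        "tag_forms": parser.tags["form"],
    }


def extract_features(url, html):
    """Merge URL and HTML features into one dictionary for a classifier."""
    features = url_features(url)
    features.update(html_features(html))
    return features


if __name__ == "__main__":
    page = "<html><body><a href='x'>x</a><img src='z'><form></form></body></html>"
    print(extract_features("http://example.com/shop/index.html?id=3", page))
```

A dictionary like this could be passed to any standard classifier after vectorization. Which features actually discriminate between genres is an empirical question that the paper's experiments address.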