The Importance of Prior Probabilities for Entry Page Search
SIGIR '02 Proceedings of the 25th annual international ACM SIGIR conference on Research and development in information retrieval
Text genre classification with genre-revealing and subject-revealing features
SIGIR '02 Proceedings of the 25th annual international ACM SIGIR conference on Research and development in information retrieval
An Empirical Text Categorizing Computational Model Based on Stylistic Aspects
ICTAI '96 Proceedings of the 8th International Conference on Tools with Artificial Intelligence
Automatic text categorization in terms of genre and author
Computational Linguistics
Recognizing text genres with simple metrics using discriminant analysis
COLING '94 Proceedings of the 15th conference on Computational linguistics - Volume 2
Text genre detection using common word frequencies
COLING '00 Proceedings of the 18th conference on Computational linguistics - Volume 2
Implementing a characterization of genre for automatic genre identification of web pages
COLING-ACL '06 Proceedings of the COLING/ACL on Main conference poster sessions
Zero, single, or multi? Genre of web pages through the users' perspective
Information Processing and Management: an International Journal
Classifying factored genres with part-of-speech histograms
NAACL-Short '09 Proceedings of Human Language Technologies: The 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics, Companion Volume: Short Papers
Stylometric features for emotion level classification in news related blogs
RIAO '10 Adaptivity, Personalization and Fusion of Heterogeneous Information
Automatic genre identification: towards a flexible classification scheme
FDIA'07 Proceedings of the 1st BCS IRSG conference on Future Directions in Information Access
The genre or style of a document offers a view of it that is distinct from its subject or topic, so genre can also serve as a criterion for classifying documents. Several studies have addressed detecting the genre of textual documents, but only a few have dealt with web documents. In this paper, we propose sets of features for detecting the genres of web documents. Web documents differ from plain textual documents in that their pages contain URLs and HTML tags. We introduce features specific to web documents, extracted from URLs and HTML tags. Experimental results allow us to evaluate the characteristics and performance of these features.
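As a rough illustration of what URL- and HTML-derived features might look like, here is a minimal Python sketch using only the standard library. The specific features below (URL path depth, a `~` marker commonly seen in personal home-page paths, presence of a query string, file extension, and the relative frequency of a few HTML tags) and the helper names `url_features`, `html_features`, and `web_features` are hypothetical choices for illustration. They are not the feature sets proposed in the paper.

```python
from collections import Counter
from html.parser import HTMLParser
from urllib.parse import urlparse

# Hypothetical tag set: the relative frequency of each tag becomes a feature.
TRACKED_TAGS = ("a", "img", "table", "form", "input", "li", "script")


class TagCounter(HTMLParser):
    """Counts every start tag seen in an HTML page."""

    def __init__(self):
        super().__init__()
        self.counts = Counter()

    def handle_starttag(self, tag, attrs):
        # HTMLParser passes tag names already lowercased.
        self.counts[tag] += 1


def url_features(url):
    """Hypothetical URL features: path depth, '~' marker, query, extension."""
    parsed = urlparse(url)
    segments = [s for s in parsed.path.split("/") if s]
    last = segments[-1] if segments else ""
    ext = last.rsplit(".", 1)[-1].lower() if "." in last else ""
    return {
        "url_depth": len(segments),
        "url_has_tilde": int("~" in parsed.path),
        "url_has_query": int(bool(parsed.query)),
        f"url_ext_{ext or 'none'}": 1,
    }


def html_features(html):
    """Counts of tracked tags, divided by the total number of start tags."""
    parser = TagCounter()
    parser.feed(html)
    parser.close()
    total = sum(parser.counts.values()) or 1  # avoid division by zero
    return {f"tag_{t}": parser.counts[t] / total for t in TRACKED_TAGS}


def web_features(url, html):
    """Merge URL-based and HTML-based features into one feature dict."""
    features = url_features(url)
    features.update(html_features(html))
    return features


# Usage on a toy page (hypothetical URL and markup).
page = "<html><body><a href='x'>x</a><a href='y'>y</a><img src='z'></body></html>"
features = web_features("http://example.com/~kim/papers/index.html", page)
```

The resulting dictionary could then be vectorized and combined with conventional text features before being passed to any standard classifier. Normalizing tag counts by the total number of tags is one simple way to keep long and short pages comparable; the paper may well use a different scheme.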