The World Wide Web is a massive corpus that evolves constantly, yet classification experiments usually work from a single snapshot of it, limited in both time and coverage. In this paper, we examine the effects of page evolution on genre classification of Web pages. Web genre refers to the type of a page as characterized by features such as style, form or presentation layout, and meta-content; it can be used to tune how often a crawler revisits a page and to inform relevance judgments in search engines. We found that pages in some genres change rarely, if at all, and can be used in present-day research experiments without requiring an updated version. We show that an old corpus can be used for training when testing on new Web pages, with only a marginal drop in genre classification accuracy. We also show that features found to be useful in one corpus do not transfer well to other corpora with different genres.
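The train-on-old, test-on-new protocol described above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the abstract names neither the classifier nor the features nor the genre labels, so a small multinomial Naive Bayes over word counts stands in, and the two "snapshots" are hypothetical toy pages rather than data from the paper.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Whitespace tokenization on lowercased text (an illustrative choice).
    return text.lower().split()


def train_nb(docs):
    """Fit a multinomial Naive Bayes model on (text, genre) pairs."""
    class_counts = Counter()            # number of pages per genre
    word_counts = defaultdict(Counter)  # per-genre word frequencies
    vocab = set()
    for text, genre in docs:
        class_counts[genre] += 1
        for tok in tokenize(text):
            word_counts[genre][tok] += 1
            vocab.add(tok)
    return class_counts, word_counts, vocab


def predict(model, text):
    """Return the genre with the highest log-posterior (add-one smoothing)."""
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    best, best_score = None, -math.inf
    for genre in sorted(class_counts):
        score = math.log(class_counts[genre] / total_docs)
        denom = sum(word_counts[genre].values()) + len(vocab)
        for tok in tokenize(text):
            if tok in vocab:  # words unseen in training are ignored
                score += math.log((word_counts[genre][tok] + 1) / denom)
        if score > best_score:
            best, best_score = genre, score
    return best


def accuracy(model, docs):
    correct = sum(predict(model, text) == genre for text, genre in docs)
    return correct / len(docs)


# Hypothetical toy snapshots; the genre labels are illustrative only.
old_snapshot = [
    ("frequently asked questions how do i reset my password", "faq"),
    ("question answer what is the refund policy answer", "faq"),
    ("welcome to my homepage about me my hobbies and photos", "homepage"),
    ("my homepage contact me and see my family photos", "homepage"),
]
new_snapshot = [
    ("frequently asked questions how do i change my plan", "faq"),
    ("welcome to my homepage new photos of my hobbies", "homepage"),
]

# Train once on the old snapshot, then score both snapshots with that model.
model = train_nb(old_snapshot)
old_acc = accuracy(model, old_snapshot)
new_acc = accuracy(model, new_snapshot)
print(f"accuracy on old snapshot: {old_acc:.2f}")
print(f"accuracy on new snapshot: {new_acc:.2f}")
```

The gap between the two printed accuracies is the quantity of interest: a small gap would correspond to the "marginal drop" reported above, although numbers from toy data like this say nothing about the paper's actual results.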