We analyse academic Web pages in order to classify them automatically into Web genres. For this purpose, we have developed a database-driven corpus, currently containing more than 1,300,000 documents, which forms the empirical basis of our research. We introduce the notion of a Web genre type, which constitutes the framework for a certain Web genre, and the notions of compulsory and optional Web genre modules. These modules act as building blocks that together make up the structure characterised by the Web genre type, and they operate as modifiers for the default assignment. The analysis of a 200-document sample illustrates our notion of a Web genre hierarchy, into which Web genre types and modules are embedded. The analysis of four documents of the Web genre Academic's Personal Homepage demonstrates our approach and our long-term goal of automatically extracting the contents of Web genre modules in order to build structured XML documents from unstructured HTML documents.