Text characteristics of English language university Web sites: Research Articles
Journal of the American Society for Information Science and Technology
On the Impact of Lexical and Linguistic Features in Genre- and Domain-Based Categorization
CICLing '07 Proceedings of the 8th International Conference on Computational Linguistics and Intelligent Text Processing
The massive amount of textual data on the Web raises numerous classification problems. Although the notion of domain is widely acknowledged in the IR field, the more application-oriented concept of genre could address the weaknesses of domain by taking into account the linguistic properties and the document structure of texts. Two clustering methods are proposed here to illustrate how the two notions complement each other in characterizing a closed corpus of scientific articles. The results are intended for use in a Web-based application.