To cope with the retrieval problem posed by the overwhelming number of documents online, text categorization has recently been adapted to web units. The aim is to use the categories of web sites and pages as an additional retrieval criterion. In this context, the bag-of-words model has been applied alongside HTML tags and link structures. Despite promising results, this adaptation remains within the framework of IR-specific models, because it neglects the content-based structuring inherent to hypertext units. This paper approaches hypertext modelling from the perspective of graph theory. It presents an XML-based format for representing websites as hypergraphs. These hypergraphs are used to shed light on the relation between hypertext structure types and their web-based instances. We emphasize two characteristics of this relation. Realizational ambiguity refers to functionally equivalent ways of manifesting the same structure type. Polymorphism refers to a single web unit that manifests different structure types. We show that polymorphism is a prevalent characteristic of web-based units by means of a categorization experiment that analyses a corpus of hypergraphs representing the structure and content of pages of conference websites. Against this background, we argue for a revision of text representation models by means of hypergraphs that are sensitive to the manifold structuring of web documents.
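To make the two notions concrete, the following is a minimal in-memory sketch and not the paper's XML-based format, which is not reproduced here. It assumes that pages are vertices and that each hyperedge groups a set of pages under a structure-type label. The class `Hypergraph`, its methods, and the page and label names are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Hypergraph:
    """Hypothetical model: pages are vertices, and each hyperedge groups
    a set of pages under a structure-type label."""
    vertices: set = field(default_factory=set)
    hyperedges: list = field(default_factory=list)  # (label, frozenset of pages)

    def add_hyperedge(self, label, pages):
        pages = frozenset(pages)
        self.vertices |= pages
        self.hyperedges.append((label, pages))

    def types_of(self, page):
        # All structure types that a given page takes part in.
        return {label for label, pages in self.hyperedges if page in pages}

    def polymorphic_units(self):
        # Polymorphism: a single unit manifests more than one structure type.
        return {v for v in self.vertices if len(self.types_of(v)) > 1}

    def realizations(self, label):
        # Realizational ambiguity: one structure type, several manifestations.
        return [pages for lbl, pages in self.hyperedges if lbl == label]


# Illustrative data only; not taken from the paper's corpus.
site = Hypergraph()
site.add_hyperedge("call_for_papers", ["index.html"])
site.add_hyperedge("programme", ["index.html", "schedule.html"])
site.add_hyperedge("call_for_papers", ["cfp.html", "dates.html"])

print(site.polymorphic_units())
print(site.realizations("call_for_papers"))
```

In this toy example, `index.html` is polymorphic because it takes part in two structure types, while `call_for_papers` is realized in two different ways: once by a single page and once by a pair of pages.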