Most text analysis is designed around the concept of a "document", namely a cohesive presentation of thought on a unifying subject. By contrast, individual nodes on the World Wide Web tend to have a much smaller granularity than text documents. We claim that the notions of "document" and "web node" are not synonymous, and that authors often deploy a single document as a collection of URLs, which we call a "compound document". In this paper we present new techniques for identifying and working with such compound documents, together with the results of large-scale studies of them on the web. The primary motivation for this work is that information retrieval techniques are better suited to documents than to individual hypertext nodes.