Dividing web pages into fragments has been shown to provide significant benefits for both content generation and caching. For a web site to use fragment-based content generation, however, it needs good methods for dividing its pages into fragments. Manual fragmentation of web pages is expensive, error-prone, and does not scale. This paper proposes a novel scheme to automatically detect and flag fragments that are cost-effective cache units in web sites serving dynamic content. We consider a fragment interesting if it is shared among multiple documents or if it has different lifetime or personalization characteristics. Our approach has three unique features. First, we propose a hierarchical, fragment-aware model of dynamic web pages and a data structure that is compact and effective for fragment detection. Second, we present an efficient algorithm to detect maximal fragments that are shared among multiple documents. Third, we develop a practical algorithm that effectively detects fragments based on their lifetime and personalization characteristics. We evaluate the proposed scheme through a series of experiments that show the benefits and costs of the algorithms. We also study how adopting the fragments detected by our system affects disk space utilization and network bandwidth consumption.
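To make the idea of maximal shared fragments concrete, the following is a minimal sketch and not the paper's algorithm or data structure. It assumes each page is already parsed into a small tree, gives every subtree a content hash, and reports the topmost subtrees whose hash occurs in at least `min_docs` pages. The `Node` class, the function names, and the `min_docs` threshold are all hypothetical and chosen only for illustration.

```python
import hashlib
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical page-tree node: a tag, its own text, and child nodes."""
    tag: str
    text: str = ""
    children: list = field(default_factory=list)


def signature(node, sigs):
    """Hash the subtree rooted at `node`; store each hash in `sigs` by id(node)."""
    child_sigs = [signature(child, sigs) for child in node.children]
    payload = repr((node.tag, node.text, child_sigs))
    digest = hashlib.sha1(payload.encode("utf-8")).hexdigest()
    sigs[id(node)] = digest
    return digest


def shared_maximal_fragments(pages, min_docs=2):
    """Map subtree hash -> (example node, set of page names sharing it).

    Simplifying assumption: a fragment counts as maximal when none of its
    ancestors is itself shared, so the search stops descending at the first
    shared subtree on each path from the root.
    """
    sigs = {}
    docs_with = defaultdict(set)
    for name, root in pages.items():
        signature(root, sigs)
        stack = [root]
        while stack:
            node = stack.pop()
            docs_with[sigs[id(node)]].add(name)
            stack.extend(node.children)

    shared = {s for s, names in docs_with.items() if len(names) >= min_docs}
    result = {}

    def walk(node):
        s = sigs[id(node)]
        if s in shared:
            result.setdefault(s, (node, docs_with[s]))
            return  # everything below is covered by this larger fragment
        for child in node.children:
            walk(child)

    for root in pages.values():
        walk(root)
    return result
```

For two pages that differ only in their story text but carry the same header and the same two-paragraph footer, this sketch reports the header and the whole footer as shared fragments, and does not report the footer's paragraphs separately. Detecting fragments by lifetime or personalization would need a different input, such as several versions of the same page over time, which this sketch does not cover.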