Understanding website complexity: measurements, metrics, and implications

  • Authors:
  • Michael Butkiewicz; Harsha V. Madhyastha; Vyas Sekar

  • Affiliations:
  • UC Riverside, Riverside, CA, USA; UC Riverside, Riverside, CA, USA; Intel Labs, Berkeley, CA, USA

  • Venue:
  • Proceedings of the 2011 ACM SIGCOMM conference on Internet measurement conference
  • Year:
  • 2011

Abstract

Over the years, the web has evolved from serving simple text content from a single server to a complex ecosystem that delivers different types of content from servers spread across several administrative domains. There is anecdotal evidence of users being frustrated with high page load times or when obscure scripts cause their browser windows to freeze. Because page load times are known to directly impact user satisfaction, providers would like to understand if and how the complexity of their websites affects the user experience. While there is an extensive literature on measuring web graphs, website popularity, and the nature of web traffic, there has been little work on understanding how complex individual websites are, and how this complexity impacts the clients' experience. This paper is a first step to address this gap. To this end, we identify a set of metrics to characterize the complexity of websites both at the content level (e.g., number and size of images) and the service level (e.g., number of servers/origins). We find that the distributions of these metrics are largely independent of a website's popularity rank. However, some categories (e.g., News) are more complex than others. More than 60% of websites have content from at least 5 non-origin sources, and these contribute more than 35% of the bytes downloaded. In addition, we analyze which metrics are most critical for predicting page render and load times and find that the number of objects requested is the most important factor. With respect to variability in load times, however, we find that the number of servers is the best indicator.
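
To make the abstract's content-level and service-level metrics concrete, here is a minimal sketch (not from the paper) that derives them from a page load captured as a HAR file. The file name, the origin_host parameter, and the suffix-based test for non-origin hosts are illustrative assumptions, not the authors' methodology.

```python
import json
from collections import Counter
from urllib.parse import urlparse

def complexity_metrics(har_path, origin_host):
    """Compute rough content-level and service-level complexity metrics
    from a HAR capture (e.g., exported from browser developer tools)."""
    with open(har_path) as f:
        entries = json.load(f)["log"]["entries"]

    total_bytes = 0
    non_origin_bytes = 0
    hosts = Counter()

    for e in entries:
        host = urlparse(e["request"]["url"]).hostname or ""
        size = max(e["response"].get("bodySize", 0), 0)  # HAR uses -1 for unknown sizes
        hosts[host] += 1
        total_bytes += size
        # Simplification: treat any host that does not share the page's
        # origin suffix as a non-origin source.
        if not host.endswith(origin_host):
            non_origin_bytes += size

    return {
        "num_objects": len(entries),        # content-level: objects requested
        "total_kb": total_bytes / 1024,     # content-level: bytes downloaded
        "num_servers": len(hosts),          # service-level: distinct servers
        "non_origin_byte_fraction": (
            non_origin_bytes / total_bytes if total_bytes else 0.0
        ),                                  # service-level: non-origin share
    }

if __name__ == "__main__":
    # "example.har" and "example.com" are placeholders for a real capture.
    print(complexity_metrics("example.har", "example.com"))
```

Metrics like num_objects and num_servers correspond to the factors the abstract identifies as most predictive of load time and of its variability, respectively.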