Skewed distributions appear very often in practice. Unfortunately, the traditional Zipf distribution often fails to model them well. In this paper, we propose a new probability distribution, the Discrete Gaussian Exponential (DGX), which achieves excellent fits in a wide variety of settings and includes the Zipf distribution as a special case. We present a statistically sound method for estimating the DGX parameters based on maximum likelihood estimation (MLE). We applied DGX to a variety of real-world data sets, such as sales data from a large retailer chain, usage data from AT&T, and Internet clickstream data. In all cases, DGX fits these distributions very well, with a correlation coefficient of almost 0.99 in quantile-quantile plots. Our algorithm also scales well because it requires only a single pass over the data. Finally, we illustrate the power of DGX as a new tool for data mining tasks such as outlier detection.
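The abstract says the DGX parameters are estimated by MLE in a single pass over the data. The Python sketch that follows shows one way this can work. It is an illustration under stated assumptions, not the paper's code.

It assumes the DGX probability mass function P(X = k) = A(mu, sigma) / k * exp(-(ln k - mu)^2 / (2 sigma^2)) for k = 1, 2, ..., where A(mu, sigma) is the normalizing constant. It also truncates the infinite support at a user-chosen kmax, which is an approximation.

Expanding the squared term shows that the log-likelihood depends on the data only through n, sum(ln k) and sum((ln k)^2). One pass is therefore enough to collect what the fit needs. The function names (summarize, log_norm_const, log_likelihood, fit_dgx) are hypothetical. The shrinking grid search is a stand-in chosen for self-containment, not the authors' optimizer.

```python
import math
from typing import Iterable, List, Tuple

# Hypothetical helpers: a sketch of DGX maximum-likelihood fitting,
# assuming P(k) = A(mu, sigma)/k * exp(-(ln k - mu)^2 / (2 sigma^2)).
# Not the implementation from the DGX paper.


def summarize(values: Iterable[int]) -> Tuple[int, float, float, int]:
    """Single pass over the data: n, sum(ln k), sum((ln k)^2), max k."""
    n, s1, s2, kmax_seen = 0, 0.0, 0.0, 0
    for k in values:
        if k < 1:
            raise ValueError("DGX is defined on k = 1, 2, ...")
        lk = math.log(k)
        n += 1
        s1 += lk
        s2 += lk * lk
        kmax_seen = max(kmax_seen, k)
    return n, s1, s2, kmax_seen


def log_norm_const(mu: float, sigma: float, logs: List[float]) -> float:
    """ln A(mu, sigma), with the support truncated at k = len(logs) (assumption)."""
    terms = [-lk - (lk - mu) ** 2 / (2 * sigma ** 2) for lk in logs]
    m = max(terms)
    # log-sum-exp keeps the sum stable; A is the reciprocal of the sum.
    return -(m + math.log(sum(math.exp(t - m) for t in terms)))


def log_likelihood(mu: float, sigma: float, n: int, s1: float, s2: float,
                   logs: List[float]) -> float:
    """DGX log-likelihood computed from the summary statistics alone."""
    # sum_i (ln k_i - mu)^2 = s2 - 2*mu*s1 + n*mu^2
    sq = s2 - 2 * mu * s1 + n * mu * mu
    return n * log_norm_const(mu, sigma, logs) - s1 - sq / (2 * sigma ** 2)


def fit_dgx(values: Iterable[int], kmax: int = 2000,
            rounds: int = 7, points: int = 9) -> Tuple[float, float]:
    """Approximate MLE of (mu, sigma) by a grid search that shrinks each round."""
    n, s1, s2, seen = summarize(values)  # the only pass over the data
    if n == 0 or seen > kmax:
        raise ValueError("need at least one value with 1 <= k <= kmax")
    logs = [math.log(k) for k in range(1, kmax + 1)]
    # Start from the mean and standard deviation of ln k.
    mu = s1 / n
    sigma = max(math.sqrt(max(s2 / n - mu * mu, 0.0)), 0.1)
    dm, ds = 2.0, 2.0
    for _ in range(rounds):
        best = None
        for i in range(points):
            m = mu - dm + 2 * dm * i / (points - 1)
            for j in range(points):
                s = max(sigma - ds + 2 * ds * j / (points - 1), 0.05)
                ll = log_likelihood(m, s, n, s1, s2, logs)
                if best is None or ll > best[0]:
                    best = (ll, m, s)
        _, mu, sigma = best
        dm, ds = dm / 2, ds / 2  # zoom in around the current best point
    return mu, sigma
```

For example, fit_dgx(counts, kmax=2000) returns an approximate (mu, sigma) pair for a list or generator of positive integer values. After the single summarizing pass, each likelihood evaluation costs O(kmax) however many records were read. This is what keeps the fit cheap on large data sets in this sketch.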