Information Retrieval: Computational and Theoretical Aspects
Modern Information Retrieval
Obtaining Language Models of Web Collections Using Query-Based Sampling Techniques
HICSS '02 Proceedings of the 35th Annual Hawaii International Conference on System Sciences (HICSS'02) - Volume 3
A Description of the LAMB Web-Derived Language Model Builder
Comparing the performance of collection selection algorithms
ACM Transactions on Information Systems (TOIS)
Finding authoritative people from the web
Proceedings of the 4th ACM/IEEE-CS joint conference on Digital libraries
A new statistical approach to DNS traffic anomaly detection
ADMA'10 Proceedings of the 6th international conference on Advanced data mining and applications - Volume Part II
We have created three testbeds of web data for use in controlled experiments in collection modeling. This short paper examines how well Zipf's and Heaps' laws apply to web data. Observed vocabulary growth agrees extremely closely with Heaps' law. Agreement with Zipf's law is reasonable for medium- to low-frequency terms, but Zipf's law is a poor predictor for high-frequency terms. These findings hold for all three testbeds, although we restrict ourselves to one here due to space limitations.
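For readers unfamiliar with the two laws: Heaps' law models vocabulary size as V(n) = K * n^beta after n tokens, and Zipf's law models the frequency of the term at rank r as roughly proportional to 1 / r^s. The following is a minimal sketch of how such a comparison might be set up, not the procedure used for the testbeds. The function names, the whitespace tokenization, and the toy text are all assumptions made for illustration, and the fit is a plain least-squares line in log-log space.

```python
import math
from collections import Counter


def vocabulary_growth(tokens):
    """Return [(n, V(n))]: number of distinct terms seen after the first n tokens."""
    seen = set()
    growth = []
    for n, tok in enumerate(tokens, start=1):
        seen.add(tok)
        growth.append((n, len(seen)))
    return growth


def rank_frequency(tokens):
    """Return [(rank, frequency)] with rank 1 assigned to the most frequent term."""
    freqs = sorted(Counter(tokens).values(), reverse=True)
    return list(enumerate(freqs, start=1))


def fit_power_law(xs, ys):
    """Least-squares fit of log y = log c + e * log x; returns (c, e)."""
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    mx = sum(lx) / len(lx)
    my = sum(ly) / len(ly)
    sxx = sum((a - mx) ** 2 for a in lx)
    sxy = sum((a - mx) * (b - my) for a, b in zip(lx, ly))
    e = sxy / sxx
    return math.exp(my - e * mx), e


# Toy input (hypothetical); a real experiment would use a full web testbed.
tokens = "the web is large and the web is growing and the web is diverse".split()

ns, vs = zip(*vocabulary_growth(tokens))
K, beta = fit_power_law(ns, vs)        # Heaps' law: V(n) ~ K * n**beta

ranks, freqs = zip(*rank_frequency(tokens))
C, slope = fit_power_law(ranks, freqs)  # Zipf's law: f(r) ~ C * r**slope, slope near -s
```

Fitting the Zipf curve separately over the high-rank and low-rank portions of `rank_frequency` output is one way to see the kind of divergence for high-frequency terms that the abstract reports.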