Foundations and Trends in Information Retrieval
In this work, we present a small-scale evaluation of two query-based sampling techniques for building language models, using a database composed of World Wide Web documents. We propose a metric for determining when to stop sampling a given web database, and we compare this new metric with metrics used in previous work to measure the fidelity of sampled language models.