Sample size and informetric model goodness-of-fit outcomes: a search engine log case study

Authors:
Isola Ajiferuke; Dietmar Wolfram; Felix Famoye
Affiliations:
Faculty of Information and Media Studies, University of Western Ontario, London, ON, Canada; School of Information Studies, University of Wisconsin-Milwaukee, Milwaukee, WI, USA; Department of Mathematics, Central Michigan University, Mount Pleasant, MI, USA
Venue:
Journal of Information Science
Year:
2006




Abstract

This study examines the influence of sample size on informetric characteristics to determine whether theoretical mathematical models can adequately fit large data sets. Two large data sets of queries submitted to the Excite search service were sampled for four search characteristics (term frequencies, terms used per query, pages viewed per query, and queries submitted per session), producing data sets of various sizes. These were fitted to theoretical models to determine how sample size may influence a model's goodness-of-fit. Although theoretical models could adequately fit smaller data sets of up to 5000 observations in some cases, larger data sets could not be satisfactorily fitted according to several goodness-of-fit techniques. Investigators must therefore take into account that sample size influences goodness-of-fit outcomes. The significant outcomes result from the nature of the data, not from limitations of the particular goodness-of-fit tests. Such tests should be used for comparative purposes rather than for significance testing.
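The sample-size effect described in the abstract can be illustrated with a minimal sketch. It is not the authors' procedure: the proportions below are invented for illustration, the Pearson chi-square statistic stands in for the "several goodness-of-fit techniques" the abstract mentions, and the sample is idealized so that it reproduces the assumed true proportions exactly. Under those assumptions, a fixed, small mismatch between data and model yields a statistic that grows in proportion to the number of observations, so the same model passes at small sizes and is rejected at large ones.

```python
# Illustrative sketch only: all proportions are hypothetical, not values from the study.

# Chi-square critical value for alpha = 0.05 with 4 degrees of freedom
# (5 categories minus 1; estimating model parameters would lower this further).
CRITICAL_5PCT_DF4 = 9.488

# Hypothetical share of queries containing 1, 2, 3, 4, and 5+ terms.
TRUE_PROPS = [0.31, 0.30, 0.20, 0.11, 0.08]   # assumed "real" behaviour
MODEL_PROPS = [0.30, 0.30, 0.20, 0.12, 0.08]  # assumed theoretical model


def chi_square_statistic(observed, expected):
    """Pearson chi-square statistic for two equal-length lists of counts."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def statistic_at_size(n, true_props=TRUE_PROPS, model_props=MODEL_PROPS):
    """Statistic for an idealized sample of size n that matches true_props exactly."""
    observed = [n * p for p in true_props]
    expected = [n * p for p in model_props]
    return chi_square_statistic(observed, expected)


for n in (500, 5000, 50000, 500000):
    stat = statistic_at_size(n)
    verdict = "rejected" if stat > CRITICAL_5PCT_DF4 else "not rejected"
    print(f"n={n:>6}  chi-square={stat:9.3f}  model {verdict} at the 5% level")
```

Because the relative mismatch is held constant, the ratio of statistics between two sample sizes equals the ratio of the sizes themselves. This is one way to see why such statistics are more informative when comparing candidate models on the same data than when used as pass/fail significance tests on very large samples.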