Introduction to Modern Information Retrieval
Mathematical study of h-index sequences
Information Processing and Management: an International Journal
The article studies how the query formulation of a topic influences its h-index. To generate purely random sets of documents, we measured this influence with N-grams of variable length N: strings of zeros, truncated at the end. The databases used are Web of Science (WoS) and Scopus. The formula $h = T^{1/\alpha}$, proved in Egghe and Rousseau (2006), where T is the number of retrieved documents and α is Lotka's exponent, is confirmed to be a concavely increasing function of T. We also give a formula for the relation between h and the length N of the N-gram: $h = D \cdot 10^{-N/\alpha}$, where D is a constant; this convexly decreasing function of N is also found in our experiments. Nonlinear regression on $h = T^{1/\alpha}$ gives an estimate of α, which can then be used to estimate the h-index of the entire database (WoS and Scopus): $h = S^{1/\alpha}$, where S is the total number of documents in the database. © 2008 Wiley Periodicals, Inc.
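The estimation step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' procedure. Instead of nonlinear regression it uses a simpler stand-in, a least-squares fit of log h = (1/α) log T through the origin. The (T, h) pairs are synthetic, generated from the model itself with an assumed α = 2.0, and the database size S is a hypothetical value. The function names `h_from_T` and `estimate_alpha` are also illustrative only.

```python
import math


def h_from_T(T, alpha):
    """Model from the abstract: h = T**(1/alpha)."""
    return T ** (1.0 / alpha)


def estimate_alpha(pairs):
    """Estimate alpha from (T, h) pairs.

    Fits log h = (1/alpha) * log T by least squares through the origin.
    This log-log fit is a simple stand-in for nonlinear regression.
    All T values must be greater than 1 and all h values positive.
    """
    sxy = sum(math.log(T) * math.log(h) for T, h in pairs)
    sxx = sum(math.log(T) ** 2 for T, _ in pairs)
    slope = sxy / sxx          # slope estimates 1/alpha
    return 1.0 / slope


# Synthetic data: generated from the model with an assumed alpha,
# NOT measurements from WoS or Scopus.
ASSUMED_ALPHA = 2.0
pairs = [(T, h_from_T(T, ASSUMED_ALPHA)) for T in (100, 1_000, 10_000, 100_000)]

alpha_hat = estimate_alpha(pairs)

# Hypothetical total number of documents in a database.
S = 1_000_000
h_db = h_from_T(S, alpha_hat)  # h = S**(1/alpha)

print(f"estimated alpha: {alpha_hat:.4f}")
print(f"extrapolated database h-index: {h_db:.1f}")
```

Because the synthetic points lie exactly on the model curve, the fit recovers the assumed α. With real retrieval counts the points would scatter, and the log-log fit and a nonlinear fit on the original scale would generally give somewhat different estimates.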