Short communication: Logical operators of some statistical computing packages and missing values

  • Authors:
  • José Antonio Cordeiro

  • Affiliations:
  • Department of Epidemiology and Public Health, School of Medicine de São José do Rio Preto, São Paulo, Brazil

  • Venue:
  • Computational Statistics & Data Analysis
  • Year:
  • 2007

Abstract

When we use statistical computing packages, we generally rely on the results they produce. We are aware that numerical approximations are made, and we trust that the best algorithms are chosen to carry them out. Most manuals give instructions about the precision of calculations, and some report how missing values are handled. What we may not be aware of is that some packages can invent results when creating atomic formulas and when compounding complex formulas out of atomic ones, which inflates sample sizes and possibly leads to incorrect statistical decisions. Two simple indicator variables, with missing values positioned so that the results are always missing values, were tested as numerical, as logical, and as character variables by compounding them through the connectives 'and' (&) and 'or' (|) to form new indicator variables. The results show that one of the three well-known packages tested does not handle missing values in a statistically correct way, and that all three make atomic formulas out of character variables by assigning the value false (0) to a missing value, which can be regarded as a statistical error. The conclusion is that statisticians and users of statistics must be aware of how the statistical packages they use operate logically on missing values; otherwise, wrong statistical decisions can be made. Programmers of statistical packages, in turn, should correct their algorithms so that their packages do not invent non-existent values.
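
The kind of test described in the abstract can be illustrated with a minimal Python sketch. It is not the author's code, and it does not reproduce the behaviour of any particular package. It assumes that "statistically correct" handling corresponds to three-valued (Kleene) logic, with `None` standing for a missing value. The data vectors and every function name below are hypothetical, chosen so that each compound result should itself be missing.

```python
from typing import List, Optional

Tri = Optional[bool]  # None stands for a missing value (an assumption of this sketch)


def and3(a: Tri, b: Tri) -> Tri:
    """Three-valued 'and': a known False decides; otherwise missing propagates."""
    if a is False or b is False:
        return False
    if a is None or b is None:
        return None
    return True


def or3(a: Tri, b: Tri) -> Tri:
    """Three-valued 'or': a known True decides; otherwise missing propagates."""
    if a is True or b is True:
        return True
    if a is None or b is None:
        return None
    return False


def and_missing_as_false(a: Tri, b: Tri) -> bool:
    """Hypothetical faulty rule: a missing value is silently treated as False."""
    return bool(a) and bool(b)


def or_missing_as_false(a: Tri, b: Tri) -> bool:
    """Hypothetical faulty rule: a missing value is silently treated as False."""
    return bool(a) or bool(b)


def n_valid(values: List[Tri]) -> int:
    """Number of non-missing observations, i.e. the effective sample size."""
    return sum(v is not None for v in values)


# Hypothetical indicator variables, positioned so every compound result is missing:
# for 'and', a missing value never meets a known False;
# for 'or', a missing value never meets a known True.
x_and: List[Tri] = [None, True, None]
y_and: List[Tri] = [True, None, None]
x_or: List[Tri] = [None, False, None]
y_or: List[Tri] = [False, None, None]

kleene_and = [and3(a, b) for a, b in zip(x_and, y_and)]
kleene_or = [or3(a, b) for a, b in zip(x_or, y_or)]
naive_and = [and_missing_as_false(a, b) for a, b in zip(x_and, y_and)]
naive_or = [or_missing_as_false(a, b) for a, b in zip(x_or, y_or)]

print("three-valued and:", kleene_and, "valid n =", n_valid(kleene_and))
print("three-valued or: ", kleene_or, "valid n =", n_valid(kleene_or))
print("missing-as-false and:", naive_and, "valid n =", n_valid(naive_and))
print("missing-as-false or: ", naive_or, "valid n =", n_valid(naive_or))
```

Under the three-valued rules, every compound indicator stays missing and contributes nothing to the sample size. Under the missing-as-false rule, each one becomes a definite false (0) and is counted as an observation, which is the inflation of sample sizes that the abstract warns about.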