Using the shape recovery method to evaluate indexing techniques

  • Authors: Guillermo Oyarce
  • Affiliations: University of North Texas, P.O. Box 276, Denton, TX 76202
  • Venue: Journal of the American Society for Information Science and Technology
  • Year: 2008

Abstract

Text representation, central to information processing, must be descriptive and discriminative. Although some of the many techniques for constructing document representations may outperform others on certain tasks, none is consistently better than the rest. Representations are still problematic. Evaluation techniques are needed to probe foundational questions about term behavior in representation. A study is reported here that applies the shape recovery analysis method as an evaluative tool for comparing different indexing schemes. Three weight coefficients are used to rank indexing terms, and the resulting representations are compared with the documents' full text. Two of the weight coefficients are novel, and the third relies on the chi-squared distribution. Multidimensional scaling reduces the dimensional space of the document surrogates to a two-dimensional Cartesian space. Ten concentric circles, starting at the centroid and evenly separated at 10% intervals of the relevant data points, are used to construct a precision-recall curve. ANOVA is used for a straightforward computation on the 4 × 11 matrix of test data to see whether the four treatments yield the same P-R result. A post hoc Tukey HSD multiple-comparisons test among pairwise treatments is also used to discover homogeneous groups. The findings show the value of the methodology for studying term weighting schemes and their descriptiveness and discriminative power, as well as the potential strength of the novel coefficients introduced. © 2008 Wiley Periodicals, Inc.
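The abstract does not give the formula of the chi-squared-based coefficient. As a rough illustration only, the Python sketch below assumes one common variant: a Pearson chi-squared statistic on a 2 × 2 table that contrasts a term's count in one document with its count in the rest of the collection, keeping only terms that are relatively more frequent in the document. The function names `chi2_2x2` and `rank_terms_chi2` and the toy tokens are hypothetical, and this is not presented as the author's coefficient.

```python
from collections import Counter


def chi2_2x2(a, b, c, d):
    """Pearson chi-squared statistic for the 2x2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    return 0.0 if denom == 0 else n * (a * d - b * c) ** 2 / denom


def rank_terms_chi2(doc_tokens, collection_tokens, top_k=10):
    """Rank one document's terms by a chi-squared contrast with the rest of the collection.

    Hypothetical setup: doc_tokens is assumed to be contained in collection_tokens.
    """
    doc = Counter(doc_tokens)
    coll = Counter(collection_tokens)
    n_doc = len(doc_tokens)
    n_rest = len(collection_tokens) - n_doc
    scores = {}
    for term, a in doc.items():
        b = coll[term] - a  # occurrences of the term outside this document
        # keep only terms relatively more frequent in the document than elsewhere
        if a * n_rest > b * n_doc:
            # rows: this term / all other tokens; columns: this document / rest
            scores[term] = chi2_2x2(a, b, n_doc - a, n_rest - b)
    # highest score first; ties broken alphabetically for a stable ranking
    return sorted(scores, key=lambda t: (-scores[t], t))[:top_k]


# Toy tokens, for illustration only.
doc = "shape recovery shape index the".split()
rest = "the the index term weight the".split()
print(rank_terms_chi2(doc, doc + rest))  # ['shape', 'recovery', 'index']
```

Because the chi-squared statistic is symmetric, a term that is rarer in the document than elsewhere would also score high; the over-representation check is one simple way to keep only descriptive terms.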
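The concentric-circle construction of the precision-recall curve can be sketched in Python under several assumptions not stated in the abstract: the two-dimensional MDS coordinates are already computed, the circles are centered on the centroid of the relevant documents, the k-th circle's radius is the smallest one enclosing k/10 of the relevant points, and the 0% recall level (where no circle exists) takes the best observed precision, following the usual 11-point convention. `circle_pr_curve` and the toy coordinates are hypothetical.

```python
import math


def circle_pr_curve(points, relevant, n_circles=10):
    """Precision at nominal recall levels 0.0, 0.1, ..., 1.0 from concentric circles.

    points: dict doc_id -> (x, y), assumed to be the 2-D output of an MDS step.
    relevant: set of doc_ids judged relevant (assumed non-empty).
    """
    rel = [points[d] for d in relevant]
    cx = sum(x for x, _ in rel) / len(rel)
    cy = sum(y for _, y in rel) / len(rel)
    # distance of every surrogate from the centroid of the relevant points
    dist = {d: math.hypot(x - cx, y - cy) for d, (x, y) in points.items()}
    rel_dists = sorted(dist[d] for d in relevant)
    curve = []
    for k in range(1, n_circles + 1):
        # smallest radius enclosing ceil(k / n_circles * |relevant|) relevant points
        idx = (k * len(rel_dists) + n_circles - 1) // n_circles - 1
        radius = rel_dists[idx]
        inside = [d for d, r in dist.items() if r <= radius]
        hits = sum(1 for d in inside if d in relevant)
        curve.append((k / n_circles, hits / len(inside)))
    # no circle exists at recall 0; use the best observed precision (11-point convention)
    curve.insert(0, (0.0, max(p for _, p in curve)))
    return curve


# Toy coordinates, for illustration only.
pts = {"r1": (1, 0), "r2": (-1, 0), "r3": (0, 2), "r4": (0, -2),
       "n1": (0.5, 0), "n2": (2.5, 0), "n3": (5, 0)}
for level, precision in circle_pr_curve(pts, {"r1", "r2", "r3", "r4"}):
    print(f"recall {level:.1f}: precision {precision:.3f}")
```

Running it once per treatment (three weighting schemes plus full text) gives four rows of 11 precision values, which matches the shape of the 4 × 11 matrix the abstract mentions. With few relevant documents, several circles share a radius, so the recall levels are nominal.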
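The statistical comparison can be sketched with the Python standard library alone. This assumes a one-way ANOVA in which each treatment's 11 precision values are that group's observations (the abstract does not say whether a one-way or two-way design was used) and a Tukey HSD check with equal group sizes. Computing a p-value or the studentized-range critical value needs distribution functions the standard library lacks, so the hypothetical parameter `q_crit` is supplied by the caller from a table; the function names and toy numbers are also hypothetical.

```python
from itertools import combinations
from statistics import fmean


def one_way_anova(groups):
    """Return (F, df_between, df_within, ms_within) for a one-way ANOVA."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand = sum(sum(g) for g in groups) / n_total
    means = [fmean(g) for g in groups]
    # between-treatment and within-treatment sums of squares
    ss_between = sum(len(g) * (m - grand) ** 2 for g, m in zip(groups, means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    df_b, df_w = k - 1, n_total - k
    ms_w = ss_within / df_w
    return (ss_between / df_b) / ms_w, df_b, df_w, ms_w


def tukey_hsd_pairs(groups, names, ms_within, q_crit):
    """Map each treatment pair to True if its mean difference exceeds Tukey's HSD.

    Assumes equal group sizes; q_crit is looked up in a studentized-range table
    for (number of groups, df_within) at the chosen alpha.
    """
    n = len(groups[0])
    hsd = q_crit * (ms_within / n) ** 0.5
    means = [fmean(g) for g in groups]
    return {(names[i], names[j]): abs(means[i] - means[j]) > hsd
            for i, j in combinations(range(len(groups)), 2)}


# Toy numbers, not data from the study.
groups = [[1, 2, 3], [2, 3, 4], [6, 7, 8]]
f_stat, df_b, df_w, ms_w = one_way_anova(groups)
print(f_stat, df_b, df_w)  # 21.0 2 6
```

Treatments whose pairwise differences all stay below the HSD threshold are candidates for the same homogeneous group.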