The construction of an empirically based mathematically derived classification system

  • Authors: Harold Borko
  • Affiliations: System Development Corporation, Santa Monica, California
  • Venue: AIEE-IRE '62 (Spring) Proceedings of the May 1-3, 1962, spring joint computer conference
  • Year: 1962

Abstract

This study describes a method for developing an empirically based, computer-derived classification system. A set of 618 psychological abstracts was coded in machine language for computer processing. The total text consisted of approximately 50,000 words, of which nearly 6,800 were unique. The computer program arranged these words in order of frequency of occurrence. From the list of words that occurred 20 or more times, excluding syntactical terms such as "and", "but", and "of", the investigator selected 90 words for use as index terms. These were arranged in a data matrix with the terms on the horizontal axis and the document numbers on the vertical axis. Each cell contained the number of times the term was used in the document. From these data, a 90 x 90 correlation matrix was computed, showing the relationship of each term to every other term. The matrix was factor analyzed, and the first 10 eigenvectors were selected as factors. These were rotated for meaning and interpreted as major categories in a classification system. The factors were compared with, and shown to be compatible with but not identical to, the classification system used by the American Psychological Association. The results demonstrate the feasibility of an empirically derived classification system and establish the value of factor analysis as a technique in language data processing.
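
The pipeline outlined in the abstract (frequency-based term selection, a document-by-term count matrix, a term-by-term correlation matrix, and extraction of the leading eigenvectors) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the toy corpus `DOCS`, the stopword list, the frequency threshold of 2 (the study used 20), and all function names. The eigenvectors are found by power iteration with deflation, chosen only to keep the sketch dependency-free; the abstract does not say which numerical procedure was used. The rotation step and the investigator's manual choice of 90 terms are not shown.

```python
import math
import random
import re
from collections import Counter

# Hypothetical toy corpus standing in for the 618 abstracts.
DOCS = [
    "learning and memory in rats learning",
    "memory and learning in children",
    "anxiety and stress in patients",
    "stress anxiety and therapy of patients",
]
STOPWORDS = {"and", "but", "of", "in", "the"}  # assumed syntactical terms


def tokenize(text):
    return re.findall(r"[a-z]+", text.lower())


def select_terms(docs, min_count):
    """Non-stopword terms occurring at least min_count times, most frequent first."""
    counts = Counter(w for d in docs for w in tokenize(d) if w not in STOPWORDS)
    return [w for w, c in counts.most_common() if c >= min_count]


def term_document_matrix(docs, terms):
    """One row per document, one column per term; cells hold raw counts."""
    rows = []
    for d in docs:
        counts = Counter(tokenize(d))
        rows.append([counts[t] for t in terms])
    return rows


def correlation_matrix(rows):
    """Pearson correlation between every pair of term columns."""
    n_docs, n_terms = len(rows), len(rows[0])
    z = []
    for j in range(n_terms):
        col = [row[j] for row in rows]
        mean = sum(col) / n_docs
        dev = [x - mean for x in col]
        norm = math.sqrt(sum(d * d for d in dev))
        if norm == 0:
            raise ValueError("term has a constant count across documents")
        z.append([d / norm for d in dev])
    return [[sum(a * b for a, b in zip(zi, zj)) for zj in z] for zi in z]


def mat_vec(m, v):
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def top_factors(corr, k, iters=200):
    """Leading k (eigenvalue, eigenvector) pairs via power iteration with deflation."""
    n = len(corr)
    work = [row[:] for row in corr]  # copy, so the input is left untouched
    rng = random.Random(0)           # fixed seed for a repeatable start vector
    factors = []
    for _ in range(k):
        v = [rng.uniform(-1, 1) for _ in range(n)]
        norm = math.sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]
        for _ in range(iters):
            w = mat_vec(work, v)
            norm = math.sqrt(sum(x * x for x in w))
            if norm < 1e-12:         # nothing left to extract
                break
            v = [x / norm for x in w]
        eigval = sum(a * b for a, b in zip(v, mat_vec(work, v)))
        factors.append((eigval, v))
        # Deflate: remove the component just found.
        for i in range(n):
            for j in range(n):
                work[i][j] -= eigval * v[i] * v[j]
    return factors


TERMS = select_terms(DOCS, min_count=2)
MATRIX = term_document_matrix(DOCS, TERMS)
CORR = correlation_matrix(MATRIX)
FACTORS = top_factors(CORR, k=2)

for eigval, vec in FACTORS:
    loadings = ", ".join(f"{t}={x:+.2f}" for t, x in zip(TERMS, vec))
    print(f"eigenvalue {eigval:.2f}: {loadings}")
```

In this toy corpus the first factor separates the learning and memory terms from the anxiety, stress, and patients terms, which is the kind of grouping the study interprets as a category. The sign of an eigenvector is arbitrary, so only the relative signs of the loadings are meaningful.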