Is 1 noun worth 2 adjectives?: measuring relative feature utility

  • Authors: Robert M. Losee
  • Affiliations: University of North Carolina--Chapel Hill, Chapel Hill, NC
  • Venue: Information Processing and Management: an International Journal
  • Year: 2006

Abstract

Are two adjectives worth the same as a single noun when documents are ordered by decreasing topicality? We propose an easy-to-interpret, single-number Relative Feature Utility (RFU) measure of the relative worth of using specific linguistic or non-linguistic features, or sets of features, in computational systems that order or filter media, such as information retrieval and classification systems. This measure allows one to make easily interpreted claims about the relative utility of features such as parts of speech, term suffixes, phrases vs. single terms, annotations, hyperlinks, citations, index terms, and metadata when ordering natural language text or other media. RFU data are provided for stemming characteristics, part-of-speech tags, and phrase lengths, as well as for retrieval characteristics and procedures. Using this linear measure of the relative utility of features makes available a wide range of cost-benefit analyses and decision-theoretic techniques, supporting both the study of whether to use many different kinds of representational information or tagging systems and the design of indexing and metadata systems. Some characteristics of natural languages used in the spectrum from softer to harder sciences, as well as medical terminology, are studied.