Is 1 noun worth 2 adjectives?: measuring relative feature utility

Authors:
Robert M. Losee
Affiliations:
University of North Carolina--Chapel Hill, Chapel Hill, NC
Venue:
Information Processing and Management: an International Journal
Year:
2006



Abstract

Are two adjectives worth the same as a single noun when documents are ordered by decreasing topicality? We propose an easy-to-interpret, single-number measure, Relative Feature Utility (RFU), of the relative worth of using specific linguistic or non-linguistic features, or sets of features, in computational systems that order or filter media, such as information retrieval and classification systems. This measure supports easily interpreted claims about the relative utility of features such as parts of speech, term suffixes, phrases vs. single terms, annotations, hyperlinks, citations, index terms, and metadata when ordering natural language text or other media. RFU data are provided for stemming characteristics, part-of-speech tags, and phrase lengths, as well as for retrieval characteristics and procedures. Because the measure is linear, it makes available a wide range of cost-benefit analyses and decision-theoretic techniques, supporting both the study of whether to use many different kinds of representational information or tagging systems and the design of indexing and metadata systems. Some characteristics of the natural languages used across the spectrum from softer to harder sciences, as well as of medical terminology, are studied.