A Linear Least Squares Fit mapping method for information retrieval from natural language texts

Authors:
Yiming Yang; Christopher G. Chute
Affiliations:
Mayo Clinic/Foundation, Rochester, Minnesota; Mayo Clinic/Foundation, Rochester, Minnesota
Venue:
COLING '92 Proceedings of the 14th conference on Computational linguistics - Volume 2
Year:
1992


Abstract

This paper describes a unique method for mapping natural language texts to canonical terms that identify the contents of the texts. The method learns empirical associations between free-form texts and canonical terms from human-assigned matches and determines a Linear Least Squares Fit (LLSF) mapping function, which represents weighted connections between words in the texts and the canonical terms. The mapping function projects an arbitrary text into the canonical term space, where the "transformed" text is compared with the terms to obtain similarity scores that quantify the relevance between the text and each term. This approach has superior power to discover synonyms or related terms and to preserve the context sensitivity of the mapping. On a testing set of 6,913 texts, we achieved 84% in both recall and precision, outperforming other techniques including string matching (15%), morphological parsing (17%) and statistical weighting (21%).
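
The pipeline outlined in the abstract (learn a least squares mapping from human-assigned text/term pairs, project a new text into the term space, rank terms by similarity) can be illustrated with a small sketch. This is not the authors' implementation. The toy training pairs, the function names, the raw word-count weighting, and the use of cosine similarity are all assumptions made for illustration; the abstract does not specify them. The sketch fits a matrix W that minimizes the Frobenius norm of W A - B, where the columns of A are word vectors of the training texts and the columns of B are word vectors of their matched canonical terms.

```python
import numpy as np

def build_vocab(docs):
    """Map each distinct lowercase token to a row index (sorted for determinism)."""
    tokens = sorted({tok for doc in docs for tok in doc.lower().split()})
    return {tok: i for i, tok in enumerate(tokens)}

def to_matrix(docs, vocab):
    """Word-count matrix: one row per vocabulary entry, one column per document.
    Tokens missing from the vocabulary are ignored."""
    m = np.zeros((len(vocab), len(docs)))
    for j, doc in enumerate(docs):
        for tok in doc.lower().split():
            if tok in vocab:
                m[vocab[tok], j] += 1.0
    return m

def fit_llsf(A, B):
    """Return W minimizing ||W A - B||_F.
    np.linalg.lstsq solves A.T @ X = B.T in the least squares sense, so W = X.T."""
    X, *_ = np.linalg.lstsq(A.T, B.T, rcond=None)
    return X.T

def rank_terms(text, W, text_vocab, term_vocab, terms):
    """Project a text into the term space and rank the canonical terms.
    Cosine similarity is an assumed choice of similarity score."""
    x = to_matrix([text], text_vocab)      # words x 1
    y = (W @ x).ravel()                    # projected text in term-word space
    C = to_matrix(terms, term_vocab)       # term-words x number of terms
    denom = np.linalg.norm(y) * np.linalg.norm(C, axis=0)
    scores = np.divide(C.T @ y, denom, out=np.zeros(len(terms)), where=denom > 0)
    order = np.argsort(-scores)
    return [(terms[i], float(scores[i])) for i in order]

# Hypothetical human-assigned matches (free text -> canonical term).
train_texts = ["heart attack", "high blood pressure", "kidney stone"]
train_terms = ["myocardial infarction", "hypertension", "nephrolithiasis"]

text_vocab = build_vocab(train_texts)
term_vocab = build_vocab(train_terms)
A = to_matrix(train_texts, text_vocab)     # source word space
B = to_matrix(train_terms, term_vocab)     # canonical term word space
W = fit_llsf(A, B)

# The query shares no words with any canonical term, yet it can still be ranked.
ranked = rank_terms("heart", W, text_vocab, term_vocab, train_terms)
for term, score in ranked:
    print(f"{score:.3f}  {term}")
```

In this toy setup the query "heart" has no surface overlap with "myocardial infarction", so string matching would find nothing; the learned mapping still ranks that term first because the association was present in the training pairs. A realistic system would need many more training pairs and, most likely, some form of term weighting and dimensionality reduction, none of which is shown here.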