This paper describes a new method for mapping natural language texts to canonical terms that identify their contents. The method learns empirical associations between free-form texts and canonical terms from human-assigned matches. From these it determines a Linear Least Squares Fit (LLSF) mapping function, which represents weighted connections between words in the texts and the canonical terms. The mapping function projects an arbitrary text into the canonical term space. There the "transformed" text is compared with the terms, yielding similarity scores that quantify the relevance between the text and each term. This approach is particularly effective at discovering synonyms and related terms while preserving the context sensitivity of the mapping. On a test set of 6,913 texts we achieved 84% in both recall and precision. This outperforms other techniques, including string matching (15%), morphological parsing (17%), and statistical weighting (21%).
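The LLSF idea can be illustrated with a minimal NumPy sketch. This is not the paper's implementation, and it rests on several assumptions:

- Training texts are plain bag-of-words column vectors in a matrix `A` (words × texts).
- The human-assigned canonical terms form an indicator matrix `B` (terms × texts), so each canonical term is a one-hot dimension of the term space.
- The mapping `F` minimizes the Frobenius norm of `F A - B`. `np.linalg.lstsq` returns the minimum-norm least-squares solution via an SVD-based solver.

The function names `fit_llsf` and `score_terms` and the toy vocabulary are hypothetical. The paper's actual term weighting and term representation are not reproduced.

```python
import numpy as np


def fit_llsf(A: np.ndarray, B: np.ndarray) -> np.ndarray:
    """Return F minimising ||F @ A - B||_F (minimum-norm solution).

    A: (n_words, n_texts) word-by-text matrix of training texts.
    B: (n_terms, n_texts) term-by-text matrix of human-assigned terms.
    Assumption: plain bag-of-words counts, one-hot canonical terms.
    """
    # F @ A ~= B is equivalent to A.T @ F.T ~= B.T, a standard least-squares
    # problem with one right-hand side per canonical term.
    Ft, *_ = np.linalg.lstsq(A.T, B.T, rcond=None)
    return Ft.T  # shape (n_terms, n_words)


def score_terms(F: np.ndarray, x: np.ndarray) -> np.ndarray:
    """Project a text vector x into term space and score it against each term.

    With one-hot term vectors, the cosine between y = F @ x and term j
    reduces to y[j] / ||y||.
    """
    y = F @ x
    norm = np.linalg.norm(y)
    return y / norm if norm > 0 else y


# Toy usage with hypothetical data.
# Vocabulary rows: heart, failure, cardiac, kidney.
# Training texts (columns): "heart failure", "cardiac failure", "kidney failure".
A = np.array([[1, 0, 0],
              [1, 1, 1],
              [0, 1, 0],
              [0, 0, 1]], dtype=float)
# Canonical terms (rows): "heart failure", "renal failure".
B = np.array([[1, 1, 0],
              [0, 0, 1]], dtype=float)

F = fit_llsf(A, B)
cardiac_only = np.array([0, 0, 1, 0], dtype=float)  # the single word "cardiac"
print(score_terms(F, cardiac_only))
```

The word "cardiac" never appears in a canonical term, but it co-occurs only with a text assigned "heart failure". As a result, the word alone scores positively for that term and negatively for "renal failure". This is a small instance of the synonym discovery the abstract describes. The minimum-norm solution is one reasonable choice for an underdetermined system like this toy one. The paper's own handling of rank deficiency may differ.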