An example-based mapping method for text categorization and retrieval

Authors:
Yiming Yang;Christopher G. Chute
Affiliations:
Mayo Clinic;Mayo Clinic
Venue:
ACM Transactions on Information Systems (TOIS)
Year:
1994

Citing 6
Cited 113

Abstract

A unified model for text categorization and text retrieval is introduced. We use a training set of manually categorized documents to learn word-category associations, and use these associations to predict the categories of arbitrary documents. Similarly, we use a training set of queries and their related documents to obtain empirical associations between query words and indexing terms of documents, and use these associations to predict the related documents of arbitrary queries. A Linear Least Squares Fit (LLSF) technique is employed to estimate the likelihood of these associations. Document collections from the MEDLINE database and Mayo patient records are used for studies on the effectiveness of our approach, and on how much the effectiveness depends on the choices of training data, indexing language, word-weighting scheme, and morphological canonicalization. Alternative methods are also tested on these data collections for comparison. It is evident that the LLSF approach uses the relevance information effectively within human decisions of categorization and retrieval, and achieves a semantic mapping of free texts to their representations in an indexing language. Such a semantic mapping leads to a significant improvement in categorization and retrieval, compared to alternative approaches.
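
As a rough illustration of the word-to-category mapping described in the abstract, the following sketch fits a linear least squares map on a toy training set. It assumes the usual least squares formulation: given a word-by-document matrix A and a category-by-document matrix B, find F minimizing the Frobenius norm of FA - B. The abstract names the technique but does not give this formulation or a solution procedure, so the formulation, the use of numpy.linalg.lstsq, the function names fit_llsf and score_categories, and the four-word vocabulary with its three training documents are all assumptions made for illustration, not the authors' implementation or data.

```python
import numpy as np


def fit_llsf(A, B):
    """Return F (categories x words) minimizing ||F @ A - B|| in the Frobenius norm.

    A: words x training documents, B: categories x training documents.
    Hypothetical helper; solves the transposed system A.T @ X = B.T with lstsq.
    """
    X, *_ = np.linalg.lstsq(A.T, B.T, rcond=None)
    return X.T


def score_categories(F, doc_vector):
    """Map a word vector to category scores; higher means more strongly associated."""
    return F @ doc_vector


# Toy data (invented for this sketch).
vocabulary = ["heart", "attack", "lung", "cancer"]
categories = ["cardiology", "oncology"]

# Columns are training documents: "heart attack", "lung cancer", "heart".
A = np.array([
    [1.0, 0.0, 1.0],   # heart
    [1.0, 0.0, 0.0],   # attack
    [0.0, 1.0, 0.0],   # lung
    [0.0, 1.0, 0.0],   # cancer
])
# Manually assigned categories for the same three documents.
B = np.array([
    [1.0, 0.0, 1.0],   # cardiology
    [0.0, 1.0, 0.0],   # oncology
])

F = fit_llsf(A, B)

# Score an unseen one-word document containing only "lung".
new_doc = np.array([0.0, 0.0, 1.0, 0.0])
scores = score_categories(F, new_doc)
predicted = categories[int(np.argmax(scores))]
print(predicted)
```

The same fit applies to the retrieval setting in the abstract if the columns of A hold query word vectors and the columns of B hold the indexing terms of the documents judged related to each query.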