Better Rules, Fewer Features: A Semantic Approach to Selecting Features from Text

Authors:
Catherine Blake; Wanda Pratt
Venue:
ICDM '01 Proceedings of the 2001 IEEE International Conference on Data Mining
Year:
2001

Citing: 0
Cited by: 7

Text mining: generating hypotheses from MEDLINE
Journal of the American Society for Information Science and Technology

Term identification in the biomedical literature
Journal of Biomedical Informatics - Special issue: Named entity recognition in biomedicine

Mining semantically related terms from biomedical literature
ACM Transactions on Asian Language Information Processing (TALIP)

Overview and semantic issues of text mining
ACM SIGMOD Record

A new algorithm for term weighting in text summarization process
AIC'06 Proceedings of the 6th WSEAS International Conference on Applied Informatics and Communications

Using text classification and multiple concepts to answer e-mails
Expert Systems with Applications: An International Journal

A Probabilistic SVM Approach to Annotation of Calcification Mammograms
International Journal of Digital Library Systems

Abstract

The choice of features used to represent a domain has a profound effect on the quality of the model produced; yet few researchers have investigated the relationship between the features used to represent text and the quality of the final model. We explored this relationship for medical texts by comparing association rules based on features at three different semantic levels: (1) words, (2) manually assigned keywords, and (3) automatically selected medical concepts. Our preliminary findings indicate that bi-directional association rules based on concepts or keywords are more plausible and more useful than those based on word features. The concept and keyword representations also required 90% fewer features than the word representation. This drastic dimensionality reduction suggests that the approach is well suited to large corpora of medical text, such as parts of the Web.
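
The comparison described in the abstract can be illustrated with a minimal sketch. Everything here is an assumption for illustration, not the authors' method: the toy documents, the function name `bidirectional_rules`, the thresholds, and in particular the reading of "bi-directional" as a feature pair (a, b) for which both a -> b and b -> a meet a minimum confidence. The abstract does not specify how the rules were mined or how concepts were selected.

```python
from itertools import combinations


def bidirectional_rules(docs, min_support=0.5, min_confidence=0.8):
    """Return (a, b, support, conf_ab, conf_ba) for feature pairs where
    both a -> b and b -> a reach min_confidence (assumed definition)."""
    feature_sets = [set(d) for d in docs]
    n = len(feature_sets)
    features = sorted(set().union(*feature_sets))
    # Number of documents containing each single feature.
    count = {f: sum(f in d for d in feature_sets) for f in features}
    rules = []
    for a, b in combinations(features, 2):
        both = sum(a in d and b in d for d in feature_sets)
        if both == 0:
            continue
        support = both / n
        conf_ab = both / count[a]  # confidence of a -> b
        conf_ba = both / count[b]  # confidence of b -> a
        if (support >= min_support
                and conf_ab >= min_confidence
                and conf_ba >= min_confidence):
            rules.append((a, b, support, conf_ab, conf_ba))
    return rules


# Hypothetical toy corpus: the same four texts as raw words ...
word_docs = [
    ["aspirin", "reduces", "risk", "of", "heart", "attack"],
    ["aspirin", "lowers", "risk", "of", "myocardial", "infarction"],
    ["statins", "reduce", "cholesterol"],
    ["statin", "therapy", "lowers", "cholesterol"],
]
# ... and as hand-picked concept labels (stand-ins for medical concepts).
concept_docs = [
    ["Aspirin", "Myocardial Infarction"],
    ["Aspirin", "Myocardial Infarction"],
    ["Statins", "Cholesterol"],
    ["Statins", "Cholesterol"],
]

word_rules = bidirectional_rules(word_docs)
concept_rules = bidirectional_rules(concept_docs)
word_features = set().union(*map(set, word_docs))
concept_features = set().union(*map(set, concept_docs))

print(len(word_features), "word features;", len(concept_features), "concept features")
for a, b, support, conf_ab, conf_ba in concept_rules:
    print(f"{a} <-> {b} (support={support:.2f})")
```

On this toy data the word representation links "aspirin" with the function word "of", while the concept representation pairs each drug with a condition or substance. The feature counts are an artifact of the invented corpus and say nothing about the 90% reduction reported in the abstract.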