Random-walk term weighting for improved text classification

Authors:
Samer Hassan;Carmen Banea
Affiliations:
University of North Texas, Denton, TX;University of North Texas, Denton, TX
Venue:
TextGraphs-1 Proceedings of the First Workshop on Graph Based Methods for Natural Language Processing
Year:
2006

Abstract

This paper describes a new approach to estimating term weights for text classification. The approach uses term co-occurrence as a measure of dependency between word features. A random-walk model is applied to a graph that encodes words and their co-occurrence dependencies, producing scores that quantify how much a particular word feature contributes to a given context. We argue that modeling feature weights with these scores, rather than with traditional frequency-based scores, yields better results in text classification. Experiments on four standard classification datasets show that the new random-walk-based approach outperforms the traditional term-frequency approach to feature weighting.
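
To illustrate the idea in the abstract, here is a minimal sketch of scoring words by a random walk over a co-occurrence graph. It is not the authors' implementation. The function names, the undirected unweighted graph, the window size, the damping factor of 0.85, and the fixed iteration count are all assumptions chosen for illustration (a PageRank-style update is used as the random-walk model).

```python
from collections import defaultdict


def cooccurrence_graph(tokens, window=2):
    """Link each word to the distinct words among the next `window` tokens.

    Assumption: edges are undirected and unweighted.
    """
    neighbors = defaultdict(set)
    for i, word in enumerate(tokens):
        for other in tokens[i + 1 : i + 1 + window]:
            if other != word:
                neighbors[word].add(other)
                neighbors[other].add(word)
    return neighbors


def random_walk_scores(neighbors, damping=0.85, iterations=50):
    """PageRank-style scores over the co-occurrence graph.

    Every node in `neighbors` has at least one edge, so the scores
    keep summing to 1 across iterations.
    """
    nodes = list(neighbors)
    if not nodes:
        return {}
    scores = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iterations):
        updated = {}
        for n in nodes:
            # Each neighbor passes on an equal share of its current score.
            incoming = sum(scores[m] / len(neighbors[m]) for m in neighbors[n])
            updated[n] = (1 - damping) / len(nodes) + damping * incoming
        scores = updated
    return scores


tokens = "the cat sat on the mat the cat ran".split()
scores = random_walk_scores(cooccurrence_graph(tokens))
for word, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{word}\t{score:.3f}")
```

In a classifier, scores like these could stand in for raw term frequency when building a document's feature vector; how the scores are combined with other weighting factors is left open here.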