Grammatical dependency-based relations for term weighting in text classification

Authors:
Dat Huynh;Dat Tran;Wanli Ma;Dharmendra Sharma
Affiliations:
Faculty of Information Sciences and Engineering, University of Canberra, Australia;Faculty of Information Sciences and Engineering, University of Canberra, Australia;Faculty of Information Sciences and Engineering, University of Canberra, Australia;Faculty of Information Sciences and Engineering, University of Canberra, Australia
Venue:
PAKDD'11 Proceedings of the 15th Pacific-Asia conference on Advances in knowledge discovery and data mining - Volume Part I
Year:
2011

Abstract

Term weights in a document are currently estimated from term frequency and term co-occurrence. However, these methods do not use grammatical dependency relations among terms to measure dependency between word features. In this paper, we propose a new approach that uses grammatical relations to estimate term weights in a text document, and we show how to apply the resulting term weighting scheme to text classification. A graph model encodes the extracted relations, and a graph centrality algorithm is then applied to compute scores that represent the significance of each term in the document context. Experiments on many corpora with an SVM classifier show that the proposed term weighting approach outperforms approaches based on term frequency and term co-occurrence.
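
The pipeline outlined in the abstract (relations, then a graph, then centrality scores used as term weights) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration: the abstract does not name the centrality algorithm, so PageRank is swapped in as one common choice; the graph is treated as undirected and unweighted; and the `relations` list is hand-written rather than produced by a dependency parser. The names `build_term_graph` and `centrality_weights` are hypothetical.

```python
from collections import defaultdict


def build_term_graph(relations):
    """Build an undirected term graph from (head, dependent) pairs.

    Assumption: each grammatical relation becomes one unweighted edge.
    """
    graph = defaultdict(set)
    for head, dependent in relations:
        if head != dependent:  # ignore self-loops
            graph[head].add(dependent)
            graph[dependent].add(head)
    return graph


def centrality_weights(graph, damping=0.85, iterations=50):
    """Score terms with PageRank computed by power iteration.

    Assumption: PageRank stands in for the unspecified centrality
    algorithm. Every node has at least one neighbour by construction,
    so no dangling-node handling is needed.
    """
    nodes = sorted(graph)
    n = len(nodes)
    scores = {term: 1.0 / n for term in nodes}
    for _ in range(iterations):
        updated = {}
        for term in nodes:
            # Each neighbour shares its score equally among its edges.
            incoming = sum(scores[nb] / len(graph[nb]) for nb in graph[term])
            updated[term] = (1.0 - damping) / n + damping * incoming
        scores = updated
    return scores


# Hand-written (hypothetical) relations for the sentence
# "The classifier assigns weights to terms".
relations = [
    ("assigns", "classifier"),
    ("classifier", "the"),
    ("assigns", "weights"),
    ("assigns", "terms"),
]

weights = centrality_weights(build_term_graph(relations))
for term, score in sorted(weights.items(), key=lambda kv: -kv[1]):
    print(f"{term}\t{score:.3f}")
```

The resulting scores could replace raw term frequencies in a document vector before it is passed to a classifier such as an SVM. How the scores are combined with other factors (for example inverse document frequency) is not stated in the abstract and is left open here.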