Semantic scoring based on small-world phenomenon for feature selection in text mining

Authors:
Chong Huang; Yonghong Tian; Tiejun Huang; Wen Gao
Affiliations:
Graduate School, Chinese Academy of Sciences, Beijing, China; Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China; Graduate School, Chinese Academy of Sciences, Beijing, China; Graduate School, Chinese Academy of Sciences, Beijing, China
Venue:
ADMA'06: Proceedings of the Second International Conference on Advanced Data Mining and Applications
Year:
2006

Abstract

This paper proposes an effective scoring scheme for feature selection in text mining that exploits the characteristics of the small-world phenomenon in the semantic networks of documents. Our focus is on preserving both the syntactic and the statistical information of words, rather than relying solely on simple frequency summaries as prevailing scoring schemes such as TFIDF do. Experimental results on a TREC dataset show that our scoring scheme outperforms the prevailing schemes.
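
The abstract does not state the scoring formula, so the following is only an illustrative sketch of the two ingredients it names, not the authors' method. It shows a plain TFIDF baseline and the two quantities usually used to characterize a small-world network, the clustering coefficient and the average shortest-path length, computed on a word graph. Linking words that share a sentence is an assumption made here for illustration, and all function names are hypothetical.

```python
import math
from collections import Counter, deque
from itertools import combinations


def tfidf(docs):
    """Baseline TFIDF. docs is a list of token lists; returns one {term: score} dict per doc."""
    n = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    scores = []
    for doc in docs:
        tf = Counter(doc)
        scores.append({t: (c / len(doc)) * math.log(n / df[t]) for t, c in tf.items()})
    return scores


def cooccurrence_graph(sentences):
    """Undirected word graph. Assumption: two words are linked if they share a sentence."""
    graph = {}
    for sent in sentences:
        words = set(sent)
        for w in words:
            graph.setdefault(w, set())
        for a, b in combinations(words, 2):
            graph[a].add(b)
            graph[b].add(a)
    return graph


def clustering_coefficient(graph, node):
    """Fraction of pairs of a node's neighbours that are themselves linked."""
    nbrs = graph[node]
    k = len(nbrs)
    if k < 2:
        return 0.0
    links = sum(1 for a, b in combinations(nbrs, 2) if b in graph[a])
    return 2.0 * links / (k * (k - 1))


def average_path_length(graph):
    """Mean shortest-path length over all reachable ordered pairs (BFS from every node)."""
    total, pairs = 0, 0
    for src in graph:
        dist = {src: 0}
        queue = deque([src])
        while queue:
            u = queue.popleft()
            for v in graph[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs if pairs else 0.0
```

A small-world network is typically one with a high average clustering coefficient and a short average path length. A graph-based score could weight a word by how much it contributes to these properties, whereas TFIDF looks only at term counts; how the paper actually combines them is not stated in the abstract.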