Semantic scoring based on small-world phenomenon for feature selection in text mining

  • Authors:
  • Chong Huang;Yonghong Tian;Tiejun Huang;Wen Gao

  • Affiliations:
  • Graduate School, Chinese Academy of Sciences, Beijing, China;Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China;Graduate School, Chinese Academy of Sciences, Beijing, China;Graduate School, Chinese Academy of Sciences, Beijing, China

  • Venue:
  • ADMA'06 Proceedings of the Second international conference on Advanced Data Mining and Applications
  • Year:
  • 2006

Abstract

This paper proposes an effective scoring scheme for feature selection in text mining that exploits characteristics of the small-world phenomenon in the semantic networks of documents. Our focus is on preserving both the syntactic and the statistical information of words, rather than relying solely on simple frequency summarization as prevailing scoring schemes such as TF-IDF do. Experimental results on the TREC dataset show that our scoring scheme outperforms the prevailing schemes.
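
The abstract does not spell out the scoring formula, so the following is only an illustrative sketch of the two quantities usually used to characterize a small-world network: the local clustering coefficient and the average shortest path length. It is not the authors' scoring scheme. The sentence-level co-occurrence rule and all function names (`build_cooccurrence_graph`, `clustering_coefficient`, `average_path_length`) are assumptions made for this example.

```python
from collections import deque
from itertools import combinations


def build_cooccurrence_graph(sentences):
    """Build an undirected word graph.

    Assumption (not from the paper): two words are linked when they
    occur in the same sentence. Tokenization is a plain whitespace split.
    """
    graph = {}
    for sentence in sentences:
        words = set(sentence.lower().split())
        for word in words:
            graph.setdefault(word, set())
        for a, b in combinations(words, 2):
            graph[a].add(b)
            graph[b].add(a)
    return graph


def clustering_coefficient(graph, node):
    """Fraction of a node's neighbor pairs that are themselves linked."""
    neighbors = graph[node]
    k = len(neighbors)
    if k < 2:
        return 0.0
    links = sum(1 for a, b in combinations(neighbors, 2) if b in graph[a])
    return 2.0 * links / (k * (k - 1))


def average_path_length(graph):
    """Mean shortest-path length over all reachable ordered node pairs.

    Uses breadth-first search from every node; unreachable pairs are skipped.
    """
    total, pairs = 0, 0
    for source in graph:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in graph[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs if pairs else 0.0
```

A graph is typically called small-world when it combines high clustering with a short average path length. A word-level score could, for instance, compare these quantities with and without a given word, but that choice is a hypothetical illustration rather than something stated in the abstract.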