Experimental study on the extraction and distribution of textual domain keywords

  • Authors:
  • Xiangfeng Luo; Ning Fang; Weimin Xu; Sheng Yu; Kai Yan; Huizhe Xiao

  • Affiliations:
  • Digital Content Computing and Semantic Grid Group, Key Lab of Grid Technology, Shanghai University, Shanghai 200072, China (all authors)

  • Venue:
  • Concurrency and Computation: Practice & Experience
  • Year:
  • 2008

Abstract

Domain keywords play a primary role in text classification, clustering, and personalized services. This paper proposes a method based on term frequency-inverse document frequency (TFIDF), called TDDF (TFIDF direct document frequency of domain), to extract domain keywords from multiple texts. First, we discuss the optimal parameters of TFIDF for extracting textual keywords and domain keywords. Second, we propose TDDF, which takes the document frequency within the domain into account when extracting domain keywords from multiple texts. Finally, we study the distribution of domain keywords in scientific texts. Experiments and applications show that TDDF is more effective than the optimally parameterized TFIDF at extracting domain keywords. After the ubiquitous domain keywords are deleted, the remaining domain keywords follow a normal distribution within a single text. Copyright © 2008 John Wiley & Sons, Ltd.
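
The abstract does not state the TDDF formula, so the following is only a minimal sketch of the general idea, not the authors' method. It computes textbook TF-IDF and then, as an illustrative assumption, weights each term's mean TF-IDF over the domain texts by the fraction of domain texts that contain it. The function names `tfidf_scores` and `domain_keyword_scores`, the split into domain and background texts, and the weighting itself are all hypothetical. Documents are assumed to be non-empty lists of tokens.

```python
import math
from collections import Counter


def tfidf_scores(docs):
    """Return one {term: tf-idf} dict per tokenized document (textbook TF-IDF)."""
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    scores = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc)
        scores.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return scores


def domain_keyword_scores(domain_docs, background_docs):
    """Hypothetical domain score (an assumption, not the paper's TDDF):
    mean TF-IDF over the domain texts, multiplied by the share of
    domain texts that contain the term."""
    all_docs = domain_docs + background_docs
    # TF-IDF is computed over the whole collection; keep the domain part only.
    per_doc = tfidf_scores(all_docs)[:len(domain_docs)]
    domain_df = Counter(term for doc in domain_docs for term in set(doc))
    n_domain = len(domain_docs)
    result = {}
    for term, df_in_domain in domain_df.items():
        mean_tfidf = sum(s.get(term, 0.0) for s in per_doc) / n_domain
        result[term] = mean_tfidf * (df_in_domain / n_domain)
    return result
```

Under this assumed weighting, a term that occurs in every text of the collection gets an inverse document frequency of zero and drops out, while a term spread across many domain texts outranks an equally rare term confined to a single domain text.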