Applying latent semantic indexing in frequent itemset mining for document relation discovery

  • Authors: Thanaruk Theeramunkong; Kritsada Sriphaew; Manabu Okumura

  • Affiliations: Sirindhorn International Institute of Technology, Thammasat University, Pathumthani, Thailand; Sirindhorn International Institute of Technology, Thammasat University, Pathumthani, Thailand and Precision and Intelligence Laboratory, Tokyo Institute of Technology, Yokohama, Japan; Precision and Intelligence Laboratory, Tokyo Institute of Technology, Yokohama, Japan

  • Venue: PAKDD'08 Proceedings of the 12th Pacific-Asia conference on Advances in knowledge discovery and data mining
  • Year: 2008

Abstract

Word-based relations among technical documents are highly useful but are often hidden within large collections of scientific publications. This work presents a method that applies latent semantic indexing within frequent itemset mining to discover potential relations among scientific publications. Two term-weighting schemes, tf and tf-idf, are investigated in combination with latent semantic indexing. The proposed method is evaluated on a set of technical documents from a publication database by comparing the extracted document relations with the documents' references (citations). To this end, the paper uses order accumulative citation matrices to evaluate the validity (quality) of the discovered patterns. The results show that the proposed method successfully discovers a set of document relations, in comparison with the original method, which uses no latent semantic indexing.
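
The abstract does not spell out the algorithm, so the following is only a minimal sketch of how the pieces it names might fit together, not the authors' implementation. It assumes that terms act as transactions and documents as items, that latent semantic indexing is realized as a rank-k truncated SVD of the weighted term-document matrix, and that a document "contains" a term when its reconstructed weight exceeds a cutoff. The function names, the `threshold` parameter, the toy counts, and the brute-force enumeration (in place of an Apriori-style search) are all illustrative assumptions.

```python
from itertools import combinations

import numpy as np


def tfidf_weights(tf):
    """Turn raw counts (n_terms x n_docs) into tf-idf weights."""
    n_docs = tf.shape[1]
    df = np.count_nonzero(tf, axis=1)          # documents containing each term
    idf = np.log(n_docs / np.maximum(df, 1))   # guard against df == 0
    return tf * idf[:, None]


def lsi_approximation(weights, rank):
    """Rank-k reconstruction of the term-document matrix via truncated SVD."""
    u, s, vt = np.linalg.svd(weights, full_matrices=False)
    return (u[:, :rank] * s[:rank]) @ vt[:rank, :]


def frequent_docsets(weights, min_support, threshold=0.1, max_size=2):
    """Enumerate document sets whose support reaches min_support.

    Assumption: terms are transactions and documents are items. The support
    of a document set is the fraction of terms whose weight exceeds
    `threshold` in every document of the set.
    """
    present = weights > threshold
    n_docs = present.shape[1]
    found = {}
    for size in range(1, max_size + 1):
        for docs in combinations(range(n_docs), size):
            support = float(present[:, list(docs)].all(axis=1).mean())
            if support >= min_support:
                found[docs] = support
    return found


# Hypothetical toy data: 5 terms (rows) x 4 documents (columns).
tf = np.array([
    [2, 1, 0, 0],
    [1, 2, 0, 0],
    [0, 0, 3, 1],
    [0, 0, 1, 2],
    [1, 0, 0, 1],
], dtype=float)

weighted = tfidf_weights(tf)                    # tf-idf variant of the input
approx = lsi_approximation(tf, rank=2)          # LSI applied to plain tf
docsets = frequent_docsets(approx, min_support=0.3)
print(docsets)
```

Passing `weighted` instead of `tf` to `lsi_approximation` gives the tf-idf variant, and calling `frequent_docsets(tf, ...)` directly gives a baseline without latent semantic indexing. The discovered document sets could then be checked against citation links, as the abstract describes.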