Semantic document classification and keyword spotting in digital repositories

  • Authors:
  • Nikunj Yadav;Yanu Gupta;Manish Kumar;Ratna Sanyal

  • Affiliations:
  • Indian Institute of Information Technology, Allahabad, India (all authors)

  • Venue:
  • Proceedings of the International Conference on Management of Emergent Digital EcoSystems
  • Year:
  • 2009

Abstract

Digital repositories hold thousands of documents, and the number is constantly increasing. Organizing and retrieving these documents in a way that matches how people think about them therefore becomes an important problem. In this paper, we present a novel approach that classifies the documents in a digital repository and finds the semantically significant keywords related to them, making organization and retrieval faster. We approach the problem using a probabilistic model trained on incomplete training data to organize the documents and mark the relevant keywords. This approach speeds up classification and, unlike unlabeled clustering, yields classes with well-defined topics.