A novel dual wing harmonium model aided by 2-D wavelet transform subbands for document data mining

  • Authors:
  • Haijun Zhang;Tommy W. S. Chow;M. K. M. Rahman

  • Affiliations:
  • Department of Electronic Engineering, City University of Hong Kong, 83 Tat Chee Avenue, Kowloon, Hong Kong;Department of Electronic Engineering, City University of Hong Kong, 83 Tat Chee Avenue, Kowloon, Hong Kong;Department of Electronic Engineering, City University of Hong Kong, 83 Tat Chee Avenue, Kowloon, Hong Kong

  • Venue:
  • Expert Systems with Applications: An International Journal
  • Year:
  • 2010

Quantified Score

Hi-index 12.05

Visualization

Abstract

A novel dual wing harmonium model that integrates multiple features including term frequency features and 2-D wavelet transform features into a low dimensional semantic space is proposed for the applications of document classification and retrieval. Terms are extracted from the graph representation of document by employing weighted feature extraction method. 2-D wavelet transform is used to compress the graph due to its sparseness while preserving the basic document structure. After transform, low-pass subbands are stacked to represent the term associations in a document. We then develop a new dual wing harmonium model projecting these multiple features into low dimensional latent topics with different probability distributions assumption. Contrastive divergence algorithm is used for efficient learning and inference. We perform extensive experimental verification in document classification and retrieval, and comparative results suggest that the proposed method delivers better performance than other methods.