N-gram inverted index structures on music data for theme mining and content-based information retrieval

  • Authors:
  • Chaokun Wang; Jianzhong Li; Shengfei Shi

  • Affiliations:
  • Department of Computer Science and Engineering, Harbin Institute of Technology, P.O. Box 318, No. 92, West DaZhi Street, 150001 Harbin, Heilongjiang, China (all three authors)

  • Venue:
  • Pattern Recognition Letters
  • Year:
  • 2006

Abstract

Content-based music information retrieval and theme mining are two key problems in digital music information systems, where "themes" are the longest repeating patterns in a piece of music. However, most data structures built for retrieving music data cannot be used efficiently to mine the themes of music pieces, and vice versa. The suffix tree can serve both purposes, but it is too large and somewhat difficult to maintain. This paper introduces an index structure that combines the ideas of inverted files and n-grams and supports both music data retrieval and music theme mining. Based on this index and several supporting concepts, a theme mining algorithm is proposed together with its theoretical analysis. In addition, two implementations of a content-based music information retrieval algorithm are presented. Experiments demonstrate the correctness and efficiency of the proposed index and algorithms.
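To make the idea of one n-gram inverted index serving both tasks more concrete, here is a minimal sketch in Python. It is not the authors' structure or algorithm; it only illustrates the general technique under stated assumptions. Each piece is assumed to be a list of integer pitch symbols (for example MIDI note numbers; the paper's actual melody encoding may differ), and every n-gram maps to the set of `(piece_id, offset)` positions where it starts. Retrieval intersects the postings of the query's n-grams at aligned offsets. Theme mining is approximated as the longest pattern of length at least `n` that occurs twice in one piece, with overlapping occurrences allowed; this definition is an assumption, not the paper's. The function names `build_index`, `search`, and `longest_repeating` are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def build_index(pieces, n):
    """Hypothetical n-gram inverted index: n-gram tuple -> {(piece_id, offset)}."""
    index = defaultdict(set)
    for pid, seq in enumerate(pieces):
        for off in range(len(seq) - n + 1):
            index[tuple(seq[off:off + n])].add((pid, off))
    return index


def search(index, query, n):
    """Return sorted (piece_id, offset) pairs where query occurs (len(query) >= n)."""
    if len(query) < n:
        raise ValueError("query must contain at least n symbols")
    # Candidate start positions come from the postings of the first n-gram.
    hits = set(index.get(tuple(query[:n]), ()))
    # Each later query n-gram must appear at the correspondingly shifted offset.
    for i in range(1, len(query) - n + 1):
        postings = index.get(tuple(query[i:i + n]), set())
        hits = {(pid, off) for pid, off in hits if (pid, off + i) in postings}
    return sorted(hits)


def longest_repeating(pieces, index, pid, n):
    """Longest pattern of length >= n occurring at least twice in piece pid.

    Assumption: overlapping occurrences count. Returns () if no n-gram repeats.
    """
    seq = pieces[pid]
    best = ()
    for postings in index.values():
        starts = sorted(off for p, off in postings if p == pid)
        for a, b in combinations(starts, 2):  # a < b
            # Extend the shared n-gram forward while both occurrences agree.
            k = n
            while b + k < len(seq) and seq[a + k] == seq[b + k]:
                k += 1
            if k > len(best):
                best = tuple(seq[a:a + k])
    return best


if __name__ == "__main__":
    pieces = [[60, 62, 64, 65, 60, 62, 64, 67], [55, 60, 62, 64, 65]]
    idx = build_index(pieces, 2)
    print(search(idx, [60, 62, 64], 2))           # [(0, 0), (0, 4), (1, 1)]
    print(longest_repeating(pieces, idx, 0, 2))   # (60, 62, 64)
```

The same postings lists drive both operations, which is the point the abstract makes about a single index structure. The pairwise extension in `longest_repeating` is quadratic in the number of occurrences of each n-gram, so it is only suitable for illustration. Encoding pieces as pitch intervals instead of absolute pitches would make both operations transposition-invariant, but whether the paper does this is not stated here.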