Efficient mining of all margin-closed itemsets with applications in temporal knowledge discovery and classification by compression

  • Authors:
  • Fabian Moerchen;Michael Thies;Alfred Ultsch

  • Affiliations:
  • Siemens Corporate Research, 755 College Road East, 08540, Princeton, NJ, USA;Philipps-University Marburg, Databionic Research Group, Hans-Meerwein-Str, 35032, Marburg, Germany;Philipps-University Marburg, Databionic Research Group, Hans-Meerwein-Str, 35032, Marburg, Germany

  • Venue:
  • Knowledge and Information Systems
  • Year:
  • 2011

Abstract

Margin-closed itemsets have previously been proposed as a subset of the closed itemsets that satisfy a minimum margin constraint on the difference in support to their supersets. The constraint reduces redundancy in the set of reported patterns, favoring longer, more specific patterns. The reported patterns range from rare, specific itemsets to frequent, general itemsets, which supports exploratory data analysis and understandable classification models. We present DCI_Margin, a new efficient algorithm that mines the complete set of margin-closed itemsets. It is a modification of the DCI_Closed algorithm, which has low memory requirements and can be parallelized. The margin constraint is checked on the fly, reusing information already computed by DCI_Closed. We thoroughly analyze the behavior on many datasets and show how other data mining algorithms can benefit from the redundancy reduction.
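
To make the margin constraint concrete, the following is a minimal brute-force sketch in Python. It is not DCI_Margin and does not reflect how DCI_Closed enumerates candidates; it only illustrates the filtering idea on a toy dataset. The abstract does not state the exact form of the constraint, so the sketch assumes a relative margin: an itemset is kept only if every proper superset has support strictly below (1 - alpha) times its own support. The function names, the parameter `alpha`, and the toy transactions are all hypothetical.

```python
from itertools import combinations


def support(itemset, transactions):
    """Number of transactions that contain every item of `itemset`."""
    return sum(1 for t in transactions if itemset <= t)


def margin_closed_itemsets(transactions, min_support, alpha):
    """Brute-force illustration (exponential in the number of items).

    Assumed constraint (not taken from the abstract): keep an itemset S if
    support(S) >= min_support and every proper superset has support
    strictly below (1 - alpha) * support(S). With alpha = 0 this reduces
    to the usual definition of frequent closed itemsets.
    """
    items = sorted(set().union(*transactions))
    kept = {}
    for k in range(1, len(items) + 1):
        for combo in combinations(items, k):
            s = frozenset(combo)
            sup = support(s, transactions)
            if sup < min_support:
                continue
            # Support is anti-monotone, so the largest support among all
            # proper supersets is reached by some one-item extension.
            best_ext = max(
                (support(s | {i}, transactions) for i in items if i not in s),
                default=0,
            )
            if best_ext < (1 - alpha) * sup:
                kept[s] = sup
    return kept


# Toy data: supports are a:4, ab:3, abc:2 (b, c, ac, bc are not closed).
toy = [frozenset("abc"), frozenset("abc"), frozenset("ab"), frozenset("a")]
closed = margin_closed_itemsets(toy, min_support=1, alpha=0.0)
margin = margin_closed_itemsets(toy, min_support=1, alpha=0.5)
```

On this toy data, `alpha=0.0` keeps the three closed itemsets {a}, {a, b}, and {a, b, c}, while `alpha=0.5` keeps only {a, b, c}: the shorter itemsets are dropped because a superset retains at least half of their support, which matches the abstract's remark that the constraint favors longer, more specific patterns.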