Association pattern mining for product specification integration

Authors:
Jyh-Jong Tsay;Chin-Wen Tsay;Ping-Hong Chen
Affiliations:
National Chung Cheng University, Department of Computer Science and Information Engineering, Chiayi, Taiwan, ROC.;National Chung Cheng University, Department of Computer Science and Information Engineering, Chiayi, Taiwan, ROC.;National Chung Cheng University, Department of Computer Science and Information Engineering, Chiayi, Taiwan, ROC.
Venue:
FSKD'09 Proceedings of the 6th international conference on Fuzzy systems and knowledge discovery - Volume 2
Year:
2009

Abstract

As more and more online stores and shopping sites become available on the Web, integrating the product and shopping information provided by different sources has become increasingly important and has attracted the attention of recent research in information integration. One of the fundamental problems is to integrate specifications for products of the same type from different vendors so that they are described in a homogeneous and uniform way. Specifications for products of the same type from different vendors can look quite different, and integrating them is a tedious and error-prone task. In this paper, we formulate product specification integration as a text categorization problem and propose an association pattern mining approach that automatically generates pattern rules for each attribute. Association patterns are mined from n-grams generated from product specifications. However, mining association patterns from n-grams can be very time-consuming, because every substring of a frequent string is also frequent. We propose substring pruning strategies specific to text data to improve the running time. Experiments show that our approach is very time-efficient and achieves classification accuracy higher than 0.95 on data sets collected for digital cameras, notebook PCs, and LCDs.
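
The abstract does not describe the pruning strategies themselves, so the following is only a minimal sketch of one plausible form, not the authors' algorithm. It assumes word-level n-grams, support counted as the number of specification strings containing an n-gram, and a rule that drops a frequent n-gram whenever a longer frequent n-gram containing it has the same support. The function names, the `min_support` and `max_n` parameters, and the sample strings are all hypothetical.

```python
from collections import Counter

def word_ngrams(tokens, max_n):
    """Yield every contiguous word n-gram (as a tuple) with 1 <= n <= max_n."""
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            yield tuple(tokens[i:i + n])

def frequent_ngrams(texts, min_support, max_n=3):
    """Count how many texts contain each n-gram; keep those meeting min_support."""
    support = Counter()
    for text in texts:
        tokens = text.lower().split()
        # A set ensures each text contributes at most once per n-gram.
        support.update(set(word_ngrams(tokens, max_n)))
    return {g: c for g, c in support.items() if c >= min_support}

def prune_substrings(frequent):
    """Assumed pruning rule: drop an n-gram if a frequent n-gram one token
    longer contains it and has exactly the same support."""
    redundant = set()
    for gram, count in frequent.items():
        if len(gram) < 2:
            continue
        # The only contiguous substrings one token shorter are the prefix and suffix.
        for sub in (gram[:-1], gram[1:]):
            if frequent.get(sub) == count:
                redundant.add(sub)
    return {g: c for g, c in frequent.items() if g not in redundant}

if __name__ == "__main__":
    # Hypothetical attribute values for digital cameras.
    specs = ["optical zoom 3x", "optical zoom 5x", "optical zoom 10x", "digital zoom 4x"]
    patterns = prune_substrings(frequent_ngrams(specs, min_support=3))
    for gram, count in sorted(patterns.items()):
        print(" ".join(gram), count)
```

Checking only the one-token-longer n-grams is enough under this rule: support can only shrink as an n-gram grows, so if a much longer n-gram matches the support of a short one, every n-gram in between matches it too. In the sample data the unigram "optical" is dropped because "optical zoom" occurs in exactly the same strings, while "zoom" is kept because it also occurs in "digital zoom 4x".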