As more online stores and shopping sites appear on the Web, integrating the product and shopping information they provide has become increasingly important and has attracted attention in recent information integration research. One of the fundamental problems is to integrate specifications for products of the same type from different vendors so that they are described in a homogeneous and uniform way. Specifications for products of the same type can look quite different from vendor to vendor, and integrating them is a tedious and error-prone task. In this paper, we formulate product specification integration as a text categorization problem and propose an association pattern mining approach that automatically generates pattern rules for each attribute. Association patterns are mined from n-grams generated from product specifications. However, mining association patterns from n-grams can be very time-consuming, because every substring of a frequent string is also frequent. We propose substring pruning strategies specific to text data to improve the running time. Experiments show that our approach is very time-efficient and achieves classification accuracy higher than 0.95 on data sets collected for digital cameras, notebook PCs, and LCDs.
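The idea of mining per-attribute patterns from n-grams and pruning redundant substrings can be illustrated with a minimal sketch. The abstract does not spell out the actual pruning strategies, so the rule used here is an assumption: an n-gram is dropped when a longer frequent n-gram for the same attribute contains it and is supported by exactly the same specifications. The function names `word_ngrams` and `mine_patterns`, the use of word-level n-grams, and the sample data are all hypothetical.

```python
from collections import defaultdict


def word_ngrams(text, max_n=3):
    """Return the set of word n-grams (n = 1..max_n) in a specification string."""
    tokens = text.lower().split()
    return {
        " ".join(tokens[i:i + n])
        for n in range(1, max_n + 1)
        for i in range(len(tokens) - n + 1)
    }


def mine_patterns(samples, min_support=2, max_n=3):
    """Mine frequent n-grams per attribute label and prune redundant substrings.

    samples: list of (specification_text, attribute_label) pairs.
    Returns {(label, ngram): support_count}.
    """
    # (label, ngram) -> indices of the samples that contain the n-gram
    support = defaultdict(set)
    for idx, (text, label) in enumerate(samples):
        for gram in word_ngrams(text, max_n):
            support[(label, gram)].add(idx)

    frequent = {k: ids for k, ids in support.items() if len(ids) >= min_support}

    kept = {}
    for (label, gram), ids in frequent.items():
        # Assumed pruning rule: a shorter n-gram is redundant if a longer
        # frequent n-gram of the same label contains it (on word boundaries)
        # and occurs in exactly the same samples.
        redundant = any(
            other_label == label
            and other != gram
            and f" {gram} " in f" {other} "
            and other_ids == ids
            for (other_label, other), other_ids in frequent.items()
        )
        if not redundant:
            kept[(label, gram)] = len(ids)
    return kept


if __name__ == "__main__":
    demo = [
        ("10.1 effective megapixels", "resolution"),
        ("8.0 effective megapixels", "resolution"),
        ("3x optical zoom", "zoom"),
        ("10x optical zoom", "zoom"),
    ]
    for (label, gram), count in sorted(mine_patterns(demo, max_n=2).items()):
        print(label, "<-", repr(gram), "support:", count)
```

On the demo data, the single words "effective" and "megapixels" are pruned in favor of "effective megapixels" because all three occur in the same two specifications. The pairwise redundancy check is quadratic in the number of frequent n-grams, which is acceptable for illustration but is not meant to reflect the efficiency of the approach described above.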