The burgeoning amount of textual data in distributed sources, combined with the obstacles involved in creating and maintaining central repositories, motivates the need for effective distributed information extraction and mining techniques. As the need to mine patterns across distributed databases has grown, Distributed Association Rule Mining (D-ARM) algorithms have been developed. These algorithms, however, assume that the databases are either horizontally or vertically distributed. In the special case of databases populated with information extracted from textual data, existing D-ARM algorithms cannot discover rules based on higher-order associations between items in distributed textual documents that are neither vertically nor horizontally distributed, but rather a hybrid of the two. In this article we present D-HOTM, a framework for Distributed Higher Order Text Mining. D-HOTM is a hybrid approach that combines information extraction and distributed data mining. We employ a novel information extraction technique to extract meaningful entities from unstructured text in a distributed environment. The extracted information is stored in local databases, and a mapping function is applied to identify globally unique keys. Using these keys, a novel distributed association rule mining algorithm is then applied to discover higher-order associations between items (i.e., entities) in records fragmented across the distributed databases. Unlike existing algorithms, D-HOTM requires neither knowledge of a global schema nor that the distribution of data be horizontal or vertical. Evaluation methods are proposed to incorporate the performance of the mapping function into the traditional support metric used in ARM evaluation. An example application of the algorithm to distributed law enforcement data demonstrates the relevance of D-HOTM in the fight against terrorism.
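The general idea of linking fragmented records through a key before counting support can be illustrated with a minimal Python sketch. This is not the D-HOTM algorithm itself, and it does not compute higher-order associations; it only shows why an itemset that spans two sites becomes countable once a mapping function assigns both fragments the same key. The `global_key` function, the record layout, and the sample data are all hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def global_key(record):
    # Hypothetical mapping function: normalise a name into a shared key.
    # A real mapping function would be far more robust than this.
    return record["name"].strip().lower()


def merge_fragments(sites):
    # Union the items of every record fragment that maps to the same key,
    # regardless of which site the fragment came from.
    merged = defaultdict(set)
    for site in sites:
        for record in site:
            merged[global_key(record)].update(record["items"])
    return dict(merged)


def frequent_itemsets(transactions, min_support, max_size=2):
    # Brute-force support counting over itemsets of up to max_size items.
    n = len(transactions)
    counts = defaultdict(int)
    for items in transactions.values():
        for size in range(1, max_size + 1):
            for combo in combinations(sorted(items), size):
                counts[combo] += 1
    return {c: k / n for c, k in counts.items() if k / n >= min_support}


# Made-up records held at two sites; neither site sees a full record.
site_a = [
    {"name": "J. Doe", "items": {"burglary"}},
    {"name": "A. Roe", "items": {"fraud"}},
]
site_b = [
    {"name": "j. doe ", "items": {"blue sedan"}},
    {"name": "A. Roe", "items": {"blue sedan"}},
]

merged = merge_fragments([site_a, site_b])
result = frequent_itemsets(merged, min_support=0.5)
for itemset, support in sorted(result.items()):
    print(itemset, support)
```

In this toy data the pair ("blue sedan", "burglary") never occurs within a single site; it can only be counted after the two fragments for the same person are merged. A mapping error (for example, failing to match "J. Doe" with "j. doe ") would change the counted support, which is why the abstract proposes folding the mapping function's performance into the support metric.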