Suppose you are given a set of natural entities (e.g., proteins, organisms, or weather patterns) that share some important, externally observable properties. You also have a structural description of the entities (e.g., sequence, topological, or geometrical data) and a distance metric. Combinatorial pattern discovery is the activity of finding patterns in the structural data that might explain these common properties based on the metric.

This paper presents an example of combinatorial pattern discovery: the discovery of patterns in protein databases. The structural representations we consider are strings, and the distance metric is string edit distance permitting variable length don't cares. Our techniques incorporate string matching algorithms and novel heuristics for discovery and optimization, most of which generalize to other combinatorial structures. Experimental results from applying the techniques to both generated data and functionally related protein families obtained from the Cold Spring Harbor Laboratory show the effectiveness of the proposed techniques. When the discovered patterns are applied to protein classification, they give information that is complementary to the best protein classifier available today.
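To make the distance metric concrete, the following is a minimal sketch of string edit distance with variable length don't cares, written as a standard dynamic program. It is an illustration under stated assumptions and not the paper's implementation: the function name `vldc_distance`, the use of `*` as the don't-care symbol, unit costs for insertion, deletion, and substitution, and whole-string (rather than substring) matching are all choices made here for the example.

```python
def vldc_distance(pattern: str, text: str, wildcard: str = "*") -> int:
    """Edit distance between pattern and text, where each wildcard in the
    pattern may absorb any (possibly empty) run of text characters for free.

    Assumptions (not from the source): '*' is the don't-care symbol and
    every insertion, deletion, or substitution costs 1.
    """
    # Row for the empty pattern: every text character must be inserted.
    prev = list(range(len(text) + 1))
    for p in pattern:
        if p == wildcard:
            # A wildcard matched against empty text costs nothing.
            cur = [prev[0]]
            for j in range(1, len(text) + 1):
                # Either the wildcard matches nothing (prev[j]) or it
                # absorbs one more text character at no cost (cur[j - 1]).
                cur.append(min(prev[j], cur[j - 1]))
        else:
            # An ordinary pattern character against empty text is deleted.
            cur = [prev[0] + 1]
            for j, t in enumerate(text, start=1):
                cur.append(min(
                    prev[j] + 1,             # delete the pattern character
                    cur[j - 1] + 1,          # insert the text character
                    prev[j - 1] + (p != t),  # match or substitute
                ))
        prev = cur
    return prev[-1]
```

With these assumptions, a pattern such as `*abc*` is within distance 1 of `xxaxcyy`: the leading and trailing don't cares absorb `xx` and `yy` at no cost, leaving a single substitution. Without any don't cares the function reduces to ordinary edit distance.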