An Efficient Technique for Mining Approximately Frequent Substring Patterns

Authors:
Xiaonan Ji;James Bailey
Venue:
ICDMW '07 Proceedings of the Seventh IEEE International Conference on Data Mining Workshops
Year:
2007

Abstract

Sequential patterns are used to discover knowledge in a wide range of applications. However, in many scenarios pattern quality can be low, owing to short pattern lengths or low supports. Furthermore, for dense datasets such as proteins, most sequential pattern mining algorithms return a tremendously large number of patterns, which are difficult to process and analyze. By relaxing the definition of frequency and allowing some mismatches, it is possible to discover higher-quality patterns. We call these patterns Frequent Approximate Substrings, or FAS-patterns, and we introduce an algorithm called FAS-Miner to handle the mining task efficiently. Experiments on real-world protein and DNA datasets show that FAS-Miner can discover patterns of much longer lengths and higher supports than standard sequential mining approaches.
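The following minimal sketch shows how allowing mismatches relaxes the notion of frequency. It is not FAS-Miner. It makes two assumptions that the paper's own definitions may not share: a mismatch is a single-character substitution, bounded by a fixed count k (Hamming distance), and a pattern's support is the number of sequences containing at least one approximate occurrence. The function names are hypothetical.

```python
def occurs_approximately(pattern: str, sequence: str, max_mismatches: int) -> bool:
    """Assumed definition: True if some window of `sequence` of length
    len(pattern) differs from `pattern` in at most `max_mismatches`
    positions (Hamming distance; substitutions only, no gaps)."""
    m = len(pattern)
    for start in range(len(sequence) - m + 1):
        mismatches = 0
        for a, b in zip(pattern, sequence[start:start + m]):
            if a != b:
                mismatches += 1
                if mismatches > max_mismatches:
                    break  # this window already exceeds the budget
        if mismatches <= max_mismatches:
            return True
    return False


def approximate_support(pattern: str, sequences: list[str], max_mismatches: int) -> int:
    """Hypothetical support measure: number of sequences that contain
    at least one approximate occurrence of `pattern`."""
    return sum(1 for s in sequences if occurs_approximately(pattern, s, max_mismatches))


if __name__ == "__main__":
    # Toy DNA-like data, invented for illustration only.
    seqs = ["ACGTAC", "ACCTAC", "AGGTTT", "TTTTTT"]
    print(approximate_support("ACGT", seqs, 0))  # exact matching only
    print(approximate_support("ACGT", seqs, 1))  # one substitution allowed
```

On this toy data, raising k from 0 to 1 increases the support of `ACGT` from 1 to 3, because `ACCT` and `AGGT` each differ from it in one position. This illustrates the intuition in the abstract: a pattern that is rare as an exact substring can become frequent once a few mismatches are tolerated.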