We consider the online data mining problem of continuously extracting all structural features among words from an infinite sequence of tree-structured documents. To represent structural features among words appearing in tree-structured documents, we first introduce the consecutive path pattern (CPP) on a list of words: a CPP is a sequence of consecutive leaf-to-leaf paths. We then define a matching function over CPPs that takes into account the recent frequency of a CPP, its recency, and the viewing time of the tree-structured document in which it appears. Next, we present an online algorithm, based on a sliding window strategy, that continuously extracts all maximal CPPs as characteristic structural features from an infinite sequence of tree-structured documents. Finally, we report experimental results showing that the algorithm performs well.
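The sliding window idea can be illustrated with a minimal sketch. This is not the algorithm from the paper: it treats each pattern as an opaque hashable value, ignores recency and viewing time, and only counts how many of the most recent documents contain each pattern. The names `SlidingWindowCounter`, `window_size`, and `min_support` are assumptions made for this illustration.

```python
from collections import Counter, deque


class SlidingWindowCounter:
    """Count, per pattern, how many of the last `window_size` documents contain it.

    Illustrative only: patterns are opaque hashable values, not real CPPs.
    """

    def __init__(self, window_size):
        if window_size <= 0:
            raise ValueError("window_size must be positive")
        self.window_size = window_size
        self.window = deque()    # one frozenset of patterns per document
        self.counts = Counter()  # pattern -> number of window documents containing it

    def add_document(self, patterns):
        """Add the newest document and expire the oldest one if the window is full."""
        doc = frozenset(patterns)
        self.window.append(doc)
        self.counts.update(doc)
        if len(self.window) > self.window_size:
            expired = self.window.popleft()
            for pattern in expired:
                self.counts[pattern] -= 1
                if self.counts[pattern] == 0:
                    del self.counts[pattern]

    def frequent(self, min_support):
        """Return the patterns contained in at least `min_support` window documents."""
        return {p for p, c in self.counts.items() if c >= min_support}
```

Each call to `add_document` does work proportional to the size of the arriving and expiring documents, so the window can slide over an unbounded stream without rescanning older documents.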