Discovery of Frequent Tree Structured Patterns in Semistructured Web Documents
PAKDD '01 Proceedings of the 5th Pacific-Asia Conference on Knowledge Discovery and Data Mining
Many Web documents, such as HTML and XML files, have no rigid structure and are called semistructured data. In general, such semistructured Web documents are represented by rooted trees with ordered children. We propose a new method for discovering frequent tree structured patterns in semistructured Web documents by using a tag tree pattern as a hypothesis. A tag tree pattern is an edge-labeled tree with ordered children that has structured variables. An edge label is a tag or a keyword in such Web documents, and a variable can be substituted by an arbitrary tree, so a tag tree pattern is well suited to representing tree structured patterns in such Web documents. First, we show that it is hard to compute the optimum frequent tag tree pattern. We therefore present an algorithm for generating all maximally frequent tag tree patterns and prove its correctness. Finally, we report experimental results on our algorithm. Although the algorithm is not efficient, the experiments show that it can extract characteristic tree structured patterns from those data.
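
To make the notion of a tag tree pattern concrete, the following Python sketch checks whether a pattern matches a document tree and computes how frequently it matches a set of documents. It is an illustration under simplifying assumptions, not the algorithm of the paper: the names Node, T, matches and frequency are hypothetical, a variable is assumed to stand for a tree with at least one edge, keywords are treated like any other edge label, and the generation of maximally frequent patterns is not shown.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Optional, Sequence, Tuple

# An edge is (label, child). In a pattern, label None marks a variable edge
# (an assumption of this sketch; the paper's notation may differ).
Edge = Tuple[Optional[str], "Node"]


@dataclass
class Node:
    edges: List[Edge] = field(default_factory=list)  # ordered children


def T(*edges: Edge) -> Node:
    """Hypothetical helper: build a node from its ordered outgoing edges."""
    return Node(list(edges))


def descendants(run: Sequence[Edge]) -> Iterator[Node]:
    """Yield every node reachable through the given run of edges."""
    for _, child in run:
        yield child
        yield from descendants(child.edges)


def binds(p_child: Node, run: Sequence[Edge]) -> bool:
    """Can a variable ending at p_child be replaced by a tree covering `run`?"""
    if not p_child.edges:
        return True  # any nonempty tree has a leaf to play the role of p_child
    # Otherwise some node inside the run must continue with p_child's pattern.
    return any(matches(p_child, w) for w in descendants(run))


def match_seq(pe: Sequence[Edge], i: int, te: Sequence[Edge], j: int) -> bool:
    """Match pattern edges pe[i:] against tree edges te[j:], in order."""
    if i == len(pe):
        return j == len(te)  # every tree edge must be accounted for
    label, p_child = pe[i]
    if label is not None:  # ordinary labeled edge: labels must agree
        return (j < len(te) and te[j][0] == label
                and matches(p_child, te[j][1])
                and match_seq(pe, i + 1, te, j + 1))
    # Variable edge: try every nonempty contiguous run of sibling edges.
    return any(binds(p_child, te[j:k]) and match_seq(pe, i + 1, te, k)
               for k in range(j + 1, len(te) + 1))


def matches(pattern: Node, tree: Node) -> bool:
    return match_seq(pattern.edges, 0, tree.edges, 0)


def frequency(pattern: Node, docs: Sequence[Node]) -> float:
    """Fraction of documents that the pattern matches."""
    return sum(matches(pattern, d) for d in docs) / len(docs)


# Two toy documents and a pattern with one variable between tags b and c.
d1 = T(("a", T(("b", T()), ("c", T()))))
d2 = T(("a", T(("b", T()), ("d", T(("e", T()))), ("c", T()))))
p = T(("a", T(("b", T()), (None, T()), ("c", T()))))
print(frequency(p, [d1, d2]))  # 0.5: only d2 has something between b and c
```

In this sketch a pattern would count as frequent when its frequency reaches a user-chosen threshold; the matcher tries all bindings by brute force, so it is meant for small examples only.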