Domain-specific documents often share an inherent, though undocumented, structure. This structure should be made explicit to facilitate efficient, structure-based search in archives as well as information integration. Inferring a semantically structured XML DTD for an archive and subsequently transforming its texts into XML documents is a promising way to reach these objectives. Based on the KDD-driven DIAsDEM framework, we propose a new method to derive an archive-specific, structured XML document type definition (DTD). Our approach uses association rule discovery and sequence mining to impose structure on a previously derived flat (i.e., unstructured) DTD. We introduce the notion of a probabilistic DTD, which is derived by discovering associations among XML tags and frequent sequences of XML tags.
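To make the idea of tag associations and tag sequences concrete, the following is a minimal sketch, not the DIAsDEM method itself. It assumes each document has already been reduced to the ordered list of XML tags it contains, and it uses plain pairwise counting in place of a full association rule or sequence miner. The tag names (`Person`, `Company`, `Date`), the function names, and the `min_support` threshold are hypothetical, chosen only for illustration.

```python
from collections import Counter
from itertools import combinations, permutations


def tag_associations(docs, min_support=0.5):
    """Return {(a, b): (support, confidence)} for pairwise rules a -> b.

    Support is the fraction of documents containing both tags;
    confidence is that count divided by the number of documents containing a.
    """
    n = len(docs)
    tag_sets = [set(d) for d in docs]
    single = Counter(t for s in tag_sets for t in s)
    pair = Counter(p for s in tag_sets for p in combinations(sorted(s), 2))
    rules = {}
    for (a, b), count in pair.items():
        support = count / n
        if support < min_support:
            continue
        rules[(a, b)] = (support, count / single[a])
        rules[(b, a)] = (support, count / single[b])
    return rules


def ordered_pair_support(docs):
    """Fraction of documents in which tag a first occurs before tag b."""
    n = len(docs)
    counts = Counter()
    for d in docs:
        first = {}
        for i, t in enumerate(d):
            first.setdefault(t, i)  # remember only the first position of each tag
        for a, b in permutations(first, 2):
            if first[a] < first[b]:
                counts[(a, b)] += 1
    return {pair: c / n for pair, c in counts.items()}


# Hypothetical archive: each document is the ordered list of tags found in it.
docs = [
    ["Person", "Company", "Date"],
    ["Person", "Date"],
    ["Company", "Person", "Date"],
    ["Company"],
]

rules = tag_associations(docs)
seqs = ordered_pair_support(docs)
```

Under these assumptions, the support and confidence values play the role of the probabilities that a probabilistic DTD would attach to co-occurring tags, while the ordered-pair supports suggest which tag tends to precede which.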