Because of the widespread diffusion of semistructured data in XML format, much research effort is currently devoted to supporting the storage and retrieval of large collections of such documents. XML documents can be compared by their structural similarity and grouped into clusters, so that different storage, retrieval, and processing techniques can be effectively exploited. In this scenario, an efficient and effective similarity function is the key to a successful data management process. We present an approach for detecting structural similarity between XML documents that differs significantly from standard methods based on graph-matching algorithms and allows a substantial reduction of the required computation costs. In essence, our proposal linearizes the structure of each XML document by representing it as a numerical sequence, and then compares such sequences through the analysis of their frequencies. First, some basic strategies for encoding a document are proposed, each of which can focus on a different structural facet. Then, the theory of the Discrete Fourier Transform is exploited to compare the encoded documents (i.e., signals) effectively and efficiently in the frequency domain. Experimental results show the effectiveness of the approach, also in comparison with standard methods.
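To make the idea of linearizing a document and comparing it in the frequency domain concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the encoding or the distance defined in the paper: the abstract does not specify either. The names `encode`, `dft_magnitudes`, and `structural_distance` are hypothetical, the encoding simply assigns one integer per distinct tag in document order, and the distance is assumed to be the Euclidean distance between zero-padded DFT magnitude spectra.

```python
import cmath
import math
import xml.etree.ElementTree as ET


def encode(xml_text, tag_ids):
    """Linearize a document: one integer per element, in document order.

    Assumption: each distinct tag name gets the next unused positive id.
    Text content and attributes are ignored, so only structure is encoded.
    """
    root = ET.fromstring(xml_text)
    return [tag_ids.setdefault(el.tag, len(tag_ids) + 1) for el in root.iter()]


def dft_magnitudes(signal, n):
    """Magnitudes of the n-point DFT of `signal`, zero-padded to length n."""
    padded = signal + [0] * (n - len(signal))
    return [
        abs(sum(x * cmath.exp(-2j * cmath.pi * k * t / n)
                for t, x in enumerate(padded)))
        for k in range(n)
    ]


def structural_distance(xml_a, xml_b):
    """Assumed measure: Euclidean distance between DFT magnitude spectra."""
    tag_ids = {}  # shared so both documents use the same tag-to-integer map
    a = encode(xml_a, tag_ids)
    b = encode(xml_b, tag_ids)
    n = max(len(a), len(b))
    fa = dft_magnitudes(a, n)
    fb = dft_magnitudes(b, n)
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(fa, fb)))


if __name__ == "__main__":
    doc1 = "<a><b/><c/></a>"
    doc2 = "<a><b>x</b><c>y</c></a>"   # same structure, different text
    doc3 = "<a><b/><b/><b/><d/></a>"   # different structure
    print(structural_distance(doc1, doc2))
    print(structural_distance(doc1, doc3))
```

The direct DFT above costs quadratic time in the sequence length and is used only to keep the sketch dependency-free; a fast Fourier transform would be the natural choice for large documents. Note also that comparing magnitudes alone discards phase, so a zero distance under this assumed measure does not imply that the two encoded sequences are identical.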