Clustering XML documents runs into the high dimensionality problem. Independent Component Analysis (ICA) can reduce the dimensionality and, at the same time, uncover the latent variables underlying XML structures, which improves the quality of the clustering. This paper proposes a novel strategy for clustering XML documents based on ICA. Each document is first represented in the Vector Space Model (VSM) according to the D_path extracted from its XML tree. ICA is then applied to reduce the dimensionality of the document vectors. Finally, the document vectors are clustered in the reduced Euclidean space spanned by the independent components. The experiments show that ICA improves clustering accuracy with stable performance.
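
The first step of the pipeline, turning XML trees into VSM vectors, can be illustrated with a minimal sketch. The abstract does not define D_path, so the code below assumes it can be approximated by the root-to-node tag paths of each tree; that reading, the function names `tag_paths` and `build_vsm`, and the three toy documents are all hypothetical and not taken from the paper.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def tag_paths(xml_text):
    """Count the root-to-node tag paths in one XML document.

    Assumption: these paths stand in for the paper's D_path features.
    """
    root = ET.fromstring(xml_text)
    counts = Counter()
    stack = [(root, root.tag)]
    while stack:
        node, path = stack.pop()
        counts[path] += 1
        for child in node:
            stack.append((child, path + "/" + child.tag))
    return counts


def build_vsm(xml_docs):
    """Map each document to a count vector over the shared, sorted path vocabulary."""
    per_doc = [tag_paths(doc) for doc in xml_docs]
    vocabulary = sorted(set().union(*per_doc))
    vectors = [[counts[p] for p in vocabulary] for counts in per_doc]
    return vocabulary, vectors


# Toy corpus (hypothetical): two "book" documents and one "article" document.
docs = [
    "<book><title/><author/><author/></book>",
    "<book><title/><year/></book>",
    "<article><title/><journal/></article>",
]
vocabulary, vectors = build_vsm(docs)
for doc_vector in vectors:
    print(doc_vector)
```

Each row is one document and each column one distinct path, so the vector length grows with the number of distinct paths in the corpus, which is the high dimensionality the abstract refers to. The later steps are not shown here; one plausible continuation would pass this matrix to an existing ICA implementation (for example scikit-learn's `FastICA`) and cluster the reduced vectors with a Euclidean method such as k-means, though the abstract does not say which ICA or clustering algorithm the authors used.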