In this article, we study the scale-dependent dimensionality and overall structure of text data using a method that measures the correlation dimension at different scales. We present experimental analyses of text data sets drawn from the Reuters and Europarl corpora and compare them with artificially generated point sets and with speech data. The results reflect some typical properties of the data, and we discuss how the method can be used to improve various data analysis applications.
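
The abstract does not spell out the estimator, so the following is only a minimal sketch of what measuring correlation dimension at different scales can look like. It assumes the standard Grassberger-Procaccia correlation integral (the fraction of point pairs within distance r) and reads the dimension at each scale off the local slope of log C(r) against log r. The function names, the Euclidean metric, and the synthetic test data are assumptions made for illustration and are not taken from the article.

```python
import numpy as np

def correlation_integral(points, radii):
    """Fraction of distinct point pairs whose Euclidean distance is at most r."""
    points = np.asarray(points, dtype=float)
    n = len(points)
    # Squared pairwise distances via |a|^2 + |b|^2 - 2 a.b (clipped at zero).
    sq = np.sum(points ** 2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * points @ points.T, 0.0)
    # Keep each unordered pair once (strict upper triangle) and sort distances.
    dists = np.sort(np.sqrt(d2[np.triu_indices(n, k=1)]))
    counts = np.searchsorted(dists, np.asarray(radii, dtype=float), side="right")
    return counts / dists.size

def local_correlation_dimension(points, radii):
    """Local slopes of log C(r) versus log r between consecutive radii."""
    radii = np.asarray(radii, dtype=float)
    c = correlation_integral(points, radii)
    mask = c > 0  # log is undefined where no pairs fall within r
    log_r, log_c = np.log(radii[mask]), np.log(c[mask])
    slopes = np.diff(log_c) / np.diff(log_r)
    mid_radii = np.exp(0.5 * (log_r[1:] + log_r[:-1]))
    return mid_radii, slopes

if __name__ == "__main__":
    # Hypothetical data: a 2-D sheet embedded in 5 dimensions.
    rng = np.random.default_rng(0)
    sheet = np.zeros((1000, 5))
    sheet[:, :2] = rng.random((1000, 2))
    for r, d in zip(*local_correlation_dimension(sheet, np.geomspace(0.02, 0.2, 8))):
        print(f"scale {r:.3f}: estimated dimension {d:.2f}")
```

For a point set that is intrinsically low-dimensional at some scales and noisier at others, the slope varies with r, which is the kind of scale-dependent picture the abstract refers to. The dense pairwise distance matrix used here is only practical for a few thousand points; large corpora would need sampling or a different distance computation.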