Learning from dyadic and relational data is a fundamental problem for IR and KDD applications in the web and social media domains. The basic behaviors and characteristics of users and documents are typically described by a collection of dyads, i.e., pairs of entities. Discriminative features extracted from such data are essential for exploratory and discriminatory analyses. Relational properties of the entities reflect their pair-wise similarities and collective community structure, which are also valuable for discriminative learning. A challenging aspect of learning from relational data in many domains is that the generative process of relational links appears noisy and is not well described by a stochastic model. In this paper, we present a principled approach for learning discriminative features from heterogeneous sources of dyadic and relational data. We propose an information-theoretic framework called Latent Feature Encoding (LFE), which projects the entities and the links to a latent feature space in the analogy of -encoding. The projection is formalized as a maximization of the mutual information preserved in the latent features, regularized by the compression rate of the encoding. The regularization is emphasized over more probable links to account for the noisiness of the observations. An empirical evaluation of the proposed method on text and social media datasets is presented. Performance on supervised and unsupervised learning tasks is compared with that of conventional latent feature extraction methods.
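The trade-off described above, maximizing the mutual information preserved by the latent features while penalizing the compression rate of the encoding, has the shape of an information-bottleneck-style objective. The sketch below illustrates that general trade-off on a toy dyadic co-occurrence matrix; it is not the LFE method itself (the abstract's link-weighted regularization is not reproduced), and the names `mutual_info`, `ib_objective`, and the weight `beta` are illustrative assumptions.

```python
import numpy as np

def mutual_info(pxy):
    """I(X;Y) in nats for a joint distribution given as a 2-D array."""
    pxy = pxy / pxy.sum()
    px = pxy.sum(axis=1, keepdims=True)
    py = pxy.sum(axis=0, keepdims=True)
    mask = pxy > 0
    return float((pxy[mask] * np.log(pxy[mask] / (px @ py)[mask])).sum())

def ib_objective(pxy, encoder, beta=0.5):
    """
    Bottleneck-style trade-off: I(T;Y) - beta * I(T;X),
    where encoder[t, x] = q(t | x) maps entities X to latent codes T.
    I(T;Y) is the information the latent features preserve about the
    dyad partner; I(T;X) plays the role of the compression rate.
    """
    pxy = pxy / pxy.sum()
    px = pxy.sum(axis=1)
    ptx = encoder * px[None, :]   # joint p(t, x) = q(t|x) p(x)
    pty = encoder @ pxy           # joint p(t, y) = sum_x q(t|x) p(x, y)
    return mutual_info(pty) - beta * mutual_info(ptx)

# Toy dyads: four entities, two of which link to each of two targets.
pxy = np.array([[4.0, 0.0], [4.0, 0.0], [0.0, 4.0], [0.0, 4.0]])
# An encoder that merges entities with identical link patterns
# preserves I(T;Y); a uniform encoder discards it.
good = np.array([[1.0, 1.0, 0.0, 0.0], [0.0, 0.0, 1.0, 1.0]])
bad = np.full((2, 4), 0.5)
print(ib_objective(pxy, good) > ib_objective(pxy, bad))  # True
```

Under this view, increasing `beta` favors heavier compression of the entity identities, which is the lever the abstract says LFE applies more strongly to the more probable (and presumably noisier) links.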