Networked data consist of nodes and links between the nodes that indicate their dependencies. Content features are available for every node, whereas labels are available only for the training nodes. In transductive classification, given the features of all nodes and the labels of the training nodes, the labels of all remaining nodes are predicted. Learning algorithms that exploit both node content features and links have been developed. For example, collective classification algorithms use aggregated labels of neighbors (such as their sum or average), in addition to node features, as inputs to a classifier. The classifier is trained on the training data only; at test time, since neighbors' labels serve as classifier inputs, the labels of the test set must be determined through an iterative procedure. While labels are usually difficult to obtain for the whole dataset, features are usually much easier to obtain. In this paper, we introduce a new method of transductive network classification that can use the test node features when training the classifier. We train our classifier on enriched node features, which include, in addition to a node's own features, aggregated neighbor features and combinations of node and neighbor features produced by the simple logical operators OR and AND. Enriched features may contain irrelevant or redundant features, which could degrade classifier performance. We therefore employ feature selection to decide which of the enriched features should be used for classifier training. Our feature selection method, called FCBF#, is a fast, mutual-information-based, filter-type feature selection method. Experimental results on three network datasets show that classification accuracies obtained with network-enriched, selected features are comparable to or better than those of content-only or collective classification.
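The enriched-feature construction described above can be sketched as follows. This is a minimal illustration, not the paper's exact implementation: the function name `enrich_features`, the choice of average as the neighbor aggregator, and the thresholding used to binarize features before applying OR/AND are all assumptions made for the sketch.

```python
import numpy as np

def enrich_features(X, adj):
    """Build enriched node features: the node's own features, the
    average of its neighbors' features, and OR/AND combinations of
    node and neighbor features (hypothetical helper; the paper's
    exact aggregation may differ)."""
    deg = adj.sum(axis=1, keepdims=True)
    deg[deg == 0] = 1                          # avoid divide-by-zero for isolated nodes
    neigh_avg = adj @ X / deg                  # aggregated neighbor features
    neigh_any = (adj @ (X > 0)) > 0            # does any neighbor have the feature?
    own = X > 0
    feat_or = (own | neigh_any).astype(float)  # node OR some neighbor has the feature
    feat_and = (own & neigh_any).astype(float) # node AND some neighbor has the feature
    return np.hstack([X, neigh_avg, feat_or, feat_and])
```

Because the enriched matrix concatenates four blocks, its dimensionality is four times that of the original content features, which is what motivates the feature selection step that follows.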
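The feature selection step can be illustrated with a sketch of the original FCBF algorithm that FCBF# builds on: features are ranked by symmetrical uncertainty (a normalized mutual information) with the class label, and a feature is discarded when an already-selected feature is more predictive of it than the label is. This sketch assumes discrete features and does not reproduce FCBF#'s refinements to the selection order.

```python
import numpy as np
from collections import Counter
from math import log2

def entropy(values):
    """Empirical Shannon entropy of a discrete sequence, in bits."""
    n = len(values)
    return -sum(c / n * log2(c / n) for c in Counter(values).values())

def symmetrical_uncertainty(x, y):
    """SU(x, y) = 2 * I(x; y) / (H(x) + H(y)), in [0, 1]."""
    hx, hy = entropy(list(x)), entropy(list(y))
    info_gain = hx + hy - entropy(list(zip(x, y)))  # mutual information I(x; y)
    return 2 * info_gain / (hx + hy) if hx + hy > 0 else 0.0

def fcbf_select(X, y, delta=0.0):
    """FCBF-style filter selection (sketch). Keeps features whose SU
    with the label exceeds delta, then removes a feature if some
    selected feature predicts it better than the label does."""
    n_feat = X.shape[1]
    su_label = [symmetrical_uncertainty(X[:, j], y) for j in range(n_feat)]
    order = sorted((j for j in range(n_feat) if su_label[j] > delta),
                   key=lambda j: -su_label[j])
    selected = []
    for j in order:
        # redundancy check against already-selected features
        if all(symmetrical_uncertainty(X[:, j], X[:, k]) < su_label[j]
               for k in selected):
            selected.append(j)
    return selected
```

As a filter method, this runs once before training and is independent of the downstream classifier, which is what makes it fast enough to apply to the enlarged enriched-feature set.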