Feature enrichment and selection for transductive classification on networked data

  • Authors: Zehra Cataltepe; Abdullah Sonmez; Baris Senliol
  • Venue: Pattern Recognition Letters
  • Year: 2014

Abstract

Networked data consist of nodes and links between the nodes, where the links indicate dependencies between nodes. Content features are available for all nodes, whereas labels are available only for the training nodes. In transductive classification, given the features of all nodes and the labels of the training nodes, the labels of all remaining nodes are predicted. Learning algorithms that use both node content features and links have been developed. For example, collective classification algorithms use aggregated labels of neighbors (such as their sum or average), in addition to node features, as inputs to a classifier. The classifier is trained using the training data only. At test time, since the neighbors' labels are used as classifier inputs, the labels for the test set need to be determined through an iterative procedure. While it is usually very difficult to obtain labels for the whole dataset, features are usually easier to obtain. In this paper, we introduce a new method of transductive network classification which can use the test node features when training the classifier. We train our classifier using enriched node features. The enriched node features include, in addition to the node's own features, the aggregated neighbors' features and aggregations of node and neighbor features passed through the simple logical operators OR and AND. Enriched features may contain irrelevant or redundant features, which could decrease classifier performance. Therefore, we employ feature selection to determine whether each feature in the enriched set should be used for classifier training. Our feature selection method, called FCBF#, is a fast, filter-type method based on mutual information. Experimental results on three different network datasets show that classification accuracies obtained using network-enriched and selected features are comparable to or better than those of content-only or collective classification.
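
The enrichment and selection steps described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes binary content features and one possible reading of the OR and AND operators: for each feature, OR is 1 if the node or any neighbor has the feature, and AND is 1 if the node and all of its neighbors have it. The function names (`enrich`, `mutual_information`, `rank_features`) and the toy graph are hypothetical. The ranking step is a plain mutual-information ordering used as a stand-in; it is not FCBF#, whose details the abstract does not give.

```python
from collections import Counter
from math import log2


def enrich(features, neighbors):
    """Build enriched feature vectors for every node (training and test).

    features:  dict node -> list of 0/1 content features (assumed binary)
    neighbors: dict node -> list of neighboring nodes
    Returns dict node -> own + mean-of-neighbors + OR + AND features.
    """
    enriched = {}
    for node, own in features.items():
        nbr_rows = [features[n] for n in neighbors.get(node, [])]
        d = len(own)
        if nbr_rows:
            # Aggregated neighbor features: per-feature average.
            mean = [sum(r[j] for r in nbr_rows) / len(nbr_rows) for j in range(d)]
        else:
            mean = [0.0] * d
        # Assumed reading of OR: node or any neighbor has feature j.
        or_ = [int(own[j] or any(r[j] for r in nbr_rows)) for j in range(d)]
        # Assumed reading of AND: node and all neighbors have feature j.
        and_ = [int(own[j] and all(r[j] for r in nbr_rows)) for j in range(d)]
        enriched[node] = list(own) + mean + or_ + and_
    return enriched


def mutual_information(xs, ys):
    """Mutual information (in bits) between two discrete sequences."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px, py = Counter(xs), Counter(ys)
    return sum(
        (c / n) * log2(c * n / (px[x] * py[y])) for (x, y), c in joint.items()
    )


def rank_features(enriched, labels):
    """Order enriched feature indices by mutual information with the label.

    Only labeled (training) nodes are used for scoring. Unlike a
    redundancy-aware filter, this stand-in does not drop redundant features.
    """
    nodes = list(labels)
    d = len(enriched[nodes[0]])
    ys = [labels[n] for n in nodes]
    scores = [
        mutual_information([enriched[n][j] for n in nodes], ys) for j in range(d)
    ]
    return sorted(range(d), key=lambda j: scores[j], reverse=True)


# Toy path graph a - b - c with two binary content features per node.
features = {"a": [1, 0], "b": [1, 1], "c": [0, 1]}
neighbors = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
enriched = enrich(features, neighbors)

# Labels are known for training nodes only; "b" plays the test node here.
train_labels = {"a": 0, "c": 1}
ranking = rank_features(enriched, train_labels)
```

Note that `enrich` reads the features of all nodes, including unlabeled ones, which is what makes the setting transductive, while `rank_features` only ever sees the training labels. A classifier would then be trained on the top-ranked enriched features of the labeled nodes.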