Hyperpaths: extending pathfinding to moded languages

Authors:
Irene M. Ong; David Page; Inês Dutra; Vítor Santos Costa
Affiliations:
University of Wisconsin -- Madison, Madison, WI; University of Wisconsin -- Madison, Madison, WI; Centro de Tecnologia, Rio de Janeiro, Brasil; Centro de Tecnologia, Rio de Janeiro, Brasil
Venue:
MRDM '05 Proceedings of the 4th international workshop on Multi-relational mining
Year:
2005

Citing 1
Cited 0

Link mining: a new data mining challenge

ACM SIGKDD Explorations Newsletter

Abstract

Learning in multi-relational domains has gained in popularity over the past few years, contributing to applications in diverse areas. Typically, learning from multi-relational domains has involved learning rules about distinct entities so that they can be classified into one category or another. However, there are also interesting applications concerned with learning whether a number of entities are connected. Examples include determining whether two proteins interact in a cell, whether two identifiers are aliases, or whether a web page will refer to another one; these problems are known as link mining [3].

Inductive logic programming (ILP) systems, which often rely on hill-climbing heuristics in learning first-order concepts, have been a dominating force in the area of multi-relational concept learning. However, hill-climbing heuristics are susceptible to local maxima and plateaus, which is especially a factor for large datasets where the branching factor per node can be very large [2, 1]. Ideally, saturation-based search and a good scoring method should eventually lead us to the interesting clauses; however, the search space can grow so quickly that we risk never reaching an interesting path in a reasonable amount of time. This prompted us to consider alternative ways, such as pathfinding [4], to constrain the search space.

Richards and Mooney realized that the problem of learning first-order concepts could be represented using graphs and, using the intuition that if two nodes interact there must exist an explanation, proposed that the explanation should be a connected path linking the two nodes. We agree with the idea and propose to use pathfinding on the saturated clause instead. The original pathfinding algorithm assumes the background knowledge forms an undirected graph. In contrast, the saturated clause is obtained by using mode declarations: in a nutshell, a literal can only be added to a clause if the literal's input variables are known to be bound. Mode declarations thus embed directionality in the graph formed by literals.

We show how we can exploit the links between objects in multi-relational data to help a first-order rule learning system direct the search by explicitly traversing these links to find paths between variables of interest. Specifically, we extend the pathfinding algorithm by Richards and Mooney [4] to make use of mode declarations to find paths in the saturated bottom clause, which anchors one end of the search space based on background knowledge.

Our major insight is that a saturated clause for a moded program can be described as a directed hypergraph, which consists of nodes and hyperarcs that connect a nonempty set of nodes to one target node. Given this, we show that pathfinding can be reduced to reachability in the hypergraph, where each hyperpath corresponds to a hypothesis. However, we may also be interested in non-minimal paths and in the composition of paths. We thus propose and evaluate an algorithm that can enumerate all such hyperpaths according to some heuristic, and we test it on the UW-CSE dataset by Richardson and Domingos [5]. Experimental results on a medium-sized dataset show that pathfinding allows one to consider interesting clauses that would not easily be found by Aleph.
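
The reduction of pathfinding to hypergraph reachability can be illustrated with a minimal Python sketch. It is not the authors' enumeration algorithm: it only forward-chains from a set of bound variables and recovers one hyperpath per target. The names `Hyperarc`, `reachable`, and `hyperpath`, as well as the literals `p`, `q`, `r`, and `s`, are hypothetical and chosen for illustration. The sketch assumes each literal binds a single output variable; a literal with several outputs would be split into one hyperarc per output.

```python
from collections import namedtuple

# Hypothetical representation: a hyperarc is a literal whose input
# variables must all be bound before it can be added to the clause,
# after which it binds one output variable.
Hyperarc = namedtuple("Hyperarc", ["literal", "inputs", "output"])


def reachable(hyperarcs, sources):
    """Forward-chain from the source variables.

    Returns a dict mapping every reachable variable to the hyperarc
    that first bound it (None for the source variables).
    """
    bound = {v: None for v in sources}
    changed = True
    while changed:
        changed = False
        for arc in hyperarcs:
            # A literal may fire only when all its inputs are bound.
            if arc.output not in bound and all(v in bound for v in arc.inputs):
                bound[arc.output] = arc
                changed = True
    return bound


def hyperpath(bound, target):
    """Return the literals needed to bind `target`, or None if unreachable.

    Literals are listed in the order they fired, so each literal's
    inputs are bound by the sources or by an earlier literal.
    """
    if target not in bound:
        return None
    needed, stack, seen = set(), [target], set()
    while stack:
        var = stack.pop()
        if var in seen:
            continue
        seen.add(var)
        arc = bound[var]
        if arc is not None:
            needed.add(arc.literal)
            stack.extend(arc.inputs)
    return [a.literal for a in bound.values()
            if a is not None and a.literal in needed]


# Made-up saturated clause with modes p(+A,-C), q(+B,-D), r(+C,+D,-E), s(+F,-G).
arcs = [
    Hyperarc("p(A,C)", ("A",), "C"),
    Hyperarc("q(B,D)", ("B",), "D"),
    Hyperarc("r(C,D,E)", ("C", "D"), "E"),
    Hyperarc("s(F,G)", ("F",), "G"),  # F is never bound, so G is unreachable
]

bound = reachable(arcs, {"A", "B"})
print(hyperpath(bound, "E"))
print(hyperpath(bound, "G"))
```

In this toy clause, `E` becomes reachable only once both `C` and `D` are bound, which is the directionality that an undirected path search would miss. Enumerating non-minimal or composed hyperpaths, as the abstract describes, would require keeping every hyperarc that can bind a variable rather than only the first.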