Invited contribution: matching classifications via a bidirectional integration of SAT and linguistic resources

  • Authors:
  • Fausto Giunchiglia

  • Affiliations:
  • Dept. of Information and Communication Technology, University of Trento, Povo, Trento, Italy

  • Venue:
  • FroCoS'05: Proceedings of the 5th International Conference on Frontiers of Combining Systems
  • Year:
  • 2005

Abstract

Classifications, often mistakenly called directories, are pervasive: we use them to classify our messages, our favourite Web pages, our files, and so on. Many more can be found on the Web; think, for instance, of Google's and Yahoo's directories. The problem is that all these classifications are very different or, more precisely, semantically heterogeneous. The most striking consequence is that they classify documents very differently, which makes finding those documents very hard and sometimes impossible. Matching classifications is the process that allows us to map those nodes of two classifications which, intuitively, correspond semantically to each other. In the first part of the talk I will show how this problem can be encoded as a propositional validity problem, thus allowing the use of SAT reasoners. This is done mainly using linguistic resources (e.g., WordNet) and some amount of Natural Language Processing. However, as I will show in the second part of the talk, this approach turns out to be almost useless. In most cases, in fact, linguistic resources do not contain enough of the axioms needed to prove unsatisfiability (a formula is shown to be valid by showing that its negation is unsatisfiable). The solution to this problem turns out to be using SAT as a way to generate the missing axioms. We started by using linguistic resources to provide SAT with the axioms needed to match classifications, and we ended up using SAT to generate the axioms missing from the linguistic resources. We will argue that this is an example of a more general phenomenon that arises when using commonsense knowledge. This, in turn, becomes an opportunity to use decision procedures for the focused, automated generation of the missing knowledge.
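The encoding of matching as propositional validity can be illustrated with a toy sketch. It assumes that each node's concept is the conjunction of the labels on its path and that background knowledge arrives as propositional axioms. The node labels, the axiom `italy -> europe`, the relation names and all helper functions are hypothetical, and a brute-force truth table stands in for a real SAT solver. A relation is taken to hold when `axioms -> rel(c1, c2)` is valid, which is equivalent to its negation being unsatisfiable.

```python
from itertools import product
from typing import Callable, Dict, List, Optional

Assignment = Dict[str, bool]
Formula = Callable[[Assignment], bool]


def find_countermodel(formula: Formula, atoms: List[str]) -> Optional[Assignment]:
    """Return an assignment that falsifies `formula`, or None if it is valid.

    Brute-force truth-table search (hypothetical stand-in); a real system
    would instead ask a SAT solver whether the negated formula is satisfiable.
    """
    for values in product([False, True], repeat=len(atoms)):
        assignment = dict(zip(atoms, values))
        if not formula(assignment):
            return assignment
    return None


def is_valid(formula: Formula, atoms: List[str]) -> bool:
    # Valid iff no falsifying assignment exists, i.e. the negation is unsatisfiable.
    return find_countermodel(formula, atoms) is None


def relation(c1: Formula, c2: Formula, axioms: Formula, atoms: List[str]) -> str:
    """Test candidate relations between two node concepts, each one as the
    validity of axioms -> rel(c1, c2). Relation names are illustrative."""
    def entailed(goal: Formula) -> bool:
        return is_valid(lambda a: (not axioms(a)) or goal(a), atoms)

    if entailed(lambda a: c1(a) == c2(a)):
        return "equivalent"
    if entailed(lambda a: (not c1(a)) or c2(a)):
        return "less general"
    if entailed(lambda a: (not c2(a)) or c1(a)):
        return "more general"
    if entailed(lambda a: not (c1(a) and c2(a))):
        return "disjoint"
    return "unknown"


# Hypothetical nodes "Photos/Italy" and "Photos/Europe" from two classifications;
# each concept is the conjunction of the labels on the node's path.
ATOMS = ["photo", "italy", "europe"]
photos_italy: Formula = lambda a: a["photo"] and a["italy"]
photos_europe: Formula = lambda a: a["photo"] and a["europe"]

# No background knowledge at all.
no_axioms: Formula = lambda a: True
# Hypothetical axiom italy -> europe, of the kind one would hope to obtain
# from a linguistic resource.
italy_in_europe: Formula = lambda a: (not a["italy"]) or a["europe"]

print(relation(photos_italy, photos_europe, no_axioms, ATOMS))        # unknown
print(relation(photos_italy, photos_europe, italy_in_europe, ATOMS))  # less general
# The falsifying assignment shows which atoms a missing axiom would have to relate.
print(find_countermodel(lambda a: (not photos_italy(a)) or photos_europe(a), ATOMS))
# {'photo': True, 'italy': True, 'europe': False}
```

Without the axiom the query returns `unknown`, mirroring the point that missing axioms block the proof; with it, the match is proved. The countermodel (`italy` true, `europe` false) points at the pair of atoms an absent axiom would need to connect. Reading countermodels as hints for missing knowledge is only one plausible interpretation of "using SAT to generate the missing axioms", not necessarily the procedure presented in the talk.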