Towards automatic syllabi matching

  • Authors: Marco Ronchetti; Joseph Sant
  • Affiliations: Universita di Trento, Povo, Italy; Sheridan Institute of Technology and Applied Learning, Oakville, ON, Canada
  • Venue: ITiCSE '09 Proceedings of the 14th annual ACM SIGCSE conference on Innovation and technology in computer science education
  • Year: 2009

Abstract

Student mobility is a priority in the European Union: it enables academic exchange and fosters students' awareness of being European citizens. The Bologna Process aimed at homogenizing the structure of European universities to facilitate both the recognition of academic titles, as foreseen by the Lisbon Recognition Convention, and student mobility during matriculation. Over one and a half million students have already benefited from mobility programs such as the Erasmus programme. Students who participate in a mobility program must choose a destination and a selection of courses to follow abroad, and must consider how their home institution will recognize their foreign credits. Selecting the most appropriate courses is not simple, since a course title does not always reflect its content. Syllabi must therefore be inspected manually, which is time-consuming because it may require comparing many syllabi from different institutions. It would be useful to automate the process at least partially: given a set of syllabi from two different universities, automatically find the best match among the courses of the two institutions. We have started experimenting with this possibility and, although we do not yet have final results, we present the main idea of our project. Our plan is to apply similarity matching algorithms to the available documents. Similarity matching is often based on the co-occurrence of common words. However, a naïve application of such an algorithm would probably generate spurious similarities from the co-occurrence of general terms such as "hour", "exercise" and "exam". A stop-word strategy, in which these words are catalogued and ignored, might seem a viable solution, but it generally does not improve the results significantly: words that are irrelevant in one context may be important in another.
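The spurious-similarity problem described above can be illustrated with a minimal sketch. It assumes a plain bag-of-words cosine similarity, which is only one common form of word co-occurrence matching and is not claimed to be the algorithm used in the project. The two syllabi, the function names and the list of generic terms are all invented for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and return its alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def cosine_similarity(text_a, text_b, stop_words=frozenset()):
    """Cosine similarity between word-count vectors, ignoring stop_words."""
    counts_a = Counter(t for t in tokenize(text_a) if t not in stop_words)
    counts_b = Counter(t for t in tokenize(text_b) if t not in stop_words)
    shared = counts_a.keys() & counts_b.keys()
    dot = sum(counts_a[w] * counts_b[w] for w in shared)
    norm_a = math.sqrt(sum(c * c for c in counts_a.values()))
    norm_b = math.sqrt(sum(c * c for c in counts_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Two invented syllabi on unrelated subjects that share only generic terms.
databases = "Relational databases: 40 hours of lectures, weekly exercise, final exam"
graphics = "Computer graphics: 40 hours of lectures, weekly exercise, final exam"

# Hypothetical hand-picked list of generic terms (an assumption, not from the paper).
generic_terms = frozenset(
    {"hours", "of", "lectures", "weekly", "exercise", "final", "exam"}
)

naive = cosine_similarity(databases, graphics)
filtered = cosine_similarity(databases, graphics, generic_terms)
print(f"naive: {naive:.2f}, with stop words: {filtered:.2f}")
```

In this toy case the naive score is high even though the subjects differ, and the hand-picked list removes the overlap. The list had to be tailored to this pair of documents, though, and a word ignored here could carry meaning in another context, which is the limitation of the stop-word strategy noted above.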
The path we are following is to assume the existence of a reference ontology in which every term has a description, and then to identify occurrences of the ontology's concepts within the examined documents. In this way we will be able to state that "syllabus x deals with topic y". The match between two syllabi is then calculated by matching the topics associated with each of them. We decided to focus on the Computer Science domain because it has already been classified into the areas, units and topics of CC2001 [1], and this ontology has already been mapped into XML structures [2]. We then used a similarity matching algorithm that uses Wikipedia as a reference corpus [3]. Preliminary results are not yet fully satisfactory, and we believe this may result from working at the word level rather than at the concept level: "software engineering" is not just the co-occurrence of "software" and "engineering" but a more complex concept. We are therefore exploring the possibility of identifying multi-word expressions as concepts, again using Wikipedia as a reference to decide whether an expression qualifies. If our attempts are successful, the next step will be to (semi-)automatically crawl academic sites to identify curricula and match them automatically with our algorithm.
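The topic-based idea can be sketched as follows, under loud assumptions: the miniature ontology below is invented and stands in for the CC2001 classification, exact phrase matching stands in for the Wikipedia-based similarity algorithm, and Jaccard overlap of topic sets is one possible way to score a match. None of the names come from the project itself.

```python
import re

# Hypothetical miniature ontology: topic name -> phrases that signal it.
# A stand-in for CC2001; the entries are invented for illustration.
TOY_ONTOLOGY = {
    "software engineering": ["software engineering", "software design"],
    "operating systems": ["operating systems", "process scheduling"],
    "databases": ["relational databases", "sql"],
}


def normalize(text):
    """Lowercase the text and join its alphabetic tokens with single spaces."""
    return " ".join(re.findall(r"[a-z]+", text.lower()))


def topics_in(syllabus, ontology=TOY_ONTOLOGY):
    """Return the topics whose phrases occur as whole multi-word units."""
    padded = f" {normalize(syllabus)} "
    return {
        topic
        for topic, phrases in ontology.items()
        if any(f" {normalize(phrase)} " in padded for phrase in phrases)
    }


def topic_overlap(syllabus_a, syllabus_b, ontology=TOY_ONTOLOGY):
    """Jaccard overlap between the topic sets of two syllabi."""
    topics_a = topics_in(syllabus_a, ontology)
    topics_b = topics_in(syllabus_b, ontology)
    union = topics_a | topics_b
    if not union:
        return 0.0
    return len(topics_a & topics_b) / len(union)


# Invented syllabi. course_z contains "software" and "engineering" separately,
# but never the concept "software engineering".
course_x = "Introduction to software engineering and relational databases"
course_y = "Software design, SQL and process scheduling"
course_z = "Engineering mathematics and software tools for statistics"

print(sorted(topics_in(course_x)))
print(topic_overlap(course_x, course_y))
print(topic_overlap(course_x, course_z))
```

Because phrases are matched as whole units, course_z is not tagged with "software engineering" even though both words occur in it, which mirrors the word-level versus concept-level distinction drawn above. course_x and course_y share two of three detected topics despite having almost no words in common.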