Building a search engine for computer science course syllabi

  • Authors:
  • Nakul Rathod;Lillian Cassel

  • Affiliations:
  • Villanova University, Villanova, PA, USA;Villanova University, Villanova, PA, USA

  • Venue:
  • Proceedings of the 13th ACM/IEEE-CS joint conference on Digital libraries
  • Year:
  • 2013

Abstract

Syllabi are rich educational resources. However, generic search engines do a poor job of finding Computer Science syllabi. Toward our goal of building a syllabus collection, we trained various machine learning classifiers to distinguish Computer Science syllabi from other web pages and to identify, among other things, the discipline each syllabus represents (AI or SE, for instance). We crawled 50 Computer Science departments in the US and gathered 100,000 candidate pages. Our best classifiers are more than 90% accurate at identifying syllabi in real-world data. The syllabus repository we created is live for public use (at http://syllabus.sdakak.com) and contains more than 3000 syllabi that our classifiers extracted from the crawl data. We present an analysis of the various feature selection methods and classifiers used.
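To make the classification step concrete, here is a minimal sketch of one way a syllabus-versus-other page classifier with feature selection could look. The abstract does not name the classifiers or feature selection methods used, so everything here is an assumption for illustration: the multinomial Naive Bayes model, the document-frequency-difference scoring in `select_features`, the helper names, and the six toy training texts are all hypothetical and are not taken from the paper.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def select_features(docs, labels, k):
    """Hypothetical feature selection: keep the k terms whose per-class
    document frequency differs most between classes."""
    classes = sorted(set(labels))
    class_sizes = Counter(labels)
    doc_freq = {c: Counter() for c in classes}
    for doc, label in zip(docs, labels):
        doc_freq[label].update(set(tokenize(doc)))
    terms = set().union(*doc_freq.values())

    def score(term):
        rates = [doc_freq[c][term] / class_sizes[c] for c in classes]
        return max(rates) - min(rates)

    # Sort by descending score, breaking ties alphabetically.
    ranked = sorted(terms, key=lambda t: (-score(t), t))
    return set(ranked[:k])


class NaiveBayes:
    """Multinomial Naive Bayes with add-one smoothing over a fixed vocabulary."""

    def __init__(self, vocabulary):
        self.vocabulary = set(vocabulary)

    def fit(self, docs, labels):
        self.classes = sorted(set(labels))
        self.doc_counts = Counter(labels)
        self.word_counts = {c: Counter() for c in self.classes}
        for doc, label in zip(docs, labels):
            kept = [t for t in tokenize(doc) if t in self.vocabulary]
            self.word_counts[label].update(kept)
        return self

    def predict(self, doc):
        tokens = [t for t in tokenize(doc) if t in self.vocabulary]
        total_docs = sum(self.doc_counts.values())
        vocab_size = len(self.vocabulary)
        scores = {}
        for c in self.classes:
            total_words = sum(self.word_counts[c].values())
            log_prob = math.log(self.doc_counts[c] / total_docs)
            for t in tokens:
                log_prob += math.log(
                    (self.word_counts[c][t] + 1) / (total_words + vocab_size)
                )
            scores[c] = log_prob
        return max(scores, key=scores.get)


# Toy training data (invented for this sketch, not from the crawl).
train_docs = [
    "Course syllabus: grading policy, office hours, textbook, weekly schedule of lectures",
    "Syllabus for Software Engineering. Prerequisites, grading, assignments, exams, office hours",
    "Artificial Intelligence course: instructor, textbook, grading, homework schedule",
    "Faculty research publications and grant news from the department",
    "Department news: seminar announcement and research lab openings",
    "Apply for admission to the graduate program; research areas and faculty",
]
train_labels = ["syllabus"] * 3 + ["other"] * 3

vocab = select_features(train_docs, train_labels, k=10)
model = NaiveBayes(vocab).fit(train_docs, train_labels)

print(model.predict("Instructor office hours, grading policy and exam schedule"))
print(model.predict("Research seminar and faculty publications news"))
```

On data this small, common words such as "and" can end up among the selected features; a realistic pipeline would likely remove stop words and train on far more pages. The same structure extends to the discipline labels mentioned in the abstract (AI, SE, and so on) by using those as class names instead of the two used here.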