Information extraction from syllabi for academic e-Advising

  • Authors:
  • Yevgen Biletskiy; J. Anthony Brown; Girish Ranganathan

  • Affiliations:
  • University of New Brunswick, UNB, Electrical and Computer Engineering, 15 Dineen Drive, Fredericton, New Brunswick, Canada E3B5A3;University of New Brunswick, UNB, Electrical and Computer Engineering, 15 Dineen Drive, Fredericton, New Brunswick, Canada E3B5A3;University of New Brunswick, UNB, Electrical and Computer Engineering, 15 Dineen Drive, Fredericton, New Brunswick, Canada E3B5A3

  • Venue:
  • Expert Systems with Applications: An International Journal
  • Year:
  • 2009

Abstract

Creating an academic e-Advisor that automates the transfer of course credits between institutions and recommends courses for further study requires an extensive database of course information. This paper presents an application that builds such a database by automatically extracting relevant information from HTML course outlines stored on an institution's website and storing it as machine-readable XML. The application, called CODE (course outline data extractor), parses a course outline based on its HTML tags and content to build a document object model, then applies a combination of web mining, natural language processing, and pattern recognition techniques to automatically classify and extract content useful to the semi-automatic e-Advisor. The current implementation is restricted to HTML course outlines, but the concepts can be extended to other formats of learning objects or to entirely different domains. As a proof of concept, the quality of extraction and classification is evaluated on a corpus of syllabi.
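
To make the pipeline in the abstract concrete, the following is a minimal sketch of the general idea (parse an HTML outline, group text under headings, classify each heading, emit XML). It is not the CODE implementation: the category names, the keyword patterns, the `OutlineParser` class, and the `classify` and `outline_to_xml` functions are all hypothetical, and simple regular expressions stand in for the web mining, natural language processing, and pattern recognition techniques the paper refers to.

```python
import re
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# Hypothetical categories and keyword patterns (assumptions for illustration;
# the abstract does not list the categories or rules that CODE uses).
CATEGORY_PATTERNS = {
    "prerequisites": re.compile(r"\bpre-?requisites?\b", re.I),
    "textbook": re.compile(r"\b(text(book)?s?|required reading)\b", re.I),
    "grading": re.compile(r"\b(grading|evaluation|marking)\b", re.I),
    "description": re.compile(r"\b(description|overview|objectives?)\b", re.I),
}

HEADING_TAGS = {"h1", "h2", "h3", "h4", "h5", "h6"}


class OutlineParser(HTMLParser):
    """Groups the text of an HTML course outline under its headings."""

    def __init__(self):
        super().__init__()
        self.sections = []          # list of [heading, [text chunks]]
        self._in_heading = False
        self._heading_parts = []

    def handle_starttag(self, tag, attrs):
        if tag in HEADING_TAGS:
            self._in_heading = True
            self._heading_parts = []

    def handle_endtag(self, tag):
        if tag in HEADING_TAGS and self._in_heading:
            self._in_heading = False
            heading = " ".join("".join(self._heading_parts).split())
            self.sections.append([heading, []])

    def handle_data(self, data):
        if self._in_heading:
            self._heading_parts.append(data)
        elif self.sections and data.strip():
            # Text before the first heading is ignored in this sketch.
            self.sections[-1][1].append(" ".join(data.split()))


def classify(heading: str) -> str:
    """Return the first category whose pattern matches the heading, else 'other'."""
    for category, pattern in CATEGORY_PATTERNS.items():
        if pattern.search(heading):
            return category
    return "other"


def outline_to_xml(html: str) -> str:
    """Convert an HTML course outline into a small XML document."""
    parser = OutlineParser()
    parser.feed(html)
    parser.close()
    root = ET.Element("course")
    for heading, chunks in parser.sections:
        section = ET.SubElement(
            root, "section", category=classify(heading), heading=heading
        )
        section.text = " ".join(chunks)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    sample = (
        "<html><body><h1>ECE 101</h1>"
        "<h2>Course Description</h2><p>Introduction to circuits</p>"
        "<h2>Prerequisites</h2><ul><li>MATH 100</li></ul>"
        "</body></html>"
    )
    print(outline_to_xml(sample))
```

Keying the classification on heading text alone keeps the sketch short; a real extractor would presumably also need to inspect body content and cope with outlines that do not use heading tags at all.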