Modern Information Retrieval
Machine Learning
Text Categorization with Suport Vector Machines: Learning with Many Relevant Features
ECML '98 Proceedings of the 10th European Conference on Machine Learning
A Comparative Study on Feature Selection in Text Categorization
ICML '97 Proceedings of the Fourteenth International Conference on Machine Learning
Learning and Problem Solving with Multilayer Connectionist Systems
Learning and Problem Solving with Multilayer Connectionist Systems
Automatic Identification of Home Pages on the Web
HICSS '05 Proceedings of the Proceedings of the 38th Annual Hawaii International Conference on System Sciences (HICSS'05) - Track 4 - Volume 04
Some Effective Techniques for Naive Bayes Text Classification
IEEE Transactions on Knowledge and Data Engineering
Towards a syllabus repository for computer science courses
Proceedings of the 38th SIGCSE technical symposium on Computer science education
Natural Language Processing with Python
Natural Language Processing with Python
Solr 1.4 Enterprise Search Server
Solr 1.4 Enterprise Search Server
Machine learning in building a collection of computer science course syllabi
TPDL'12 Proceedings of the Second international conference on Theory and Practice of Digital Libraries
Hi-index | 0.00 |
Syllabi are rich educational resources. However, finding Computer Science syllabi on a generic search engine does not work well. Towards our goal of building a syllabus collection we have trained various Machine Learning classifiers to recognize Computer Science syllabi from other web pages and the discipline that they represent (AI or SE for instance) among other things. We have crawled 50 Computer Science departments in the US and gathered 100,000 candidate pages. Our best classifiers are more than 90% accurate at identifying syllabi from real-world data. The syllabus repository we created is live for public use (at http://syllabus.sdakak.com) and contains more than 3000 syllabi that our classifiers filtered out from the crawl data. We present an analysis of the various feature selection methods and classifiers used.