OfCourse: web content discovery, classification and information extraction for online course materials

Authors:
Yuhong Xiong;Ping Luo;Yong Zhao;Fen Lin;Shicong Feng;Baoyao Zhou;Liwei Zheng
Affiliations:
Hewlett Packard Labs China, Beijing, China;Hewlett Packard Labs China, Beijing, China;Hewlett Packard Labs China, Beijing, China;Institute of Computing Technology, CAS, Beijing, China;Hewlett Packard Labs China, Beijing, China;Hewlett Packard Labs China, Beijing, China;Hewlett Packard Labs China, Beijing, China
Venue:
Proceedings of the 18th ACM conference on Information and knowledge management
Year:
2009

Abstract

In this paper we present OfCourse, a vertical search engine for online course materials. These materials have two characteristics. First, they are scattered very sparsely across university Web sites. Second, instructors create them with widely different HTML templates and layouts. These characteristics pose challenges for two tasks. Web classification must identify the course materials. Web information extraction must then extract course metadata, such as course title, time and ID, from the identified course homepages. We describe the methods we propose to tackle these challenges, along with the features of the system. OfCourse contains over 60,000 courses from the top 50 universities in the US and is currently available for public access at http://fusion.hpl.hp.com/OfCourse/.
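The abstract describes a two-stage flow. Pages are first classified as course homepages or not, and metadata is then extracted only from the pages that pass. The abstract does not say how either stage works. The following is a minimal illustrative sketch of that hand-off, not the OfCourse method.

It assumes a simple keyword-count classifier and regular-expression extractors. The cue words, the threshold, the ID and term patterns, and the names `looks_like_course_page`, `extract_metadata`, `process_page` and `CourseRecord` are all hypothetical.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Hypothetical cue words for stage 1; not taken from the paper.
COURSE_CUES = ("syllabus", "lecture", "homework", "instructor",
               "office hours", "midterm", "final exam")

# Hypothetical pattern for course IDs such as "CS 161" or "EECS281".
COURSE_ID_RE = re.compile(r"\b([A-Z]{2,5})\s?-?(\d{2,4}[A-Z]?)\b")
# Hypothetical pattern for a term such as "Fall 2008" (case-insensitive).
TERM_RE = re.compile(r"\b(Fall|Spring|Summer|Winter)\s+(\d{4})\b", re.IGNORECASE)


@dataclass
class CourseRecord:
    title: Optional[str]
    course_id: Optional[str]
    term: Optional[str]


def looks_like_course_page(text: str, threshold: int = 3) -> bool:
    """Stage 1 (classification): count how many distinct cue words occur."""
    lowered = text.lower()
    hits = sum(1 for cue in COURSE_CUES if cue in lowered)
    return hits >= threshold


def extract_metadata(title_tag: str, text: str) -> CourseRecord:
    """Stage 2 (extraction): take the title from <title>, ID and term via regex."""
    # Prefer an ID found in the page title; fall back to the body text.
    id_match = COURSE_ID_RE.search(title_tag) or COURSE_ID_RE.search(text)
    term_match = TERM_RE.search(text)
    course_id = f"{id_match.group(1)} {id_match.group(2)}" if id_match else None
    term = (f"{term_match.group(1).capitalize()} {term_match.group(2)}"
            if term_match else None)
    title = title_tag.strip() or None
    return CourseRecord(title=title, course_id=course_id, term=term)


def process_page(title_tag: str, text: str) -> Optional[CourseRecord]:
    """Run extraction only on pages that the classifier accepts."""
    if not looks_like_course_page(text):
        return None
    return extract_metadata(title_tag, text)
```

Fixed cue lists and patterns like these are brittle under the template variety the abstract describes. They are used here only to make the classify-then-extract hand-off concrete.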