To help users find on-line learning materials that answer more specific questions, we built a learning portal named Fusion. First, we developed FusionCrawler, a focused crawler based on link classification, to download potential course pages. A binary classifier then picks out the actual course pages. Once the course pages are identified, FusionExtractor, a DOM-tree-based regular expression wrapper, extracts their metadata. The metadata include Course Name, Instructor Information, Course Outline, and other relevant information, and are stored in a database behind the portal. Experimental results show that organizing on-line courses through focused crawling and metadata extraction is effective: FusionCrawler retrieved on average 40-50% more on-topic learning materials than a normal focused crawler, and FusionExtractor achieved an average F1 of 85%. With metadata for more than 1,400 MIT OCW, 3,000 UIUC, and 1,000 WISC courses; 300 courses from GreatLearning with 3,000 Chinese course videos; and nearly 1,000 videos from the Internet Archive, the Fusion portal provides several search functions, including quick search, advanced search, and semantic navigation browsing.
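The idea of a focused crawler driven by link classification can be illustrated with a minimal sketch. This is not the authors' implementation: the keyword weights in `COURSE_HINTS` are a hypothetical stand-in for a trained link classifier, and `score_link`, `focused_crawl`, `TOY_WEB`, and `toy_fetch_links` are illustrative names that run over an in-memory toy link graph rather than the real Web. The sketch only shows how a crawler might expand the most promising links first.

```python
import heapq
import re

# Hypothetical keyword weights standing in for a trained link classifier.
COURSE_HINTS = {"course": 1.0, "syllabus": 0.8, "lecture": 0.6}


def score_link(anchor_text, url):
    """Return a score in [0, 1] guessing whether a link leads to a course page."""
    tokens = set(re.findall(r"[a-z]+", (anchor_text + " " + url).lower()))
    return max((w for k, w in COURSE_HINTS.items() if k in tokens), default=0.0)


def focused_crawl(seed, fetch_links, budget):
    """Visit up to `budget` pages, always expanding the highest-scoring link first."""
    frontier = [(-1.0, seed)]  # heapq is a min-heap, so scores are negated
    seen = {seed}
    visited = []
    while frontier and len(visited) < budget:
        _, url = heapq.heappop(frontier)
        visited.append(url)
        for anchor_text, target in fetch_links(url):
            if target not in seen:
                seen.add(target)
                heapq.heappush(frontier, (-score_link(anchor_text, target), target))
    return visited


# Toy in-memory link graph (assumption: replaces real HTTP fetching and parsing).
TOY_WEB = {
    "http://example.edu/": [
        ("About us", "http://example.edu/about"),
        ("Course catalog", "http://example.edu/courses"),
        ("News", "http://example.edu/news"),
    ],
    "http://example.edu/courses": [
        ("Lecture notes", "http://example.edu/courses/cs101/lecture1"),
        ("Syllabus", "http://example.edu/courses/cs101/syllabus"),
    ],
}


def toy_fetch_links(url):
    """Return (anchor_text, target_url) pairs for a page in the toy graph."""
    return TOY_WEB.get(url, [])
```

With a budget of three pages, `focused_crawl("http://example.edu/", toy_fetch_links, 3)` visits the seed, then the course catalog, then the syllabus page, and skips the off-topic "About us" and "News" links. In a fuller system the keyword scorer would presumably be replaced by a classifier trained on link features such as anchor text and URL tokens.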