Creating an academic e-Advisor that automates the transfer of course credits between institutions and recommends courses for further study requires an extensive database of course information. This paper presents an application that builds such a database by automatically extracting relevant information from HTML course outlines stored on an institution's website and storing it as machine-readable XML. The application, called CODE (course outline data extractor), parses a course outline based on its HTML tags and content to build a document object model, then applies a combination of web mining, natural language processing, and pattern recognition techniques to automatically classify and extract the content useful to the semi-automatic e-Advisor. The current implementation is restricted to HTML course outlines, but the concepts can be extended to other formats of learning objects or to entirely different domains. As a proof of concept, the quality of extraction and classification is evaluated on a corpus of syllabi.
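The pipeline summarized in the abstract (parse the HTML outline, classify its parts, emit XML) can be illustrated with a minimal Python sketch. This is not the CODE implementation: the heading-based segmentation, the keyword patterns in `SECTION_PATTERNS`, the XML element names, and the function `outline_to_xml` are all assumptions made for illustration, standing in for the richer web mining and natural language processing techniques the paper describes.

```python
import re
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# Hypothetical keyword patterns (an assumption, not the paper's classifier).
SECTION_PATTERNS = {
    "prerequisites": re.compile(r"\bpre-?requisites?\b", re.I),
    "textbook": re.compile(r"\b(text(book)?s?|required reading)\b", re.I),
    "grading": re.compile(r"\b(grading|evaluation|assessment)\b", re.I),
    "description": re.compile(r"\b(description|overview)\b", re.I),
}


class OutlineParser(HTMLParser):
    """Group the text of an HTML outline under its nearest preceding heading."""

    HEADINGS = {"h1", "h2", "h3", "h4"}

    def __init__(self):
        super().__init__()
        self.sections = []  # each item: (heading chunks, body chunks)
        self._in_heading = False

    def handle_starttag(self, tag, attrs):
        if tag in self.HEADINGS:
            self._in_heading = True
            self.sections.append(([], []))

    def handle_endtag(self, tag):
        if tag in self.HEADINGS:
            self._in_heading = False

    def handle_data(self, data):
        text = data.strip()
        if not text or not self.sections:
            return  # skip whitespace and any text before the first heading
        heading_chunks, body_chunks = self.sections[-1]
        (heading_chunks if self._in_heading else body_chunks).append(text)


def classify(heading):
    """Return the first label whose pattern matches the heading, else 'other'."""
    for label, pattern in SECTION_PATTERNS.items():
        if pattern.search(heading):
            return label
    return "other"


def outline_to_xml(html):
    """Parse an HTML course outline and return a classified XML string."""
    parser = OutlineParser()
    parser.feed(html)
    parser.close()
    root = ET.Element("course")
    for heading_chunks, body_chunks in parser.sections:
        heading = " ".join(heading_chunks)
        node = ET.SubElement(root, classify(heading))
        node.set("heading", heading)
        node.text = " ".join(body_chunks)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    sample = (
        "<h2>Course Description</h2><p>Intro to data mining.</p>"
        "<h2>Prerequisites</h2><ul><li>CS 101</li><li>MATH 200</li></ul>"
    )
    print(outline_to_xml(sample))
```

Matching only on heading text keeps the sketch short. Real course outlines often lack consistent headings, which is presumably why the paper combines tag structure with content-based classification.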