The World Wide Web has become an important medium for disseminating information on a wide range of topics, and much of human information is rapidly becoming available on it. In the medical domain, the number of healthcare-related documents is already large and continues to grow at an exponential rate. Most information on the web is buried inside HTML documents that are designed for human consumption. Automatically restructuring this information into machine-understandable form and making it available to web search agents would help bring the web to its full potential. In this work we downloaded and carefully analyzed a set of 100 diabetes-related websites comprising over 12,000 HTML files. Our first aim is to learn the general structure of these websites, which should increase the efficiency of information extraction and structuring. Every website has a purpose, mainly providing services, products, or both. Our study resulted in an ontology covering the general services and products that these websites offer. The main goal of this ontology is to guide the process of extracting and structuring information. We incorporated the Unified Medical Language System (UMLS) Semantic Network, which serves as an upper-level ontology for medicine, and used the MetaMap Transfer (MMTx) API developed by the US National Library of Medicine (NLM) to map text to concepts from the UMLS Semantic Network. Pinpointing concepts in web pages provides an efficient way to determine attributes and therefore facilitates more efficient extraction and restructuring of information. This paper describes the first part of our work and findings.
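To make the idea of pinpointing concepts in web pages concrete, the following is a minimal sketch of the two steps the abstract names: pulling human-readable text out of an HTML page and mapping phrases in it to semantic types. It does not call MMTx or the UMLS. The lexicon `TOY_LEXICON`, the class `TextExtractor`, and the functions `visible_text` and `tag_concepts` are hypothetical names introduced for illustration, and the phrase-to-type assignments are simplified examples rather than output of the authors' pipeline.

```python
from html.parser import HTMLParser

# Hypothetical toy lexicon standing in for a real concept mapper such as MMTx.
# The labels are UMLS semantic type names; the assignments are illustrative.
TOY_LEXICON = {
    "insulin": "Pharmacologic Substance",
    "diabetes": "Disease or Syndrome",
    "glucose meter": "Medical Device",
}


class TextExtractor(HTMLParser):
    """Collects visible text, skipping <script> and <style> content."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that is outside script/style elements.
        if self._skip_depth == 0 and data.strip():
            self.chunks.append(data.strip())


def visible_text(html):
    """Return the page's visible text as one space-joined string."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


def tag_concepts(text, lexicon=TOY_LEXICON):
    """Return (offset, phrase, semantic_type) tuples sorted by offset."""
    lowered = text.lower()
    found = []
    for phrase, semantic_type in lexicon.items():
        start = lowered.find(phrase)
        while start != -1:
            found.append((start, phrase, semantic_type))
            start = lowered.find(phrase, start + len(phrase))
    return sorted(found)
```

A page fragment such as `<p>Insulin for diabetes</p>` would yield one hit per lexicon phrase, each with its character offset in the extracted text. Plain substring matching is used here only to keep the sketch short; a real mapper also handles tokenization, word variants, and ambiguity.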