Comparative study of classification techniques on biomedical data from hypertext documents

Authors:
Rashedur M. Rahman;Sazia Salahuddin
Affiliations:
Department of Electrical Engineering and Computer Science, North South University, Plot-15, Block-B, Bashundhara, Dhaka 1229, Bangladesh;Department of Electrical Engineering and Computer Science, North South University, Plot-15, Block-B, Bashundhara, Dhaka 1229, Bangladesh
Venue:
International Journal of Knowledge Engineering and Soft Data Paradigms
Year:
2013

Abstract

In this paper, our goal is to mine biomedical data from hypertext documents, e.g., web content, using data mining algorithms with the help of a biomedical ontology. We collect a number of documents using Google, preprocess the hypertext documents, and extract the text data. The next task is the identification of biomedical data. To determine whether a word is a biomedical entity, we use a biomedical database, the UMLS Metathesaurus. Biomedical entities are mapped to the Metathesaurus by keyword query. The more often biomedical entities occur in a page, the more relevant we consider the page, and thus we can re-rank the documents to find the most important ones. We then test and analyse the performance of seven of the most popular classification algorithms by training them separately on the documents as ranked by Google and as ranked by our algorithm.
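
The re-ranking step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The names `TextExtractor`, `extract_text`, `biomedical_count`, `rerank` and `TOY_VOCAB` are hypothetical, and the small in-memory term set stands in for a keyword query against the UMLS Metathesaurus. The sketch also assumes that a page's score is the raw count of single-word matches, which the abstract does not specify, and it does not handle multi-word concepts.

```python
import re
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects text from an HTML page, skipping <script> and <style> content."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip = 0  # depth inside <script>/<style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip:
            self.parts.append(data)


def extract_text(html):
    """Preprocessing step: reduce a hypertext document to plain text."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.parts)


def biomedical_count(text, vocabulary):
    """Count tokens found in the vocabulary (assumed stand-in for a UMLS lookup)."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return sum(1 for token in tokens if token in vocabulary)


def rerank(pages, vocabulary):
    """Sort (url, html) pairs by biomedical term count, highest first.

    sorted() is stable, so pages with equal counts keep their original
    (search-engine) order.
    """
    scored = [(url, biomedical_count(extract_text(html), vocabulary))
              for url, html in pages]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy vocabulary and pages, invented purely for illustration.
TOY_VOCAB = {"insulin", "diabetes", "glucose"}
pages = [
    ("a.example", "<html><body><p>Travel tips and diabetes news</p></body></html>"),
    ("b.example", "<html><script>var insulin = 1;</script>"
                  "<p>Insulin regulates glucose in diabetes.</p></html>"),
]
ranking = rerank(pages, TOY_VOCAB)
print(ranking)
```

In this toy input the second page moves to the top because it contains three vocabulary terms against one in the first page. The re-ranked list could then serve as training input for a classifier, alongside the original search-engine ordering.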