Mining officially unrecognized side effects of drugs by combining web search and machine learning

Authors:
Carlo A. Curino;Yuanyuan Jia;Bruce Lambert;Patricia M. West;Clement Yu
Affiliations:
Politecnico di Milano, Milano, Italy;UIC, Chicago, IL;UIC, Chicago, IL;UIC, Chicago, IL;UIC, Chicago, IL
Venue:
Proceedings of the 14th ACM international conference on Information and knowledge management
Year:
2005

Abstract

We consider the problem of finding officially unrecognized side effects of drugs. By submitting Web search queries that contain a given drug name, it is possible to retrieve pages concerning the drug. However, many retrieved pages are irrelevant, and some relevant pages are not retrieved. More relevant pages can be obtained by adding the active ingredient of the drug to the query. To eliminate the irrelevant pages, we propose a machine learning process that filters them out. The process is shown experimentally to be very effective. Since obtaining training data for the machine learning process can be time-consuming and expensive, we provide an automatic method to generate the training data. The method is also shown to be very accurate. Side effects of three drugs that are not recognized by the FDA are validated by an expert. We believe that the same approach can be applied to many real-life problems and will yield high precision. Thus, it could lead to a new way to perform retrieval with high accuracy.
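
The two steps described in the abstract, expanding the query with the active ingredient and then filtering retrieved pages with a learned classifier, can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the abstract does not name the classifier, the query syntax, or the training procedure, so the sketch swaps in a small multinomial naive Bayes filter with Laplace smoothing, a hypothetical `build_queries` helper, and a handful of hand-written toy pages in place of the automatically generated training data.

```python
import math
import re
from collections import Counter


def build_queries(drug_name, active_ingredient):
    """Hypothetical query format: brand name, ingredient, and both combined."""
    return [
        f'"{drug_name}" side effects',
        f'"{active_ingredient}" side effects',
        f'"{drug_name}" OR "{active_ingredient}" side effects',
    ]


def tokenize(text):
    """Lowercase the text and keep alphabetic runs only."""
    return re.findall(r"[a-z]+", text.lower())


class NaiveBayesFilter:
    """Toy relevance filter (an assumed stand-in, not the paper's classifier)."""

    def fit(self, pages, labels):
        # labels are booleans: True = relevant page, False = irrelevant page.
        # Both classes must appear at least once in the training data.
        self.word_counts = {True: Counter(), False: Counter()}
        self.doc_counts = Counter(labels)
        for page, label in zip(pages, labels):
            self.word_counts[label].update(tokenize(page))
        self.vocab = set(self.word_counts[True]) | set(self.word_counts[False])
        return self

    def _log_score(self, tokens, label):
        total_docs = sum(self.doc_counts.values())
        score = math.log(self.doc_counts[label] / total_docs)  # class prior
        total_words = sum(self.word_counts[label].values())
        for tok in tokens:
            if tok in self.vocab:  # words never seen in training are skipped
                smoothed = self.word_counts[label][tok] + 1  # Laplace smoothing
                score += math.log(smoothed / (total_words + len(self.vocab)))
        return score

    def is_relevant(self, page):
        tokens = tokenize(page)
        return self._log_score(tokens, True) > self._log_score(tokens, False)


# Hand-written toy training pages (illustrative only, not real data).
train_pages = [
    "patients taking the drug reported nausea and headache",
    "the drug caused dizziness and rash in some patients",
    "buy cheap drug online with free shipping",
    "online pharmacy discount prices order now",
]
train_labels = [True, True, False, False]

clf = NaiveBayesFilter().fit(train_pages, train_labels)
queries = build_queries("Tylenol", "acetaminophen")
kept = [p for p in ["patients reported dizziness and nausea",
                    "cheap online pharmacy order now"] if clf.is_relevant(p)]
```

In this sketch `kept` retains only the page that discusses patient-reported symptoms; a real system would replace the toy pages with retrieved Web pages and a far larger training set.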