Anticipating annotations and emerging trends in biomedical literature

Authors:
Fabian Mörchen;Mathäus Dejori;Dmitriy Fradkin;Julien Etienne;Bernd Wachmann;Markus Bundschus
Affiliations:
Siemens Corporate Research, Princeton, NJ, USA;Siemens Corporate Research, Princeton, NJ, USA;Siemens Corporate Research, Princeton, NJ, USA;Siemens Corporate Research, Princeton, NJ, USA;Siemens Corporate Research, Princeton, NJ, USA;Ludwig-Maximilians-University, Munich, Germany
Venue:
Proceedings of the 14th ACM SIGKDD international conference on Knowledge discovery and data mining
Year:
2008

Abstract

The BioJournalMonitor is a decision support system for the analysis of trends and topics in the biomedical literature. Its main goal is to identify potential diagnostic and therapeutic biomarkers for specific diseases. Several data sources are continuously integrated to provide the user with up-to-date information on current research in this field. State-of-the-art text mining technologies are deployed to provide added value on top of the original content, including named entity detection, relation extraction, classification, clustering, ranking, summarization, and visualization. We present two novel technologies for analyzing the temporal dynamics of text archives and their associated ontologies. Currently, the MeSH ontology is used to annotate the scientific articles entering the PubMed database with medical terms. Both the maintenance of the ontology and the annotation of new articles are performed largely manually. We describe how probabilistic topic models can be used to annotate recent articles with the most likely MeSH terms. This gives our users a competitive advantage: a search for a MeSH term finds relevant articles long before they are manually annotated. We further present a study on how to predict the inclusion of new terms in the MeSH ontology. The results suggest that early prediction of emerging trends is possible. The trend ranking functions are deployed in our system to enable interactive searches for the hottest new trends relating to a disease.
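
To make the annotation idea concrete, the following is a minimal sketch of one common way a topic model can rank candidate terms for an unannotated article: each term is scored by summing, over topics, the probability of the term under the topic times the weight of that topic in the article. The abstract does not specify the model used in the paper, so this is an illustration under stated assumptions and not the authors' method. The function name `rank_terms`, the two toy topics, and all probabilities are hypothetical.

```python
from typing import Dict, List, Tuple


def rank_terms(
    doc_topics: List[float],
    topic_terms: List[Dict[str, float]],
    top_k: int = 3,
) -> List[Tuple[str, float]]:
    """Rank terms by score(term) = sum over topics of p(term | topic) * p(topic | doc).

    doc_topics[i] is the (assumed, already inferred) weight of topic i in the article.
    topic_terms[i] maps each annotation term to its probability under topic i.
    """
    scores: Dict[str, float] = {}
    for p_topic, term_probs in zip(doc_topics, topic_terms):
        for term, p_term in term_probs.items():
            scores[term] = scores.get(term, 0.0) + p_topic * p_term
    # Highest score first; ties are broken alphabetically for a stable order.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:top_k]


# Hypothetical toy data: two topics and an article that is mostly topic 0.
example_doc_topics = [0.75, 0.25]
example_topic_terms = [
    {"Biomarkers": 0.5, "Neoplasms": 0.5},
    {"Neoplasms": 0.25, "Algorithms": 0.75},
]

ranked = rank_terms(example_doc_topics, example_topic_terms)
for term, score in ranked:
    print(f"{term}: {score:.4f}")
```

In this toy setting the highest-ranked terms would serve as provisional annotations for the article until manual indexing is available. A real system would infer the topic weights from the article text and estimate the per-topic term probabilities from previously annotated articles.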