A wealth of information is available only in web pages, patents, publications, and similar text sources. Extracting information from such sources is challenging, both because of the complex language processing steps typically required and because of the potentially large number of texts that must be analyzed. Furthermore, integrating the extracted data with other sources of knowledge is often mandatory for subsequent analysis. In this demo, we present the AliBaba system for scalable information extraction from biomedical documents. Unlike many other systems, AliBaba performs both entity extraction and relationship extraction and graphically visualizes the resulting network of interconnected objects. It uses the PubMed search engine to select relevant documents. The technical novelty of AliBaba is twofold: (a) its ability to automatically learn language patterns for relationship extraction without an annotated corpus, and (b) its high-performance pattern matching algorithm. We show that a simple yet effective pattern filtering technique drastically improves the runtime of the system without harming its extraction effectiveness. Although AliBaba has been implemented for biomedical texts, its underlying principles should also be applicable to other domains.