A wealth of information is hidden within unstructured text. Often, this information can be best exploited in structured or relational form, which is well suited for sophisticated query processing, for integration with relational database management systems, and for data mining. This thesis addresses two fundamental problems in extracting relations from large text collections: (1) portability: tuning extraction systems for new domains, and (2) scalability: scaling up information extraction to large collections of documents. To address the first problem, we developed the Snowball information extraction system, a domain-independent system that learns to extract relations from unstructured text based on only a handful of user-provided example relation instances. Snowball can then be adapted to extract new relations with minimal human effort. Snowball improves extraction accuracy by automatically evaluating the quality of both the acquired extraction patterns and the extracted relation instances. To address the second problem, we developed the QXtract system, which learns search engine queries that retrieve the documents relevant to a given information extraction system and extraction task. QXtract can dramatically improve the efficiency of the information extraction process, and provides a building block for structured information extraction and text data mining over the web at large.
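The bootstrapping idea described above (start from a few example relation instances, learn patterns from them, and keep only patterns that look reliable) can be illustrated with a toy sketch. This is not the Snowball implementation: the regex-based entity matcher, the confidence formula (agreeing matches divided by agreeing plus conflicting matches), the `min_conf` threshold, the assumption that each organization has a single location, and the sample sentences are all hypothetical choices made for illustration.

```python
import re

# Crude stand-in for a named-entity tagger (assumption): capitalized word runs.
ENTITY = r"([A-Z]\w*(?: [A-Z]\w*)*)"

# Hypothetical sample corpus.
SENTENCES = [
    "Microsoft, based in Redmond, released a new product.",
    "Exxon, based in Irving, reported earnings.",
    "Intel, based in Santa Clara, builds chips.",
    "Microsoft has offices in Redmond.",
    "Microsoft has offices in London.",
    "Exxon has offices in Paris.",
]


def find_matches(pattern, sentences):
    """Return every (org, loc) pair joined by the literal pattern text."""
    regex = re.compile(ENTITY + re.escape(pattern) + ENTITY)
    return [m.groups() for s in sentences for m in regex.finditer(s)]


def bootstrap(sentences, seeds, min_conf=0.6, rounds=2):
    """Toy loop: known tuples -> patterns -> scored patterns -> new tuples."""
    known = dict(seeds)  # org -> loc; assumes one location per organization
    for _ in range(rounds):
        # 1. Use the text between a known (org, loc) pair as a candidate pattern.
        patterns = set()
        for s in sentences:
            for org, loc in known.items():
                m = re.search(
                    re.escape(org) + r"(\W.{0,30}?\W)" + re.escape(loc), s
                )
                if m:
                    patterns.add(m.group(1))
        # 2. Score each pattern against the tuples we already trust.
        new_tuples = {}
        for p in patterns:
            matches = find_matches(p, sentences)
            pos = sum(1 for o, l in matches if known.get(o) == l)
            neg = sum(1 for o, l in matches if o in known and known[o] != l)
            if pos + neg and pos / (pos + neg) >= min_conf:
                # 3. Only sufficiently confident patterns contribute new tuples.
                for o, l in matches:
                    new_tuples.setdefault(o, l)
        for o, l in new_tuples.items():
            known.setdefault(o, l)
    return known


if __name__ == "__main__":
    print(bootstrap(SENTENCES, [("Microsoft", "Redmond")]))
```

In this sample, the pattern ", based in " agrees with the seed and is kept, while " has offices in " also pairs Microsoft with London, so its confidence drops to 0.5 and the tuples it proposes are discarded.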