Deriving concept hierarchies from text. Proceedings of the 22nd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval.
Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data. ICML '01: Proceedings of the Eighteenth International Conference on Machine Learning.
KnowItNow: fast, scalable information extraction from the web. HLT '05: Proceedings of the Conference on Human Language Technology and Empirical Methods in Natural Language Processing.
International Journal on Document Analysis and Recognition.
Automatic Taxonomy Extraction Using Google and Term Dependency. WI '07: Proceedings of the IEEE/WIC/ACM International Conference on Web Intelligence.
AAAI'06: Proceedings of the 21st National Conference on Artificial Intelligence - Volume 2.
Unsupervised information extraction approach using graph mutual reinforcement. EMNLP '06: Proceedings of the 2006 Conference on Empirical Methods in Natural Language Processing.
Learning concept hierarchies from text corpora using formal concept analysis. Journal of Artificial Intelligence Research.
Creating relational data from unstructured and ungrammatical data sources. Journal of Artificial Intelligence Research.
Adaptive information extraction from text by rule induction and generalisation. IJCAI'01: Proceedings of the 17th International Joint Conference on Artificial Intelligence - Volume 2.
Principal components for automatic term hierarchy building. SPIRE '06: Proceedings of the 13th International Conference on String Processing and Information Retrieval.
Data-driven computational linguistics at FaMAF-UNC, Argentina. YIWCALA '10: Proceedings of the NAACL HLT 2010 Young Investigators Workshop on Computational Approaches to Languages of the Americas.
Constructing reference sets from unstructured, ungrammatical text. Journal of Artificial Intelligence Research.
Materializing multi-relational databases from the web using taxonomic queries. Proceedings of the Fourth ACM International Conference on Web Search and Data Mining.
Building a lightweight semantic model for unsupervised information extraction on short listings. EMNLP-CoNLL '12: Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning.
Previous work on information extraction from unstructured, ungrammatical text (e.g., classified ads) showed that exploiting a set of background knowledge, called a "reference set," greatly improves the precision and recall of the extractions. However, finding a source for this reference set is often difficult, if not impossible, and even when a source is found, it may not overlap well with the text for extraction. In this paper we present an approach to building the reference set directly from the text itself. Our approach eliminates the need to find an external source for the reference set and ensures better overlap between the text and the reference set. Starting with a small amount of background knowledge, our technique constructs tuples representing the entities in the text, and these tuples form the reference set. Our results show that our method outperforms manually constructed reference sets, since hand-built reference sets may not overlap with the entities in the unstructured, ungrammatical text. We also ran experiments comparing our method to the supervised approach of Conditional Random Fields (CRFs) using simple, generic features. Our method achieves an improvement in F1-measure on 6 of 9 attributes and is competitive on the rest, all without training data.
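The core idea in the abstract — seed background knowledge is used to construct entity tuples from the listings themselves, and those tuples then serve as the reference set for extraction — can be sketched as follows. This is a minimal illustration, not the paper's actual algorithm: the listing data, the seed make names, and the "model follows make" adjacency heuristic are all hypothetical simplifications.

```python
# Hedged sketch: seed knowledge -> reference-set tuples -> extraction.
# Data, seeds, and the adjacency heuristic are illustrative assumptions.

def tokenize(listing):
    """Lowercase the listing and strip trailing punctuation from tokens."""
    return [t.strip(",:.;") for t in listing.lower().split()]

# Seed background knowledge: a handful of known attribute values.
SEED_MAKES = {"honda", "ford", "toyota"}

# Unstructured, ungrammatical listings (e.g., classified car ads).
listings = [
    "02 honda civic ex low miles",
    "ford focus 2005 se clean",
    "toyota camry le 2003",
]

def build_reference_set(listings, seed_makes):
    """Construct (make, model) tuples from the text itself, assuming the
    token right after a known make names its model."""
    tuples = set()
    for listing in listings:
        tokens = tokenize(listing)
        for i, tok in enumerate(tokens):
            if tok in seed_makes and i + 1 < len(tokens):
                tuples.add((tok, tokens[i + 1]))
    return tuples

def annotate(listing, reference_set):
    """Label a new listing's attributes by matching the reference set."""
    tokens = tokenize(listing)
    labels = {}
    for make, model in reference_set:
        if make in tokens:
            labels["make"] = make
        if model in tokens:
            labels["model"] = model
    return labels

ref = build_reference_set(listings, SEED_MAKES)
print(annotate("must sell: honda civic, runs great", ref))
# → {'make': 'honda', 'model': 'civic'}
```

Because the reference set is built from the same corpus it annotates, overlap with the listings' vocabulary is guaranteed by construction — the property the abstract argues hand-built reference sets can lack.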