Web-scale information extraction in knowitall: (preliminary results)
Proceedings of the 13th international conference on World Wide Web
Automatic acquisition of hyponyms from large text corpora
COLING '92 Proceedings of the 14th conference on Computational linguistics - Volume 2
COLING '02 Proceedings of the 19th international conference on Computational linguistics - Volume 1
Towards domain-independent information extraction from web tables
Proceedings of the 16th international conference on World Wide Web
Introduction to Information Retrieval
Introduction to Information Retrieval
WebTables: exploring the power of tables on the web
Proceedings of the VLDB Endowment
Cheap and fast---but is it good?: evaluating non-expert annotations for natural language tasks
EMNLP '08 Proceedings of the Conference on Empirical Methods in Natural Language Processing
Weakly-supervised acquisition of labeled class instances using graph random walks
EMNLP '08 Proceedings of the Conference on Empirical Methods in Natural Language Processing
TextRunner: open information extraction on the web
NAACL-Demonstrations '07 Proceedings of Human Language Technologies: The Annual Conference of the North American Chapter of the Association for Computational Linguistics: Demonstrations
AAAI'08 Proceedings of the 23rd national conference on Artificial intelligence - Volume 2
Unsupervised named-entity extraction from the Web: An experimental study
Artificial Intelligence
Answering table augmentation queries from unstructured lists on the web
Proceedings of the VLDB Endowment
Automatic set instance extraction using the web
ACL '09 Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP: Volume 1 - Volume 1
Character-level analysis of semi-structured documents for set expansion
EMNLP '09 Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing: Volume 3 - Volume 3
Coupled semi-supervised learning for information extraction
Proceedings of the third ACM international conference on Web search and data mining
A semi-supervised method to learn and construct taxonomies using the web
EMNLP '10 Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing
Probabilistic Metrics for Soft-Clustering and Topic Model Validation
WI-IAT '10 Proceedings of the 2010 IEEE/WIC/ACM International Conference on Web Intelligence and Intelligent Agent Technology - Volume 01
Towards the web of concepts: extracting concepts from large datasets
Proceedings of the VLDB Endowment
Annotating and searching web tables using entities, types and relationships
Proceedings of the VLDB Endowment
Joint training for open-domain extraction on the web: exploiting overlap when supervision is limited
Proceedings of the fourth ACM international conference on Web search and data mining
Collectively representing semi-structured data from the web
AKBC-WEKEX '12 Proceedings of the Joint Workshop on Automatic Knowledge Base Construction and Web-scale Knowledge Extraction
Learning to find comparable entities on the web
WISE'12 Proceedings of the 13th international conference on Web Information Systems Engineering
Knowledge harvesting in the big-data era
Proceedings of the 2013 ACM SIGMOD International Conference on Management of Data
Assessing sparse information extraction using semantic contexts
Proceedings of the 22nd ACM international conference on Conference on information & knowledge management
WebChild: harvesting and organizing commonsense knowledge from the web
Proceedings of the 7th ACM international conference on Web search and data mining
Acquisition of open-domain classes via intersective semantics
Proceedings of the 23rd international conference on World wide web
Hi-index | 0.00 |
We describe a open-domain information extraction method for extracting concept-instance pairs from an HTML corpus. Most earlier approaches to this problem rely on combining clusters of distributionally similar terms and concept-instance pairs obtained with Hearst patterns. In contrast, our method relies on a novel approach for clustering terms found in HTML tables, and then assigning concept names to these clusters using Hearst patterns. The method can be efficiently applied to a large corpus, and experimental results on several datasets show that our method can accurately extract large numbers of concept-instance pairs.