Learning dictionaries for information extraction by multi-level bootstrapping
AAAI '99/IAAI '99 Proceedings of the sixteenth national conference on Artificial intelligence and the eleventh Innovative applications of artificial intelligence conference innovative applications of artificial intelligence
Snowball: extracting relations from large plain-text collections
DL '00 Proceedings of the fifth ACM conference on Digital libraries
Fast Algorithms for Mining Association Rules in Large Databases
VLDB '94 Proceedings of the 20th International Conference on Very Large Data Bases
Extracting Patterns and Relations from the World Wide Web
WebDB '98 Selected papers from the International Workshop on The World Wide Web and Databases
A Data Mining Algorithm for Generalized Web Prefetching
IEEE Transactions on Knowledge and Data Engineering
Kernel methods for relation extraction
The Journal of Machine Learning Research
Automatic acquisition of hyponyms from large text corpora
COLING '92 Proceedings of the 14th conference on Computational linguistics - Volume 2
Learning surface text patterns for a Question Answering system
ACL '02 Proceedings of the 40th Annual Meeting on Association for Computational Linguistics
Managing information extraction: state of the art and research directions
Proceedings of the 2006 ACM SIGMOD international conference on Management of data
High-Performance Unsupervised Relation Extraction from Large Corpora
ICDM '06 Proceedings of the Sixth International Conference on Data Mining
Discovering relations among named entities from large corpora
ACL '04 Proceedings of the 42nd Annual Meeting on Association for Computational Linguistics
ACLdemo '04 Proceedings of the ACL 2004 on Interactive poster and demonstration sessions
A semantic approach to IE pattern induction
ACL '05 Proceedings of the 43rd Annual Meeting on Association for Computational Linguistics
Extracting relations with integrated information using kernel methods
ACL '05 Proceedings of the 43rd Annual Meeting on Association for Computational Linguistics
Extracting personal names from email: applying named entity recognition to informal text
HLT '05 Proceedings of the conference on Human Language Technology and Empirical Methods in Natural Language Processing
Clustering for unsupervised relation identification
Proceedings of the sixteenth ACM conference on Conference on information and knowledge management
VLDB '07 Proceedings of the 33rd international conference on Very large data bases
Extracting Semantic Networks from Text Via Relational Clustering
ECML PKDD '08 Proceedings of the 2008 European Conference on Machine Learning and Knowledge Discovery in Databases - Part I
An Algebraic Approach to Rule-Based Information Extraction
ICDE '08 Proceedings of the 2008 IEEE 24th International Conference on Data Engineering
Mining comparative sentences and relations
AAAI'06 proceedings of the 21st national conference on Artificial intelligence - Volume 2
Cheap and fast---but is it good?: evaluating non-expert annotations for natural language tasks
EMNLP '08 Proceedings of the Conference on Empirical Methods in Natural Language Processing
Confidence estimation for information extraction
HLT-NAACL-Short '04 Proceedings of HLT-NAACL 2004: Short Papers
Open information extraction from the web
IJCAI'07 Proceedings of the 20th international joint conference on Artifical intelligence
Improving semi-supervised acquisition of relation extraction patterns
IEBeyondDoc '06 Proceedings of the Workshop on Information Extraction Beyond The Document
Unsupervised named-entity extraction from the Web: An experimental study
Artificial Intelligence
The WEKA data mining software: an update
ACM SIGKDD Explorations Newsletter
Enterprise information extraction: recent developments and open challenges
Proceedings of the 2010 ACM SIGMOD International Conference on Management of data
SystemT: an algebraic approach to declarative information extraction
ACL '10 Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics
Non-expert correction of automatically generated relation annotations
CSLDAMT '10 Proceedings of the NAACL HLT 2010 Workshop on Creating Speech and Language Data with Amazon's Mechanical Turk
Domain adaptation of rule-based annotators for named-entity recognition tasks
EMNLP '10 Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing
The SystemT IDE: an integrated development environment for information extraction rules
Proceedings of the 2011 ACM SIGMOD International Conference on Management of data
LODifier: generating linked data from unstructured text
ESWC'12 Proceedings of the 9th international conference on The Semantic Web: research and applications
WizIE: a best practices guided development environment for information extraction
ACL '12 Proceedings of the ACL 2012 System Demonstrations
I can do text analytics!: designing development tools for novice developers
Proceedings of the SIGCHI Conference on Human Factors in Computing Systems
Hi-index | 0.00 |
Hand-crafted textual patterns have been the mainstay device of practical relation extraction for decades. However, there has been little work on reducing the manual effort involved in the discovery of effective textual patterns for relation extraction. In this paper, we propose a clustering-based approach to facilitate the pattern discovery for relation extraction. Specifically, we define the notion of semantic signature to represent the most salient features of a textual fragment. We then propose a novel clustering algorithm based on semantic signature, S2C, and its enhancement S2C+. Experiments on two real-world data sets show that, when compared with k-means clustering, S2C and S2C+ are at least an order of magnitude faster, while generating high quality clusters that are at least comparable to the best clusters generated by k-means without requiring any manual tuning. Finally, a user study confirms that our clustering-based approach can indeed help users discover effective textual patterns for relation extraction with only a fraction of the manual effort required by the conventional approach.