Snowball: extracting relations from large plain-text collections
DL '00 Proceedings of the fifth ACM conference on Digital libraries
Scaling question answering to the Web
Proceedings of the 10th international conference on World Wide Web
Exploiting redundancy in question answering
Proceedings of the 24th annual international ACM SIGIR conference on Research and development in information retrieval
On the MSE robustness of batching estimators
Proceedings of the 33nd conference on Winter simulation
Extracting Patterns and Relations from the World Wide Web
WebDB '98 Selected papers from the International Workshop on The World Wide Web and Databases
Web-scale information extraction in knowitall: (preliminary results)
Proceedings of the 13th international conference on World Wide Web
Automatic acquisition of hyponyms from large text corpora
COLING '92 Proceedings of the 14th conference on Computational linguistics - Volume 2
Learning surface text patterns for a Question Answering system
ACL '02 Proceedings of the 40th Annual Meeting on Association for Computational Linguistics
Preemptive information extraction using unrestricted relation discovery
HLT-NAACL '06 Proceedings of the main conference on Human Language Technology Conference of the North American Chapter of the Association of Computational Linguistics
On-demand information extraction
COLING-ACL '06 Proceedings of the COLING/ACL on Main conference poster sessions
Truth discovery with multiple conflicting information providers on the web
Proceedings of the 13th ACM SIGKDD international conference on Knowledge discovery and data mining
Methods for domain-independent information extraction from the web: an experimental comparison
AAAI'04 Proceedings of the 19th national conference on Artifical intelligence
Open information extraction from the web
IJCAI'07 Proceedings of the 20th international joint conference on Artifical intelligence
Data-oriented content query system: searching for data into text on the web
Proceedings of the third ACM international conference on Web search and data mining
Towards rich query interpretation: walking back and forth for mining query templates
Proceedings of the 19th international conference on World wide web
Tuple refinement method based on relationship keyword extension
WISM'12 Proceedings of the 2012 international conference on Web Information Systems and Mining
Knowledge harvesting in the big-data era
Proceedings of the 2013 ACM SIGMOD International Conference on Management of Data
Methods for exploring and mining tables on Wikipedia
Proceedings of the ACM SIGKDD Workshop on Interactive Data Exploration and Analytics
When speed has a price: fast information extraction using approximate algorithms
Proceedings of the VLDB Endowment
Hi-index | 0.00 |
While tuple extraction for a given relation has been an active research area, its dual problem of pattern search-- to find and rank patterns in a principled way-- has not been studied explicitly. In this paper, we propose and address the problem of pattern search, in addition to tuple extraction. As our objectives, we stress reusability for pattern search and scalability of tuple extraction, such that our approach can be applied to very large corpora like the Web. As the key foundation, we propose a conceptual model PRDualRank to capture the notion of precision and recall for both tuples and patterns in a principled way, leading to the "rediscovery" of the Pattern-Relation Duality-- the formal quantification of the reinforcement between patterns and tuples with the metrics of precision and recall. We also develop a concrete framework for PRDualRank, guided by the principles of a perfect sampling process over a complete corpus. Finally, we evaluated our framework over the real Web. Experiments show that on all three target relations our principled approach greatly outperforms the previous state-of-the-art system in both effectiveness and efficiency. In particular, we improved optimal F-score by up to 64%.