Identifying interesting assertions from the web

Authors:
Thomas Lin; Oren Etzioni; James Fogarty
Affiliations:
University of Washington, Seattle, WA, USA; University of Washington, Seattle, WA, USA; University of Washington, Seattle, WA, USA
Venue:
Proceedings of the 18th ACM conference on Information and knowledge management
Year:
2009


Abstract

How can we cull the facts we need from the overwhelming mass of information and misinformation that is the Web? The TextRunner extraction engine represents one approach, in which people pose keyword queries or simple questions and TextRunner returns concise answers based on tuples extracted from Web text. Unfortunately, the results returned by engines such as TextRunner include both informative facts (e.g., the FDA banned ephedra) and less useful statements (e.g., the FDA banned products). This paper therefore investigates filtering TextRunner results to enable people to better focus on interesting assertions. We first develop three distinct models of what assertions are likely to be interesting in response to a query. We then fully operationalize each of these models as a filter over TextRunner results. Finally, we develop a more sophisticated filter that combines the different models using relevance feedback. In a study of human ratings of the interestingness of TextRunner assertions, we show that our approach substantially enhances the quality of TextRunner results. Our best filter raises the fraction of interesting results in the top thirty from 41.6% to 64.1%.
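The headline result above is a fraction of interesting results among the top thirty. As a minimal sketch of how such a figure could be computed from human ratings, the snippet below measures the share of top-k results marked interesting. The function name `fraction_interesting`, the boolean rating format, and the sample data are all assumptions made for illustration; the abstract does not describe how ratings were stored or aggregated across queries.

```python
from typing import Sequence


def fraction_interesting(labels: Sequence[bool], k: int = 30) -> float:
    """Return the fraction of the top-k ranked results rated interesting.

    `labels` is assumed to be ordered by rank, best first, with True
    meaning a human rater judged the assertion interesting.
    """
    top = labels[:k]
    if not top:
        return 0.0
    return sum(top) / len(top)


# Hypothetical ratings for one query (not data from the paper).
unfiltered = [True, False, False, True, False, False, True, False, False, False]
filtered = [True, True, False, True, True, False, True, False, True, False]

baseline = fraction_interesting(unfiltered, k=10)
improved = fraction_interesting(filtered, k=10)
print(f"unfiltered: {baseline:.1%}, filtered: {improved:.1%}")
```

With k set to 30 and real ratings, the same calculation would yield figures comparable in form to the 41.6% and 64.1% reported above, although the study's exact averaging procedure is not stated here.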