Extracting information networks from the blogosphere

Authors:
Yuval Merhav;Filipe Mesquita;Denilson Barbosa;Wai Gen Yee;Ophir Frieder
Affiliations:
Illinois Institute of Technology, Chicago, IL;University of Alberta;University of Alberta;Orbitz Worldwide, Chicago, IL;Georgetown University, Washington, D.C.
Venue:
ACM Transactions on the Web (TWEB)
Year:
2012

Citing 33
Cited 2

Snowball: extracting relations from large plain-text collections

DL '00 Proceedings of the fifth ACM conference on Digital libraries
Learning to construct knowledge bases from the World Wide Web

Artificial Intelligence - Special issue on Intelligent internet systems
Using web structure for classifying and describing web pages

Proceedings of the 11th international conference on World Wide Web
Extracting Patterns and Relations from the World Wide Web

WebDB '98 Selected papers from the International Workshop on The World Wide Web and Databases
Kernel methods for relation extraction

The Journal of Machine Learning Research
Web-scale information extraction in knowitall: (preliminary results)

Proceedings of the 13th international conference on World Wide Web
Information Retrieval: Algorithms and Heuristics (The Kluwer International Series on Information Retrieval)

Information Retrieval: Algorithms and Heuristics (The Kluwer International Series on Information Retrieval)
Description of the UMass system as used for MUC-6

MUC6 '95 Proceedings of the 6th conference on Message understanding
Feature-rich part-of-speech tagging with a cyclic dependency network

NAACL '03 Proceedings of the 2003 Conference of the North American Chapter of the Association for Computational Linguistics on Human Language Technology - Volume 1
Data Mining and Knowledge Discovery Handbook

Data Mining and Knowledge Discovery Handbook
Automatically labeling hierarchical clusters

dg.o '06 Proceedings of the 2006 international conference on Digital government research
Speech and Language Processing (2nd Edition)

Speech and Language Processing (2nd Edition)
Discovering relations among named entities from large corpora

ACL '04 Proceedings of the 42nd Annual Meeting on Association for Computational Linguistics
Dependency tree kernels for relation extraction

ACL '04 Proceedings of the 42nd Annual Meeting on Association for Computational Linguistics
Classifying semantic relations in bioscience texts

ACL '04 Proceedings of the 42nd Annual Meeting on Association for Computational Linguistics
Combining lexical, syntactic, and semantic features with maximum entropy models for extracting relations

ACLdemo '04 Proceedings of the ACL 2004 on Interactive poster and demonstration sessions
Exploring various knowledge in relation extraction

ACL '05 Proceedings of the 43rd Annual Meeting on Association for Computational Linguistics
Extracting personal names from email: applying named entity recognition to informal text

HLT '05 Proceedings of the conference on Human Language Technology and Empirical Methods in Natural Language Processing
A shortest path dependency kernel for relation extraction

HLT '05 Proceedings of the conference on Human Language Technology and Empirical Methods in Natural Language Processing
Preemptive information extraction using unrestricted relation discovery

HLT-NAACL '06 Proceedings of the main conference on Human Language Technology Conference of the North American Chapter of the Association of Computational Linguistics
Clustering for unsupervised relation identification

Proceedings of the sixteenth ACM conference on Conference on information and knowledge management
Introduction to Information Retrieval

Introduction to Information Retrieval
StatSnowball: a statistical approach to extracting entity relationships

Proceedings of the 18th international conference on World wide web
A comparison of extrinsic clustering evaluation metrics based on formal constraints

Information Retrieval
Enhancing cluster labeling using wikipedia

Proceedings of the 32nd international ACM SIGIR conference on Research and development in information retrieval
Design challenges and misconceptions in named entity recognition

CoNLL '09 Proceedings of the Thirteenth Conference on Computational Natural Language Learning
Computing semantic relatedness using Wikipedia-based explicit semantic analysis

IJCAI'07 Proceedings of the 20th international joint conference on Artifical intelligence
Open information extraction from the web

IJCAI'07 Proceedings of the 20th international joint conference on Artifical intelligence
Distant supervision for relation extraction without labeled data

ACL '09 Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP: Volume 2 - Volume 2
Open information extraction using Wikipedia

ACL '10 Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics
Text relatedness based on a word thesaurus

Journal of Artificial Intelligence Research
Identifying relations for open information extraction

EMNLP '11 Proceedings of the Conference on Empirical Methods in Natural Language Processing
Discovering relations between named entities from a large raw corpus using tree similarity-based clustering

IJCNLP'05 Proceedings of the Second international joint conference on Natural Language Processing

A weighting scheme for open information extraction

NAACL HLT '12 Proceedings of the 2012 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies: Student Research Workshop
Discovering relations using matrix factorization methods

Proceedings of the 22nd ACM international conference on Conference on information & knowledge management

Quantified Score

Hi-index	0.00

Visualization

Abstract

We study the problem of automatically extracting information networks formed by recognizable entities as well as relations among them from social media sites. Our approach consists of using state-of-the-art natural language processing tools to identify entities and extract sentences that relate such entities, followed by using text-clustering algorithms to identify the relations within the information network. We propose a new term-weighting scheme that significantly improves on the state-of-the-art in the task of relation extraction, both when used in conjunction with the standard tf ċ idf scheme and also when used as a pruning filter. We describe an effective method for identifying benchmarks for open information extraction that relies on a curated online database that is comparable to the hand-crafted evaluation datasets in the literature. From this benchmark, we derive a much larger dataset which mimics realistic conditions for the task of open information extraction. We report on extensive experiments on both datasets, which not only shed light on the accuracy levels achieved by state-of-the-art open information extraction tools, but also on how to tune such tools for better results.