SEED: a framework for extracting social events from press news

Authors:
Salvatore Orlando;Francesco Pizzolon;Gabriele Tolomei
Affiliations:
DAIS - Università Ca' Foscari, Venezia, Italy;DAIS - Università Ca' Foscari, Venezia, Italy;DAIS - Università Ca' Foscari, Venezia, Italy
Venue:
Proceedings of the 22nd international conference on World Wide Web companion
Year:
2013

Citing 20
Cited 0

Generating finite-state transducers for semi-structured data extraction from the Web

Information Systems - Special issue on semistructured data
Learning Information Extraction Rules for Semi-Structured and Free Text

Machine Learning - Special issue on natural language learning
Snowball: extracting relations from large plain-text collections

DL '00 Proceedings of the fifth ACM conference on Digital libraries
Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data

ICML '01 Proceedings of the Eighteenth International Conference on Machine Learning
Extracting Patterns and Relations from the World Wide Web

WebDB '98 Selected papers from the International Workshop on The World Wide Web and Databases
Text classification using string kernels

The Journal of Machine Learning Research
Kernel methods for relation extraction

The Journal of Machine Learning Research
Bottom-up relational learning of pattern matching rules for information extraction

The Journal of Machine Learning Research
Named Entity recognition without gazetteers

EACL '99 Proceedings of the ninth conference on European chapter of the Association for Computational Linguistics
Unsupervised word sense disambiguation rivaling supervised methods

ACL '95 Proceedings of the 33rd annual meeting on Association for Computational Linguistics
GATE: an architecture for development of robust HLT applications

ACL '02 Proceedings of the 40th Annual Meeting on Association for Computational Linguistics
Learning structured prediction models: a large margin approach

ICML '05 Proceedings of the 22nd international conference on Machine learning
A comparison of algorithms for maximum entropy parameter estimation

COLING-02 proceedings of the 6th conference on Natural language learning - Volume 20
Combining lexical, syntactic, and semantic features with maximum entropy models for extracting relations

ACLdemo '04 Proceedings of the ACL 2004 on Interactive poster and demonstration sessions
A shortest path dependency kernel for relation extraction

HLT '05 Proceedings of the conference on Human Language Technology and Empirical Methods in Natural Language Processing
Wikify!: linking documents to encyclopedic knowledge

Proceedings of the sixteenth ACM conference on Conference on information and knowledge management
Information Extraction

Foundations and Trends in Databases
Open information extraction for the web

Open information extraction for the web
Analysis of a probabilistic model of redundancy in unsupervised information extraction

Artificial Intelligence
Collective entity linking in web text: a graph-based method

Proceedings of the 34th international ACM SIGIR conference on Research and development in Information Retrieval

Quantified Score

Hi-index	0.00

Visualization

Abstract

Everyday people are exchanging a huge amount of data through the Internet. Mostly, such data consist of unstructured texts, which often contain references to structured information (e.g., person names, contact records, etc.). In this work, we propose a novel solution to discover social events from actual press news edited by humans. Concretely, our method is divided in two steps, each one addressing a specific Information Extraction (IE) task: first, we use a technique to automatically recognize four classes of named-entities from press news: DATE, LOCATION, PLACE, and ARTIST. Furthermore, we detect social events by extracting ternary relations between such entities, also exploiting evidence from external sources (i.e., the Web). Finally, we evaluate both stages of our proposed solution on a real-world dataset. Experimental results highlight the quality of our first-step Named-Entity Recognition (NER) approach, which indeed performs consistently with state-of-the-art solutions. Eventually, we show how to precisely select true events from the list of all candidate events (i.e., all the ternary relations), which result from our second-step Relation Extraction (RE) method. Indeed, we discover that true social events can be detected if enough evidence of those is found in the result list of Web search engines.