An algorithm for local geoparsing of microtext

Authors:
Judith Gelernter;Shilpa Balaji
Affiliations:
Language Technologies Institute, #6416, School of Computer Science, Carnegie Mellon University, Pittsburgh, USA 15213;Language Technologies Institute, #6416, School of Computer Science, Carnegie Mellon University, Pittsburgh, USA 15213
Venue:
Geoinformatica
Year:
2013

Abstract

The location of the author of a social media message is not always the same as the location the author writes about in the message. Applications that mine these messages for information, such as tracking news and political events or responding to disasters, depend on the geographic content of the message rather than on the location of its author. To this end, we present a method to geo-parse the short, informal messages known as microtext. Our preliminary investigation showed that many microtext messages contain place references that are abbreviated, misspelled, or highly localized, and that standard geo-parsers miss these references. Our geo-parser is built to find them. It uses Natural Language Processing methods to identify references to streets and addresses, buildings and urban spaces, toponyms, and place acronyms and abbreviations. It combines heuristics, open-source Named Entity Recognition software, and machine learning techniques. Our primary data consisted of Twitter messages sent immediately after the February 2011 earthquake in Christchurch, New Zealand. On a sample of these messages, the algorithm identified locations with an F statistic of 0.85 for streets, 0.86 for buildings, 0.96 for toponyms, and 0.88 for place abbreviations, and a combined average F of 0.90 across all place types. The same data run through Yahoo! Placemaker, a standard geo-parser, yielded an F statistic of zero for streets and buildings (Placemaker is designed to find neither) and an F of 0.67 for toponyms.
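To make the idea of heuristic place detection and per-category F scoring concrete, here is a minimal sketch. It is not the authors' geo-parser, which also uses Named Entity Recognition software and machine learning. It shows only two simple heuristics: a regular expression for street names ending in a street-type word, and a lookup in a tiny hypothetical gazetteer that maps informal abbreviations (such as "chch") to a canonical toponym. The names `STREET_RE`, `GAZETTEER`, `geoparse` and `f_scores`, the street suffix list, and the gazetteer entries are all illustrative assumptions. The scoring function assumes the balanced F (the harmonic mean of precision and recall), computed per place category over sets of (category, name) pairs.

```python
import re

# Hypothetical street heuristic: one or more capitalized words followed by a
# street type, optionally abbreviated (e.g. "Colombo St", "Riccarton Road.").
STREET_RE = re.compile(
    r"\b(?:[A-Z][a-z]+\s)+(?:St|Street|Rd|Road|Ave|Avenue|Tce|Terrace|Pl|Place)\b\.?"
)

# Hypothetical mini-gazetteer: lower-cased surface form -> (canonical name, category).
GAZETTEER = {
    "christchurch": ("Christchurch", "toponym"),
    "chch": ("Christchurch", "abbreviation"),
    "new zealand": ("New Zealand", "toponym"),
    "nz": ("New Zealand", "abbreviation"),
}


def geoparse(text):
    """Return a set of (category, canonical_name) place references found in text."""
    found = set()
    for m in STREET_RE.finditer(text):
        found.add(("street", m.group(0).rstrip(".")))
    lowered = text.lower()
    for surface, (canonical, category) in GAZETTEER.items():
        # Word boundaries keep short forms like "nz" from matching inside other words.
        if re.search(r"\b" + re.escape(surface) + r"\b", lowered):
            found.add((category, canonical))
    return found


def f_scores(predicted, gold):
    """Per-category (precision, recall, F) over sets of (category, name) pairs."""
    scores = {}
    for cat in {c for c, _ in predicted | gold}:
        p = {x for x in predicted if x[0] == cat}
        g = {x for x in gold if x[0] == cat}
        tp = len(p & g)
        precision = tp / len(p) if p else 0.0
        recall = tp / len(g) if g else 0.0
        f = 2 * precision * recall / (precision + recall) if tp else 0.0
        scores[cat] = (precision, recall, f)
    return scores


if __name__ == "__main__":
    # Hypothetical message, not drawn from the paper's data set.
    tweet = "Anyone near Colombo St? Chch CBD is a mess"
    predicted = geoparse(tweet)
    print(sorted(predicted))
    gold = {("street", "Colombo St"), ("street", "Manchester St")}
    print(f_scores(predicted, gold))
```

On the sample message the sketch finds the street "Colombo St" and resolves "Chch" to Christchurch, but it misses "CBD", which a fuller system would treat as a place acronym. The word-boundary lookup is a deliberately simple choice. Handling misspellings, as the abstract describes, would need fuzzy matching or a learned model on top of it.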