Using semantic technologies for mining and intelligent information access to microblogs is a challenging, emerging research area. Unlike carefully authored news text and other longer content, tweets pose a number of new challenges, due to their short, noisy, context-dependent, and dynamic nature. Semantic annotation of tweets is typically performed in a pipeline, comprising successive stages of language identification, tokenisation, part-of-speech tagging, named entity recognition and entity disambiguation (e.g. with respect to DBpedia). Consequently, errors are cumulative, and earlier-stage problems can severely reduce the performance of final stages. This paper presents a characterisation of genre-specific problems at each semantic annotation stage and the impact on subsequent stages. Critically, we evaluate impact on two high-level semantic annotation tasks: named entity detection and disambiguation. Our results demonstrate the importance of making approaches specific to the genre, and indicate a diminishing returns effect that reduces the effectiveness of complex text normalisation.
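The cumulative-error point can be illustrated with a small arithmetic sketch. It is not the paper's evaluation method. It assumes, as a simplification, that each stage's errors are independent and that a mistake made upstream is never recovered downstream. The function name `cumulative_accuracy` and the per-stage accuracy values are hypothetical, chosen only for illustration.

```python
# Pipeline stages in the order named in the abstract.
STAGES = [
    "language identification",
    "tokenisation",
    "part-of-speech tagging",
    "named entity recognition",
    "entity disambiguation",
]


def cumulative_accuracy(per_stage):
    """Return (stage, running accuracy) pairs for the pipeline.

    Simplifying assumption: errors are independent across stages and are
    never corrected later, so the running accuracy is the product of the
    accuracies of all stages up to and including the current one.
    """
    running = 1.0
    results = []
    for stage in STAGES:
        running *= per_stage[stage]
        results.append((stage, running))
    return results


# Hypothetical per-stage accuracies; these are NOT results from the paper.
example = {stage: 0.9 for stage in STAGES}

for stage, accuracy in cumulative_accuracy(example):
    print(f"{stage:<28} {accuracy:.3f}")
```

Under these assumptions, five stages that are each 90% accurate leave roughly 59% of items correct at the end of the pipeline, which is why an early-stage problem can dominate the performance of the final stages.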