Statistics of online user-generated short documents
ECIR'2010 Proceedings of the 32nd European conference on Advances in Information Retrieval
The importance of the Internet as a communication medium is reflected in the large number of documents generated every day by users of online services. In this work we analyze the properties of these online user-generated documents for several established Internet services (Kongregate, Twitter, Myspace and Slashdot) and compare them with a consolidated collection of standard information retrieval documents (from the Wall Street Journal, Associated Press and Financial Times, as part of the TREC ad-hoc collection). We investigate features such as document similarity, term burstiness, emoticons and part-of-speech analysis, highlighting the applicability and limits of traditional information retrieval content analysis and indexing techniques when applied to the new online user-generated documents.
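
As a rough illustration of three of the features named above, the following minimal Python sketch computes cosine similarity between two documents, a simple burstiness proxy, and an emoticon count. Everything in it is an assumption made for illustration: the whitespace tokenizer, the raw term-frequency weighting, the choice of burstiness proxy (mean occurrences of a term per document containing it), the small emoticon pattern, and the function names. None of it is taken from the paper, which may define these measures differently.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately naive tokenizer)."""
    return text.lower().split()


def cosine_similarity(doc_a, doc_b):
    """Cosine similarity between raw term-frequency vectors of two documents."""
    tf_a, tf_b = Counter(tokenize(doc_a)), Counter(tokenize(doc_b))
    # Only terms shared by both documents contribute to the dot product.
    dot = sum(tf_a[t] * tf_b[t] for t in tf_a.keys() & tf_b.keys())
    norm_a = math.sqrt(sum(c * c for c in tf_a.values()))
    norm_b = math.sqrt(sum(c * c for c in tf_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty document is treated as similar to nothing
    return dot / (norm_a * norm_b)


def burstiness(term, docs):
    """Assumed proxy: mean occurrences of `term` per document containing it."""
    counts = [tokenize(d).count(term) for d in docs]
    containing = [c for c in counts if c > 0]
    return sum(containing) / len(containing) if containing else 0.0


# Hypothetical pattern covering a few Western-style emoticons such as :) ;-) :D
EMOTICON = re.compile(r"[:;=][-']?[)(DPp]")


def count_emoticons(text):
    """Count non-overlapping matches of the illustrative emoticon pattern."""
    return len(EMOTICON.findall(text))
```

Under this proxy a term that always occurs exactly once per document scores 1.0, while a term that tends to repeat within the documents that use it scores higher. Very short documents such as chat lines or tweets share few terms, so pairwise similarities computed this way tend to be sparse.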