Self-supervised capturing of users' activities from weblogs

Authors:
The-Minh Nguyen;Takahiro Kawamura;Yasuyuki Tahara;Akihiko Ohsuga
Affiliations:
Graduate School of Information Systems, The University of Electro-Communications, 1-5-1 Chofugaoka, Chofu-shi, Tokyo 182-8585, Japan.;Graduate School of Information Systems, The University of Electro-Communications, 1-5-1 Chofugaoka, Chofu-shi, Tokyo 182-8585, Japan.;Graduate School of Information Systems, The University of Electro-Communications, 1-5-1 Chofugaoka, Chofu-shi, Tokyo 182-8585, Japan.;Graduate School of Information Systems, The University of Electro-Communications, 1-5-1 Chofugaoka, Chofu-shi, Tokyo 182-8585, Japan
Venue:
International Journal of Intelligent Information and Database Systems
Year:
2012

Abstract

This paper describes a method for automatically extracting the basic attributes of an activity, namely actor, action, object, time, and location, from Japanese weblogs. Sentences retrieved from weblogs are often diverse, complex, and syntactically incorrect, and they frequently contain emoticons and new words. Several previous works have tried to extract users' activities from sentences retrieved from the web and weblogs. However, these works have several limitations: they cannot extract infrequent activities, they incur a high setup cost, they handle only limited types of sentences, and they require a prepared list of objects and actions. To resolve these problems, we propose a novel approach that treats activity extraction as a sequence labelling problem and automatically generates its own training data. This approach can extract infrequent activities, is scalable, and requires no hand-tagged data. Because it does not require the positions or the number of attributes in an activity sentence to be fixed, it can extract all attributes with high recall.
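
The sequence labelling formulation can be illustrated with a minimal sketch. It is not the authors' implementation: the label names, the `decode_activity` helper, and the English example sentence are assumptions made for illustration, and the trained labeller that would produce the per-token labels is omitted. The sketch only shows how one label per token lets attributes appear in any position and in any number.

```python
from collections import defaultdict

# Hypothetical label set: the five attributes named in the abstract,
# plus "O" for tokens that belong to no attribute.
ATTRIBUTES = ("ACTOR", "ACTION", "OBJECT", "TIME", "LOCATION")


def decode_activity(tokens, labels):
    """Group tokens into attribute spans using one label per token.

    Consecutive tokens that share the same attribute label are merged
    into a single span, so no fixed sentence template is required.
    """
    if len(tokens) != len(labels):
        raise ValueError("tokens and labels must have the same length")

    spans = defaultdict(list)
    previous = "O"
    for token, label in zip(tokens, labels):
        if label in ATTRIBUTES:
            if label == previous:
                # Continue the span opened by the previous token.
                spans[label][-1].append(token)
            else:
                # Start a new span for this attribute.
                spans[label].append([token])
        previous = label

    return {attr: [" ".join(span) for span in groups]
            for attr, groups in spans.items()}


# Illustrative input; in practice the labels would come from a trained
# sequence labeller rather than being written by hand.
tokens = ["yesterday", "I", "bought", "a", "book", "in", "Shinjuku"]
labels = ["TIME", "ACTOR", "ACTION", "O", "OBJECT", "O", "LOCATION"]
activity = decode_activity(tokens, labels)
```

Because the decoder reads labels token by token, a sentence that omits the time or mentions two objects is handled by the same code path, which is the property the abstract attributes to the sequence labelling view.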