Multi-instance learning by treating instances as non-I.I.D. samples
ICML '09 Proceedings of the 26th Annual International Conference on Machine Learning
Classifying latent user attributes in twitter
SMUC '10 Proceedings of the 2nd international workshop on Search and mining user-generated contents
Democrats, republicans and starbucks afficionados: user classification in twitter
Proceedings of the 17th ACM SIGKDD international conference on Knowledge discovery and data mining
Hi-index | 0.00 |
User attribute is an essential factor for personalized recommendation and targeted advertising. Therefore, there have been a number of studies to identify user attributes automatically from SNS postings, since the postings reveal various attributes of writers. Many kinds of machine learning methods have been applied to automatic identification of user attributes as a candidate solution, but they suffer from two major problems. First, there are many postings in SNS that do not deliver any information about writers. Then, learning from SNS postings results in a biased model by these irrelevant postings. Second, the postings of a SNS user are somewhat related one another. However, most machine learning methods ignore this information, since they assume that data are independently and identically distributed. In order to solve these problems in user attribute identification, this paper proposes a novel method based on non-i.i.d. multi-instance learning. Since multi-instance learning treats all postings by a user as a bag and learns user attribute identification with such bags, not with postings, the first problem is solved. In addition, the proposed method assumes that the postings by a single user have a structure. By incorporating this assumption into the multi-instance learning, the second problem is solved. Our experimental results show that consideration of these two problems in automatic user attribute identification results in performance improvement.