Opinion mining has been an active research area in recent years. The task is to extract opinions expressed on entities and their attributes. For example, the sentence "I love the picture quality of Sony cameras" expresses a positive opinion on the picture quality attribute of Sony cameras; Sony is the entity. This paper focuses on mining entities (e.g., Sony). This is an important problem because, without knowing the entity, the extracted opinion is of little use. The problem is similar to the classic named entity recognition problem, but there is a major difference. In a typical opinion mining application, the user wants to find opinions on a set of competing entities, e.g., competing or related products. However, the user can often provide only a few names because there are too many of them, and the system has to find the rest from a corpus. This implies that the discovered entities must be of the same type/class as the given ones. This is the set expansion problem. Classic methods for solving it are based on distributional similarity, but we found these methods to be inaccurate. We therefore employ a learning-based method called Bayesian Sets. Directly applying Bayesian Sets, however, produces poor results, so we propose a more sophisticated way to use it. This method in turn raises two major problems: entity ranking and feature sparseness. For entity ranking, we propose a re-ranking method. For feature sparseness, we propose two methods, one to re-weight features and one to determine the quality of features. These methods improve the mining results substantially. Additionally, like any learning algorithm, Bayesian Sets requires the user to engineer a set of features. We design generic features based on the part-of-speech tags of words, so features do not need to be engineered for each specific domain. Experimental results on 10 real-life datasets from diverse domains demonstrate the effectiveness of the proposed technique.
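To make the set expansion step concrete, here is a minimal sketch of the standard Bayesian Sets score for binary features, using a Beta-Bernoulli prior whose parameters are scaled from each feature's empirical mean. It shows only the direct application that the abstract describes as a starting point, not the re-ranking or feature re-weighting methods proposed in the paper. The function name, the scale `c = 2.0`, and the toy entities and their three features are assumptions made for illustration.

```python
import math


def bayesian_sets_scores(candidates, seeds, c=2.0):
    """Log Bayesian Sets score of every candidate given the seed set.

    candidates: dict mapping entity name -> list of 0/1 feature values.
    seeds: names of the seed entities (must be keys of `candidates`).
    c: prior scale (an assumed value; alpha = c * mean, beta = c * (1 - mean)).
    """
    names = list(candidates)
    n_features = len(candidates[names[0]])
    n_seeds = len(seeds)

    constant = 0.0
    weights = [0.0] * n_features
    for j in range(n_features):
        mean = sum(candidates[n][j] for n in names) / len(names)
        if mean <= 0.0 or mean >= 1.0:
            continue  # constant feature: degenerate prior, so it is skipped
        alpha, beta = c * mean, c * (1.0 - mean)
        seed_count = sum(candidates[s][j] for s in seeds)
        alpha_post = alpha + seed_count
        beta_post = beta + n_seeds - seed_count
        # Per-feature weight applied when the candidate has the feature.
        weights[j] = math.log(alpha_post / alpha) - math.log(beta_post / beta)
        # Part of the score that does not depend on the candidate.
        constant += (math.log(alpha + beta)
                     - math.log(alpha + beta + n_seeds)
                     + math.log(beta_post / beta))

    return {
        name: constant + sum(w * x for w, x in zip(weights, feats))
        for name, feats in candidates.items()
    }


# Hypothetical toy data: three made-up binary context features per candidate.
toy = {
    "Sony":    [1, 1, 0],
    "Canon":   [1, 1, 0],
    "Nikon":   [1, 1, 0],
    "Olympus": [1, 0, 0],
    "battery": [0, 0, 1],
    "lens":    [0, 1, 1],
}
scores = bayesian_sets_scores(toy, seeds=["Sony", "Canon"])
ranked = sorted((n for n in toy if n not in ("Sony", "Canon")),
                key=scores.get, reverse=True)
print(ranked)
```

In this toy setting, candidates that share more features with the seeds score higher, so Nikon ranks above Olympus and both rank above the non-entity words. Because the log score is linear in the feature vector, sparse or uninformative features directly distort the ranking, which is consistent with the feature sparseness problem the abstract raises.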