This paper presents a general framework for building classifiers that handle short and sparse text & Web segments by exploiting hidden topics discovered from large-scale data collections. The main motivation is that many classification tasks on short segments of text & Web, such as search snippets, forum & chat messages, blog & news feeds, product reviews, and book & movie summaries, fail to achieve high accuracy because of data sparseness. We therefore draw on external knowledge to make the data more topically related and to expand the coverage of classifiers so that they handle future data better. The underlying idea of the framework is that, for each classification task, we collect a large-scale external data collection called the "universal dataset" and then build a classifier on both a (small) set of labeled training data and a rich set of hidden topics discovered from that collection. The framework is general enough to be applied to different data domains and genres, ranging from Web search results to medical text. We carefully evaluated it on several hundred megabytes of Wikipedia (30M words) and MEDLINE (18M words) with two tasks, "Web search domain disambiguation" and "disease categorization for medical text", and achieved significant improvements in classification quality.
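The feature-construction step of the framework can be illustrated with a minimal sketch. The abstract does not name the topic model or the classifier, so everything below is an assumption made for illustration: topics are estimated with a small collapsed Gibbs sampler for latent Dirichlet allocation, the function names (`train_topics`, `infer_topics`, `augment`), the `topic:k` feature naming, the 0.05 threshold, and the toy corpus are all hypothetical, and the final classifier is left out because any model that accepts sparse features could be trained on the augmented output.

```python
import random
from collections import Counter


def train_topics(docs, num_topics, iters=200, alpha=0.1, beta=0.01, seed=0):
    """Fit LDA on tokenized 'universal dataset' documents (collapsed Gibbs)."""
    rng = random.Random(seed)
    vocab = {w for doc in docs for w in doc}
    topic_word = [Counter() for _ in range(num_topics)]  # word counts per topic
    topic_total = [0] * num_topics                       # token counts per topic
    doc_topic = [[0] * num_topics for _ in docs]         # topic counts per document
    assign = [[rng.randrange(num_topics) for _ in doc] for doc in docs]
    for d, doc in enumerate(docs):
        for w, k in zip(doc, assign[d]):
            topic_word[k][w] += 1
            topic_total[k] += 1
            doc_topic[d][k] += 1
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k = assign[d][i]  # remove the token, then resample its topic
                topic_word[k][w] -= 1
                topic_total[k] -= 1
                doc_topic[d][k] -= 1
                weights = [
                    (doc_topic[d][t] + alpha) * (topic_word[t][w] + beta)
                    / (topic_total[t] + len(vocab) * beta)
                    for t in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights)[0]
                assign[d][i] = k
                topic_word[k][w] += 1
                topic_total[k] += 1
                doc_topic[d][k] += 1
    return {"topic_word": topic_word, "topic_total": topic_total, "vocab": vocab,
            "num_topics": num_topics, "alpha": alpha, "beta": beta}


def infer_topics(model, tokens, iters=50, seed=0):
    """Estimate the topic distribution of a short text with the topics held fixed."""
    rng = random.Random(seed)
    K, alpha, beta = model["num_topics"], model["alpha"], model["beta"]
    V = len(model["vocab"])
    words = [w for w in tokens if w in model["vocab"]]  # skip unseen words
    zs = [rng.randrange(K) for _ in words]
    counts = [0] * K
    for k in zs:
        counts[k] += 1
    for _ in range(iters):
        for i, w in enumerate(words):
            counts[zs[i]] -= 1
            weights = [
                (counts[t] + alpha) * (model["topic_word"][t][w] + beta)
                / (model["topic_total"][t] + V * beta)
                for t in range(K)
            ]
            zs[i] = rng.choices(range(K), weights=weights)[0]
            counts[zs[i]] += 1
    total = len(words) + K * alpha
    return [(counts[t] + alpha) / total for t in range(K)]


def augment(tokens, model, threshold=0.05):
    """Combine bag-of-words counts with the text's prominent hidden topics."""
    feats = dict(Counter(tokens))
    for t, p in enumerate(infer_topics(model, tokens)):
        if p >= threshold:  # hypothetical cutoff for keeping a topic feature
            feats[f"topic:{t}"] = p
    return feats


# Toy stand-in for the universal dataset (the paper uses Wikipedia and MEDLINE).
universal = [
    "java class object method code".split(),
    "python code function class module".split(),
    "doctor patient disease treatment hospital".split(),
    "disease symptom patient medicine doctor".split(),
]
model = train_topics(universal, num_topics=2)
feats = augment("java code".split(), model)
```

The augmented dictionary `feats` holds the original word counts plus one entry per prominent topic, so two short texts that share no words can still overlap through a shared topic feature. That overlap is the intuition behind using hidden topics to counter sparseness.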