Most existing methods for text categorization employ induction algorithms that use the words appearing in the training documents as features. While they perform well in many categorization tasks, these methods are inherently limited when faced with more complicated tasks where external knowledge is essential. Recently, there have been efforts to augment these basic features with external knowledge, including semi-supervised learning and transfer learning. In this work, we present a new framework for automatic acquisition of world knowledge and methods for incorporating it into the text categorization process. Our approach enhances machine learning algorithms with features generated from domain-specific and common-sense knowledge. This knowledge is represented by ontologies that contain hundreds of thousands of concepts, further enriched through controlled Web crawling. Prior to text categorization, a feature generator analyzes the documents and maps them onto appropriate ontology concepts, which augment the bag of words used in simple supervised learning. Feature generation is accomplished through contextual analysis of the document text, thus implicitly performing word sense disambiguation. Coupled with the ability to generalize concepts using the ontology, this approach addresses two significant problems in natural language processing: synonymy and polysemy. Categorizing documents with the aid of knowledge-based features leverages information that cannot be deduced from the training documents alone. We applied our methodology using the Open Directory Project, the largest existing Web directory, built by over 70,000 human editors. Experimental results over a range of data sets confirm improved performance compared to the bag-of-words document representation.
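The feature-generation step described above (map local contexts of a document onto ontology concepts, generalize each concept to its ancestors, and append the results to the bag of words) can be illustrated with a minimal Python sketch. This is not the authors' implementation. The toy `ONTOLOGY`, its ODP-style concept paths and concept texts, the window size, the similarity threshold, and the `CONCEPT:` feature prefix are all hypothetical. Cosine similarity over raw term counts stands in for whatever classifier the real feature generator uses to match text to concepts.

```python
import math
import re
from collections import Counter

# Hypothetical toy ontology: each concept has a parent (used for generalization)
# and a short text standing in for the material collected for that concept.
ONTOLOGY = {
    "Top/Science": {"parent": None, "text": ""},
    "Top/Computers": {"parent": None, "text": ""},
    "Top/Recreation": {"parent": None, "text": ""},
    "Top/Science/Biology": {"parent": "Top/Science",
                            "text": "cell gene protein organism species evolution"},
    "Top/Computers/Programming": {"parent": "Top/Computers",
                                  "text": "code compiler python java program software bug"},
    "Top/Recreation/Pets": {"parent": "Top/Recreation",
                            "text": "dog cat pet python snake reptile care"},
}


def tokenize(text):
    """Lowercase alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


# Term-count vectors for concepts that have associated text.
CONCEPT_VECS = {c: Counter(tokenize(d["text"]))
                for c, d in ONTOLOGY.items() if d["text"]}


def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(n * b[t] for t, n in a.items() if t in b)
    na = math.sqrt(sum(n * n for n in a.values()))
    nb = math.sqrt(sum(n * n for n in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def generate_concepts(tokens, window=5, threshold=0.1):
    """Map each non-overlapping context window to its best-matching concept.

    Matching whole windows rather than single words means an ambiguous word
    such as "python" is resolved by the words around it.
    """
    found = Counter()
    for start in range(0, len(tokens), window):
        ctx = Counter(tokens[start:start + window])
        best, score = max(((c, cosine(ctx, v)) for c, v in CONCEPT_VECS.items()),
                          key=lambda cs: cs[1])
        if score >= threshold:
            found[best] += 1
    return found


def generalize(concepts):
    """Add every ancestor of each concept, with the same count."""
    out = Counter(concepts)
    for concept, n in concepts.items():
        parent = ONTOLOGY[concept]["parent"]
        while parent is not None:
            out[parent] += n
            parent = ONTOLOGY[parent]["parent"]
    return out


def augmented_features(text):
    """Bag of words plus generated concept features."""
    tokens = tokenize(text)
    features = Counter(tokens)
    for concept, n in generalize(generate_concepts(tokens)).items():
        features["CONCEPT:" + concept] += n
    return features
```

In this sketch, `augmented_features("the python compiler reports a bug in my code")` gains `CONCEPT:Top/Computers/Programming` and its ancestor `CONCEPT:Top/Computers`. With the same toy data, `augmented_features("my pet python is a snake that needs care")` maps "python" to `CONCEPT:Top/Recreation/Pets` instead. The ancestor features let documents that share no words but fall under related concepts still share features, which gives a rough picture of how generalization helps with synonymy.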