This paper introduces a classification system that exploits both content and citation structure to classify scientific papers. The system first applies a content-based statistical classification method similar to those used in general text classification. We investigate several classification methods, including K-nearest neighbours, nearest centroid, naive Bayes and decision trees. Among these, K-nearest neighbours outperforms the others, which perform comparably to one another. Using phrases in addition to words, together with a good feature selection strategy such as information gain, can improve accuracy and reduce training time compared with using words only. To incorporate citation links, the system uses an iterative method that updates the labels of classified instances based on those links. Our results show that combining content and citations significantly improves system performance.
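The abstract does not specify the update rule used in the iterative step, so the following is only a minimal sketch of one way such a scheme could work, not the authors' method. It assumes each paper already has per-class scores from a content-based classifier, and it repeatedly re-labels each paper by blending those scores with the share of its citation neighbours that currently hold each label. The function name `refine_labels`, the blending weight `alpha`, the neighbour-voting rule, and the toy data are all hypothetical.

```python
from collections import Counter


def refine_labels(content_scores, citations, alpha=0.5, max_iters=10):
    """Iteratively re-label papers using citation neighbours (illustrative only).

    content_scores: dict mapping paper id -> {label: score from a content classifier}
    citations:      dict mapping paper id -> list of linked paper ids (cited or citing)
    alpha:          assumed weight given to neighbour votes (0 = content only)
    """
    # Start from the content-only decision for every paper.
    labels = {p: max(s, key=s.get) for p, s in content_scores.items()}

    for _ in range(max_iters):
        new_labels = {}
        for paper, scores in content_scores.items():
            # Only neighbours that have a current label can vote.
            neighbours = [q for q in citations.get(paper, []) if q in labels]
            votes = Counter(labels[q] for q in neighbours)
            combined = {}
            for label, score in scores.items():
                share = votes[label] / len(neighbours) if neighbours else 0.0
                combined[label] = (1 - alpha) * score + alpha * share
            new_labels[paper] = max(combined, key=combined.get)
        if new_labels == labels:  # labelling is stable, stop early
            break
        labels = new_labels
    return labels


# Toy data (made up): paper C is borderline on content but linked to two "ml" papers.
content_scores = {
    "A": {"ml": 0.90, "db": 0.10},
    "B": {"ml": 0.80, "db": 0.20},
    "C": {"ml": 0.45, "db": 0.55},
    "D": {"ml": 0.10, "db": 0.90},
}
citations = {"A": ["B", "C"], "B": ["A", "C"], "C": ["A", "B"], "D": []}

refined = refine_labels(content_scores, citations)
print(refined)
```

In this toy run, paper C is labelled "db" on content alone but switches to "ml" once its citation neighbours are taken into account, while the unlinked paper D keeps its content-based label. Setting `alpha=0` reduces the procedure to the content-only classifier.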