Information Processing and Management: an International Journal - Special issue: Cross-language information retrieval
Automatic categorization is a viable way to deal with the scaling problem of the World Wide Web. For Web site classification, this paper proposes using the Web pages linked to the home page, rather than relying solely on the home page as previous research has done. To implement this idea, we derive a Web site classification scheme based on the k-nearest neighbor (k-NN) approach. The scheme has three phases: Web page selection (connectivity analysis), Web page classification, and Web site classification. Given a Web site, the page selection phase uses connectivity analysis to choose several representative Web pages. The k-NN classifier then classifies each selected page. Finally, the page-level classifications are extended to a classification of the entire Web site. To improve performance, we supplement the k-NN approach with a feature selection method and a term weighting scheme based on markup tags, and we also revise its document-document similarity measure. In experiments on a Korean commercial Web directory, the proposed system, which uses both the home page and its linked pages, improved the micro-averaged breakeven point by 30.02% compared with an ordinary classifier that uses the home page only.
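The three phases described above can be sketched in Python roughly as follows. This is a minimal illustration, not the authors' implementation, and it rests on several assumptions that the abstract does not specify: connectivity analysis is approximated by keeping the home page plus the `m` linked pages with the highest in-site in-degree; markup-tag weighting is reduced to a hypothetical `TAG_WEIGHTS` table that boosts title terms; plain cosine similarity stands in for the paper's revised similarity measure; feature selection is omitted; and site-level classification simply sums the page-level k-NN scores. All function names (`select_pages`, `knn_scores`, `classify_site`) and the toy data are hypothetical.

```python
import math
from collections import Counter, defaultdict

# Hypothetical markup-tag weights (assumption): terms from a page's title
# count three times as much as body terms.
TAG_WEIGHTS = {"title": 3.0, "body": 1.0}


def weighted_vector(page):
    """Tag-weighted term-frequency vector for a page given as {tag: text}."""
    vec = Counter()
    for tag, text in page.items():
        weight = TAG_WEIGHTS.get(tag, 1.0)
        for term in text.lower().split():
            vec[term] += weight
    return vec


def cosine(a, b):
    """Plain cosine similarity (a stand-in, not the paper's revised measure)."""
    dot = sum(v * b[t] for t, v in a.items() if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def select_pages(home, links, m):
    """Phase 1 (assumed form of connectivity analysis): keep the home page plus
    the m pages it links to that receive the most links within the site."""
    indegree = Counter(dst for targets in links.values() for dst in targets)
    candidates = sorted(set(links.get(home, [])) - {home},
                        key=lambda url: (-indegree[url], url))
    return [home] + candidates[:m]


def knn_scores(vec, training, k):
    """Phase 2: score each category by summing the similarities of the k
    training pages most similar to vec; training is [(vector, category)]."""
    neighbours = sorted(((cosine(vec, tv), cat) for tv, cat in training),
                        reverse=True)[:k]
    scores = defaultdict(float)
    for sim, cat in neighbours:
        scores[cat] += sim
    return scores


def classify_site(home, pages, links, training, k=3, m=2):
    """Phase 3 (assumption): sum page-level scores over the selected pages and
    return the highest-scoring category for the whole site."""
    total = defaultdict(float)
    for url in select_pages(home, links, m):
        for cat, score in knn_scores(weighted_vector(pages[url]), training, k).items():
            total[cat] += score
    return max(total, key=total.get) if total else None


# Toy data, made up purely for illustration.
training = [
    (weighted_vector({"title": "used cars", "body": "car dealer sedan prices"}), "autos"),
    (weighted_vector({"title": "car parts", "body": "engine tires brakes"}), "autos"),
    (weighted_vector({"title": "hotel booking", "body": "rooms travel resort"}), "travel"),
    (weighted_vector({"title": "flight deals", "body": "airline travel tickets"}), "travel"),
]
pages = {
    "/": {"title": "welcome", "body": "our company"},
    "/cars": {"title": "sedan prices", "body": "car dealer"},
    "/parts": {"title": "tires", "body": "engine brakes"},
    "/about": {"title": "about us", "body": "contact"},
}
links = {"/": ["/cars", "/parts", "/about"], "/cars": ["/parts"], "/about": ["/cars"]}

print(classify_site("/", pages, links, training))  # prints: autos
```

In this toy site the home page shares no terms with the training data, so on its own it gives no evidence. The two linked pages chosen by the in-degree heuristic carry the decision, which mirrors the motivation for looking beyond the home page. Summing page scores is only one plausible way to extend page classifications to the site; a majority vote over each page's top category, or a weighting that favors the home page, would fit the same structure.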