The aim of the study was to determine how significance indicators assigned to different Web page elements (internal metadata, title, headings, and main text) influence automated classification. The data collection comprised 1,000 Web pages in engineering, each manually assigned Engineering Information classes. The significance indicators were derived using several methods: total and partial precision and recall, semantic distance, and multiple regression. The results showed that all elements must be included in the classification process to obtain the best results. The exact way of combining the significance indicators turned out to matter little: measured by F1, the best combination of significance indicators performed no more than 3% better than the baseline.
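The abstract does not specify how the significance indicators enter the classifier. The following is a minimal sketch that assumes they act as per-element multipliers on term counts, with a page assigned to the class whose vocabulary accumulates the highest weighted score. The weights, class names, and vocabularies are hypothetical illustrations, not the study's values. The `f1` helper is the standard harmonic mean of precision and recall, the measure the abstract uses to compare combinations against the baseline.

```python
from collections import Counter

# Hypothetical significance weights per Web page element (illustrative only;
# the study derived its own indicators, which are not reproduced here).
HYPOTHETICAL_WEIGHTS = {"metadata": 1.5, "title": 2.0, "headings": 1.5, "text": 1.0}


def weighted_terms(page: dict[str, str], weights: dict[str, float]) -> Counter:
    """Merge per-element term counts into one vector, scaled by element weight."""
    combined: Counter = Counter()
    for element, text in page.items():
        w = weights.get(element, 0.0)  # an unweighted element contributes nothing
        for term in text.lower().split():
            combined[term] += w
    return combined


def classify(page: dict[str, str], class_vocab: dict[str, set[str]],
             weights: dict[str, float]) -> str:
    """Return the class whose (hypothetical) vocabulary scores highest."""
    terms = weighted_terms(page, weights)
    scores = {cls: sum(terms[t] for t in vocab) for cls, vocab in class_vocab.items()}
    return max(scores, key=scores.get)


def f1(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    page = {
        "title": "Heat transfer in turbines",
        "headings": "Cooling",
        "text": "turbine blade cooling and heat flow",
        "metadata": "turbine heat",
    }
    vocab = {"thermodynamics": {"heat", "cooling"}, "electronics": {"circuit", "voltage"}}
    print(classify(page, vocab, HYPOTHETICAL_WEIGHTS))
    print(f1(0.6, 0.4))
```

Setting an element's weight to `0.0` removes it from the vector, which is one way to test the abstract's finding that classification works best when every element is included.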