The automatic categorisation of web documents is becoming crucial for organising the huge amount of information available on the Internet. This poses a new challenge because web documents have a rich structure and are highly heterogeneous. Two ways to respond to this challenge are (1) using a representation of the content of web documents that captures these two characteristics and (2) using more effective classifiers.

Our categorisation approach is based on a probabilistic description-oriented representation of web documents and a probabilistic interpretation of the k-nearest neighbour classifier. With the former, we provide an enhanced document representation that incorporates the structural and heterogeneous nature of web documents. With the latter, we provide a theoretically sound justification for the various parameters of the k-nearest neighbour classifier.

Experimental results show that (1) using an enhanced representation of web documents is crucial for effective categorisation of web documents, and (2) the probabilistic interpretation of the k-nearest neighbour classifier improves on the standard k-nearest neighbour classifier.
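
As a rough illustration of what a probabilistic reading of k-nearest neighbour can look like, the sketch below scores each category by the similarity-weighted share of the k nearest training documents that carry it. This is not the model described in the abstract, whose details are not given here. It assumes cosine similarity over sparse term-weight dictionaries, and the names `cosine` and `knn_category_scores` are hypothetical.

```python
import math
from collections import defaultdict


def cosine(a, b):
    """Cosine similarity between two sparse term-weight dicts."""
    dot = sum(w * b.get(term, 0.0) for term, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def knn_category_scores(query, training, k=3):
    """Score categories over the k most similar training documents.

    Assumption (not from the source): each neighbour votes for its
    category with weight sim / sum(sims), so the scores sum to 1 and
    can be read as rough category probability estimates.
    """
    ranked = sorted(
        ((cosine(query, vec), label) for vec, label in training),
        key=lambda pair: pair[0],
        reverse=True,
    )[:k]
    total = sum(sim for sim, _ in ranked)
    if total == 0.0:
        return {}  # no neighbour shares a term with the query
    scores = defaultdict(float)
    for sim, label in ranked:
        scores[label] += sim / total
    return dict(scores)


# Toy training set: (term weights, category) pairs, purely illustrative.
training = [
    ({"football": 1.0}, "sport"),
    ({"football": 1.0, "goal": 1.0}, "sport"),
    ({"election": 1.0}, "politics"),
]
scores = knn_category_scores({"football": 1.0, "election": 1.0}, training, k=3)
```

In this toy setup the two parameters that a plain k-nearest neighbour classifier leaves open, the neighbourhood size `k` and the vote weighting, are both explicit, which is where a probabilistic interpretation would supply a justification.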