One of the most important tasks in Information Retrieval (IR) is extracting and processing information from web pages. A common approach is to treat a web page as an atomic unit and to model its textual content as a "bag-of-words". However, this representation does not reflect how people perceive a web page. A granular document representation, expressed in terms of semantic objects, can help identify the semantic areas of a web page and use them for different IR goals. In this paper we use a granular representation to define a new metric for evaluating the importance of semantic objects and to enhance the performance of IR systems. In particular, we show that this new metric can be used not only for classification, in which instances are assumed to be independent and identically distributed, but also to gauge the strength of the relationships between hypertextual documents and to exploit this information to improve page ranking performance.