Information categorization in web pages and sites

Authors:
Vincenza Carchiolo;Alessandro Longheu;Michele Malgeri
Affiliations:
Dipartimento di Ingegneria Informatica e delle Telecomunicazioni, Facoltà di Ingegneria, V.le A. Doria 6, I95125, Catania, Italy. Tel.: +39 095 738 2359; Fax: +39 095 738 2397; E-mail: car,al ... (same affiliation for all three authors)
Venue:
Web Intelligence and Agent Systems
Year:
2005

Abstract

Today, surfing the net is not limited to searching for scientific information; a generic user is interested in many kinds of information, about business, music, travel and so on. When accessing web documents, however, the lack of explicit structure makes data semantics hard to understand, so comprehending the logical organization of web data relies on the user's intuition of the author's underlying schema. In this paper, we present an approach to web structuring based on the analysis of the structure and the semantics of both web pages and sites, in order to discover hidden schemas and provide them to users. The intended benefits of this work are to facilitate navigation inside web documents and sites, to promote the use of more powerful, semantic-based search methods, and to allow better management and re-design of pages and sites.