The size and dynamic nature of the Internet can make relevant information difficult to find. Most users express their information need through short queries to search engines and must then manually sift through results ordered by the engines' relevance ranking, which makes relevance judgement time-consuming. In this paper, we describe a novel representation technique that uses Web structure together with summarisation techniques to better represent the knowledge in actual Web documents. We call the proposed technique the Semantic Virtual Document (SVD). We discuss how the SVD can be combined with a suitable clustering algorithm to achieve automatic content-based categorization of similar Web documents. This auto-categorization facility, together with a tree-like graphical user interface (GUI) for post-retrieval document browsing, enhances the relevance judgement process for Internet users. Furthermore, we introduce a cluster-biased automatic query expansion technique that can be used to overcome the ambiguity of the short queries users typically give. We outline our experimental design for evaluating the effectiveness of the proposed SVD as a representation and present a prototype called iSEARCH (Intelligent SEarch And Review of Cluster Hierarchy) for Web content mining. Our results confirm, quantify and extend previous research using Web structure and summarisation techniques, introducing novel techniques for knowledge representation to enhance Web content mining.
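The pipeline outlined in the abstract (represent each page, cluster similar pages, expand a short query from a chosen cluster) can be illustrated with a minimal sketch. The abstract does not give the actual SVD construction, the clustering algorithm, or the term-ranking function, so everything below is an assumption made for illustration: the function names are hypothetical, a page is represented by its summary plus the anchor text of links pointing to it, similarity is plain cosine over raw term counts, clustering is a simple single-pass threshold scheme, and expansion terms are the most frequent cluster terms not already in the query.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def virtual_document(summary, inbound_anchor_texts):
    """Assumed representation: term counts over a page summary plus the
    anchor text of links that point to the page (Web structure)."""
    tokens = tokenize(summary)
    for anchor in inbound_anchor_texts:
        tokens.extend(tokenize(anchor))
    return Counter(tokens)


def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def cluster(vectors, threshold=0.3):
    """Single-pass clustering (a stand-in, not the paper's algorithm):
    each vector joins the first cluster whose summed centroid is similar
    enough, otherwise it starts a new cluster."""
    clusters = []  # list of (centroid Counter, member indices)
    for i, vec in enumerate(vectors):
        for centroid, members in clusters:
            if cosine(vec, centroid) >= threshold:
                centroid.update(vec)  # add this vector's counts to the centroid
                members.append(i)
                break
        else:
            clusters.append((Counter(vec), [i]))
    return [members for _, members in clusters]


def expand_query(query, cluster_vectors, extra_terms=2):
    """Append the most frequent terms of the chosen cluster that are not
    already present in the query."""
    seen = set(tokenize(query))
    totals = Counter()
    for vec in cluster_vectors:
        totals.update(vec)
    candidates = [t for t, _ in totals.most_common() if t not in seen]
    return " ".join([query] + candidates[:extra_terms])


# Made-up example pages for the ambiguous query "jaguar".
pages = [
    virtual_document("jaguar car dealer", ["jaguar cars", "luxury car"]),
    virtual_document("jaguar car review", ["car review"]),
    virtual_document("jaguar cat habitat", ["big cat", "wild cat"]),
]
groups = cluster(pages)
expanded = expand_query("jaguar", [pages[i] for i in groups[0]], extra_terms=1)
```

With these made-up pages, the two car-related pages fall into one cluster and the wildlife page into another, and expanding the query from the first cluster adds a car-related term. A realistic system would at least add stop-word removal and term weighting, which are omitted here to keep the sketch short.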