Topic distillation is the analysis of hyperlink graph structure to identify mutually reinforcing authorities (popular pages) and hubs (comprehensive lists of links to authorities). Topic distillation is becoming common in Web search engines, but the best-known algorithms model the Web graph at a coarse grain, with whole pages as single nodes. Such models may lose vital details in the markup tag structure of the pages, and can thus let a tightly linked irrelevant subgraph win over a relatively sparse relevant subgraph, a phenomenon called topic drift or contamination. The problem becomes especially severe in the face of increasingly complex pages with navigation panels and advertisement links. We present an enhanced topic distillation algorithm which analyzes text, the markup tag trees that constitute HTML pages, and hyperlinks between pages. It thereby identifies subtrees which have high text- and hyperlink-based coherence with respect to the query. These subtrees get preferential treatment in the mutual reinforcement process. Using over 50 queries, 28 of them from earlier topic distillation work, we analyzed over 700,000 pages and obtained quantitative and anecdotal evidence that the new algorithm reduces topic drift.
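The mutual reinforcement between hubs and authorities described above can be illustrated with a minimal sketch of the classic coarse-grained iteration, in which each whole page is a single node. This is not the enhanced algorithm of this work, which additionally uses text and markup tag trees; the function names `hits` and `normalize`, the iteration count, and the toy edge list are assumptions made purely for illustration.

```python
from math import sqrt


def normalize(scores):
    """Scale scores to unit Euclidean length (left unchanged if all zero)."""
    norm = sqrt(sum(v * v for v in scores.values())) or 1.0
    return {k: v / norm for k, v in scores.items()}


def hits(edges, iterations=50):
    """Page-level hub/authority iteration over a directed edge list.

    edges: iterable of (source_page, target_page) pairs (hypothetical input).
    Returns two dicts: hub scores and authority scores.
    """
    edges = list(edges)
    nodes = sorted({n for edge in edges for n in edge})
    hub = {n: 1.0 for n in nodes}
    auth = {n: 1.0 for n in nodes}
    for _ in range(iterations):
        # A page's authority is the sum of hub scores of pages linking to it.
        new_auth = {n: 0.0 for n in nodes}
        for src, dst in edges:
            new_auth[dst] += hub[src]
        # A page's hub score is the sum of authority scores of pages it links to.
        new_hub = {n: 0.0 for n in nodes}
        for src, dst in edges:
            new_hub[src] += new_auth[dst]
        auth = normalize(new_auth)
        hub = normalize(new_hub)
    return hub, auth


# Toy graph: three link lists pointing at two candidate authorities.
EDGES = [
    ("hub1", "a"), ("hub1", "b"),
    ("hub2", "a"), ("hub2", "b"),
    ("hub3", "a"),
]
hub, auth = hits(EDGES)
```

In this toy graph, page `a` receives links from all three hubs and so ends up with a higher authority score than `b`, while `hub1` and `hub2` outrank `hub3` as hubs because they link to both authorities. Because every outlink of a page contributes equally here, a navigation panel or block of advertisement links would feed the same reinforcement loop, which is the coarse-grained weakness the abstract attributes to page-level models.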