Finding pages on the Web that are similar to a query page (Related Pages) is an important component of modern search engines. A variety of strategies have been proposed for answering Related Pages queries, but comparative evaluation by user studies is expensive, especially when large strategy spaces must be searched (e.g., when tuning parameters). We present a technique for automatically evaluating strategies using Web hierarchies, such as the Open Directory, in place of user feedback. We apply this evaluation methodology to a mix of document representation strategies, including the use of text, anchor text, and links. We discuss the relative advantages and disadvantages of the various approaches examined. Finally, we describe how to efficiently construct a similarity index out of our chosen strategies, and provide sample results from our index.
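The idea of using a Web hierarchy in place of user feedback can be illustrated with a small sketch. The abstract does not specify the scoring procedure, so everything below is an assumption made for illustration: the toy pages and directory paths are hypothetical, shared path-prefix length stands in for the hierarchy's notion of relatedness, Jaccard overlap on words stands in for a text-based strategy, and the agreement score is a simple fraction of concordant orderings rather than the measure used in the paper.

```python
from itertools import combinations

# Hypothetical toy data: page id -> (directory path, page text).
PAGES = {
    "a": ("Science/Physics/Optics", "laser light optics lens"),
    "b": ("Science/Physics/Optics", "lens light refraction optics"),
    "c": ("Science/Physics/Quantum", "quantum light photon"),
    "d": ("Arts/Music/Jazz", "jazz saxophone improvisation"),
}


def hierarchy_similarity(path_a, path_b):
    """Assumed ground truth: number of leading path components shared."""
    shared = 0
    for x, y in zip(path_a.split("/"), path_b.split("/")):
        if x != y:
            break
        shared += 1
    return shared


def jaccard(text_a, text_b):
    """Example strategy: Jaccard overlap of the two pages' word sets."""
    words_a, words_b = set(text_a.split()), set(text_b.split())
    union = words_a | words_b
    return len(words_a & words_b) / len(union) if union else 0.0


def agreement(pages, strategy):
    """Fraction of (query, x, y) triples that the strategy orders the same
    way as the hierarchy. Triples the hierarchy ranks as ties are skipped."""
    agree = total = 0
    for query in pages:
        others = [p for p in pages if p != query]
        for x, y in combinations(others, 2):
            hx = hierarchy_similarity(pages[query][0], pages[x][0])
            hy = hierarchy_similarity(pages[query][0], pages[y][0])
            if hx == hy:
                continue  # the hierarchy expresses no preference here
            sx = strategy(pages[query][1], pages[x][1])
            sy = strategy(pages[query][1], pages[y][1])
            total += 1
            if (hx - hy) * (sx - sy) > 0:
                agree += 1
    return agree / total if total else 0.0


score = agreement(PAGES, jaccard)
print(f"agreement with hierarchy: {score:.2f}")
```

Because the score is computed without any human judgments, the same loop could be rerun for each candidate strategy or parameter setting, which is the cost advantage over user studies that the abstract points to.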