We introduce a model of uncertainty in which documents are not uniquely identified in a reference network and some links may be incorrect. The model generalizes the probabilistic approach to databases to graphs, and defines a probability distribution over subgraphs. The answer to a relational query is a distribution over documents; we study how to approximate the ranking of the most likely documents and how to quantify the quality of that approximation. The answer to a function query is a distribution over values, and we take the size of the interval between the minimum and maximum values as a measure of the precision of the answer.
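As an illustration only, the following minimal sketch shows one way a distribution over subgraphs can induce a distribution over the values of a function query, and how the minimum-maximum interval of that answer can be read off. It rests on assumptions that are not taken from the paper: each link is kept independently with a given probability, the function query is a simple citation count, and the names `possible_worlds`, `value_distribution`, and `citation_count` are hypothetical.

```python
from collections import defaultdict
from itertools import product


def possible_worlds(edges):
    """Yield (subgraph, probability) for every subset of the uncertain links.

    `edges` maps (source, target) to the probability that the link is correct.
    Independence between links is an assumption of this sketch.
    """
    items = list(edges.items())
    for keep in product([True, False], repeat=len(items)):
        prob = 1.0
        world = []
        for (edge, p), kept in zip(items, keep):
            prob *= p if kept else 1.0 - p
            if kept:
                world.append(edge)
        yield world, prob


def value_distribution(edges, query):
    """Distribution of a function query's value over all possible subgraphs."""
    dist = defaultdict(float)
    for world, prob in possible_worlds(edges):
        dist[query(world)] += prob
    return dict(dist)


def citation_count(doc):
    """Hypothetical function query: number of links pointing to `doc`."""
    return lambda world: sum(1 for _, target in world if target == doc)


# Toy reference network: three uncertain links between documents a, b, c.
edges = {("a", "c"): 0.9, ("b", "c"): 0.5, ("a", "b"): 0.8}
dist = value_distribution(edges, citation_count("c"))

# Interval between the smallest and largest value with non-zero probability.
support = [value for value, prob in dist.items() if prob > 0]
interval = (min(support), max(support))
width = interval[1] - interval[0]
```

In this toy network the citation count of `c` ranges over the interval (0, 2), so the width is 2; a narrower interval would indicate a more precise answer. Enumerating every subgraph is exponential in the number of uncertain links, so this is practical only for very small examples.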