Searching with numbers

Authors:
Rakesh Agrawal;Ramakrishnan Srikant
Affiliations:
IBM Almaden Research Center, San Jose, CA;IBM Almaden Research Center, San Jose, CA
Venue:
Proceedings of the 11th international conference on World Wide Web
Year:
2002


Abstract

A large fraction of the useful web consists of specification documents that are made up largely of <attribute name, numeric value> pairs embedded in text. Examples include product information, classified advertisements, resumes, etc. Past approaches search these documents by first establishing correspondences between values and their names, and they have achieved limited success because of the difficulty of extracting this information from free text. We propose a new approach that does not require this correspondence to be accurately established. Provided the data has "low reflectivity", we can do effective search even if the values in the data have not been assigned attribute names and the user has omitted attribute names in the query. We give algorithms and indexing structures for implementing the search. We also show how hints (i.e., imprecise, partial correspondences) from automatic data extraction techniques can be incorporated into our approach for better accuracy on high-reflectivity datasets. Finally, we validate our approach by showing that we get high precision in our answers on real datasets from a variety of domains.
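To make the idea of searching without attribute names concrete, here is a minimal sketch in Python. It is not the paper's algorithm or index structure. It assumes each document has already been reduced to a bag of numbers, scores a document by the cheapest one-to-one assignment of query numbers to document numbers, and uses relative distance as the cost. The names `relative_distance`, `match_score`, and `search`, the brute-force assignment, and the sample data are all hypothetical choices made for illustration.

```python
from itertools import permutations


def relative_distance(q, v):
    """Relative gap between a query number q and a document number v.

    Assumption: relative (not absolute) distance, so that attributes on
    very different scales contribute comparably to the score.
    """
    return abs(q - v) / abs(q) if q != 0 else abs(v)


def match_score(query, doc_numbers):
    """Cheapest total distance over all one-to-one assignments of query
    numbers to document numbers. No attribute names are used.

    Brute force over permutations; only suitable for tiny examples.
    """
    if len(doc_numbers) < len(query):
        return float("inf")
    return min(
        sum(relative_distance(q, v) for q, v in zip(query, chosen))
        for chosen in permutations(doc_numbers, len(query))
    )


def search(query, docs, k=1):
    """Return the names of the k documents with the lowest match score."""
    ranked = sorted(docs, key=lambda name: match_score(query, docs[name]))
    return ranked[:k]


# Hypothetical data: numbers pulled from three product descriptions,
# with no record of which attribute each number belongs to.
docs = {
    "laptop_a": [256, 800, 40],
    "laptop_b": [128, 600, 20],
    "laptop_c": [512, 1000, 60],
}

# The query lists numbers in a different order and names no attributes.
print(search([800, 256], docs))
```

In this toy data the value ranges of the different attributes do not overlap, so a query number is unlikely to be matched to a value of the wrong attribute. That is roughly the situation the abstract's "low reflectivity" condition appears to describe; when ranges do overlap, a name-free match of this kind can pair values incorrectly, which is where the extraction hints mentioned in the abstract would come in.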