We introduce a new, powerful class of text proximity queries: find an instance of a given "answer type" (person, place, distance) near "selector" tokens that match given literals or satisfy given ground predicates. An example query is type=distance NEAR Hamburg Munich. Nearness is defined as a flexible, trainable, parameterized aggregation function of the selectors, their frequency in the corpus, and their distance from the candidate answer. Such queries provide a key data reduction step for information extraction, data integration, question answering, and other text-processing applications. We describe the architecture of a next-generation information retrieval engine for such applications, and investigate two key technical problems faced in building it. First, we propose a new algorithm that estimates a scoring function from past logs of queries and answer spans. Plugging the scoring function into the query processor gives high accuracy: typically, an answer is found at rank 2 to 4. Second, we exploit the skew in the distribution over types seen in query logs to reduce the space taken by the new index structures that our system needs. Extensive performance studies with a 10 GB, 2-million-document TREC corpus and several hundred TREC queries show both the accuracy and the efficiency of our system. From an initial 4.3 GB index using 18,000 types from WordNet, we can discard 88% of the space while inflating query times by a factor of only 1.9. Our final index overhead is only 20% of the total index space needed.
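To make the idea of a proximity-based nearness score concrete, here is a minimal sketch in Python. It is not the scoring function described above, which is trained from query logs; it assumes one simple hand-picked form in which each selector contributes its corpus-rarity weight multiplied by an exponential decay in token distance to the candidate. The names score_candidate, rank_candidates, the decay parameter, the IDF weights, and the token positions are all hypothetical and chosen only for illustration.

```python
import math


def score_candidate(cand_pos, selector_hits, idf, decay=0.5):
    """Aggregate selector evidence for one candidate answer token.

    cand_pos      -- token offset of the candidate (an instance of the answer type)
    selector_hits -- dict: selector string -> list of token offsets where it occurs
    idf           -- dict: selector string -> rarity weight (assumed precomputed)
    decay         -- hypothetical decay rate per token of distance
    """
    total = 0.0
    for selector, positions in selector_hits.items():
        if not positions:
            continue  # selector absent from this document: contributes nothing
        # Use the occurrence of the selector closest to the candidate.
        gap = min(abs(p - cand_pos) for p in positions)
        total += idf.get(selector, 0.0) * math.exp(-decay * gap)
    return total


def rank_candidates(candidates, selector_hits, idf, decay=0.5):
    """Return candidate texts ordered from highest to lowest score."""
    scored = [
        (score_candidate(pos, selector_hits, idf, decay), text)
        for text, pos in candidates
    ]
    scored.sort(key=lambda pair: -pair[0])
    return [text for _, text in scored]


# Toy data (invented): two distance-typed candidates in one document,
# for a query like type=distance NEAR Hamburg Munich.
selector_hits = {"Hamburg": [3], "Munich": [5]}
idf = {"Hamburg": 1.0, "Munich": 1.0}
candidates = [("candidate far from selectors", 40), ("candidate near selectors", 7)]

ranking = rank_candidates(candidates, selector_hits, idf)
print(ranking)
```

In a learned variant, the decay shape and the per-selector weights would be fitted from logged queries and their answer spans instead of being fixed by hand as they are here.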