Yago: a core of semantic knowledge
Proceedings of the 16th international conference on World Wide Web
Open information extraction from the web
Communications of the ACM - Surviving the data deluge
Database and information-retrieval methods for knowledge discovery
Communications of the ACM - A Direct Path to Dependable Software
Information extraction challenges in managing unstructured data
ACM SIGMOD Record
StatSnowball: a statistical approach to extracting entity relationships
Proceedings of the 18th international conference on World wide web
SOFIE: a self-organizing framework for information extraction
Proceedings of the 18th international conference on World wide web
Collective annotation of Wikipedia entities in web text
Proceedings of the 15th ACM SIGKDD international conference on Knowledge discovery and data mining
AAAI'06 proceedings of the 21st national conference on Artificial intelligence - Volume 2
Gathering and ranking photos of named entities with high precision, high recall, and diversity
Proceedings of the third ACM international conference on Web search and data mining
DBpedia: a nucleus for a web of open data
ISWC'07/ASWC'07 Proceedings of the 6th international The semantic web and 2nd Asian conference on Asian semantic web conference
From information to knowledge: harvesting entities and relationships from web sources
Proceedings of the twenty-ninth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems
Bayesian knowledge corroboration with logical rules and user feedback
ECML PKDD'10 Proceedings of the 2010 European conference on Machine learning and knowledge discovery in databases: Part II
Scalable probabilistic databases with factor graphs and MCMC
Proceedings of the VLDB Endowment
Scalable knowledge harvesting with high precision and high recall
Proceedings of the fourth ACM international conference on Web search and data mining
Chapter 3: search for knowledge
Search Computing
Turning the web into a database: extracting data and structure
NLDB'09 Proceedings of the 14th international conference on Applications of Natural Language to Information Systems
DB researchers have traditionally focused on engine-centered issues such as indexing, query processing, and transactions. Data mining has broadened the community's viewpoint towards algorithmic and statistical issues. However, DB research has always tended to shy away from seemingly elusive long-term challenges with an AI flavor. On the other hand, the current explosion of digital content in enterprises and on the Internet is mostly caused by user-created information such as text, tags, photos, and videos, not by a growing number of well-designed databases of the traditional kind. In this situation, I question the traditional skepticism of DB researchers towards "AI-complete" problems and the DB community's reluctance to embark on seemingly non-DB-ish grand challenges. Big questions that I see as great opportunities for DB research as well include: 1) automatic extraction of relational facts from natural-language text and multimodal contexts [4, 6, 21], 2) automatic disambiguation of named-entity mentions and general phrases in text and speech [10, 11], 3) large-scale gathering of factual-knowledge candidates and their reconciliation into comprehensive knowledge bases [1, 2, 8, 13, 19], 4) reasoning on uncertain hypotheses, for knowledge discovery and semantic search [9, 14, 16, 17, 20], 5) deep and real-time question answering, e.g., to enable computers to win quiz game shows [7], 6) machine reading of scientific publications and fictional literature, to enable corpus-wide analyses and help researchers in the sciences and humanities develop hypotheses and quickly focus on the most relevant issues [3, 5]. I believe that successfully tackling these topics requires efficient data-centric algorithms, scalable methods and architectures, and system-level thinking, virtues that are richly available in the DB research community.
Moreover, I would encourage our community to look across the fence and engage more with the exciting challenges outside the traditionally narrow boundaries of the DB realm. I will illustrate these points with examples from my own research on knowledge management [12, 15, 18, 19]. Breakthroughs will require long-term stamina. In the meantime, steady incremental progress is better than not embarking on these important problems at all.