Structured community portals extract and integrate information from raw Web pages to present a unified view of the entities and relationships in a community. In this paper we argue that a top-down, compositional, and incremental approach is a good way to build such portals. Compared to current approaches, which employ complex monolithic techniques, this approach is easier to develop, understand, debug, and optimize. In this approach, we first select a small set of important community sources. Next, we compose plans that extract and integrate data from these sources using a set of extraction/integration operators. Executing these plans yields an initial structured portal. We then expand this portal incrementally, monitoring how the current data sources evolve in order to detect and add new ones. We describe our initial solutions to each of these steps, together with a case study of using them to build DBLife, a portal for the database community. We found that DBLife could be built quickly and achieve high accuracy using simple extraction/integration operators, and that it can be maintained and expanded with little human effort. The initial solutions, together with the case study, demonstrate the feasibility and potential of our approach.
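The abstract does not say which operators DBLife uses, so the following is only a minimal Python sketch of the general idea, under stated assumptions: small, separately testable operators composed into a plan (extract, then match, then relate), plus a simple heuristic for flagging new candidate sources. All function names (`extract_mentions`, `match_mentions`, `co_occurrence_relations`, `run_plan`, `rank_new_sources`), the dictionary-based matching, the co-occurrence relation, and the example pages are hypothetical illustrations, not the authors' implementation.

```python
import re
from collections import Counter, defaultdict
from itertools import combinations


def extract_mentions(pages, dictionary):
    """Hypothetical extraction operator: dictionary-based name matching.

    pages: {page_id: text}; dictionary: {name_variant: entity_id}.
    Returns sorted (page_id, entity_id) pairs, at most one per entity per page.
    """
    found = set()
    for page_id, text in pages.items():
        for variant, entity_id in dictionary.items():
            # Word boundaries stop a short variant (e.g. "Li") matching inside a word ("Line").
            if re.search(r"\b" + re.escape(variant) + r"\b", text):
                found.add((page_id, entity_id))
    return sorted(found)


def match_mentions(mentions):
    """Hypothetical integration operator: merge mentions of the same entity."""
    pages_by_entity = defaultdict(set)
    for page_id, entity_id in mentions:
        pages_by_entity[entity_id].add(page_id)
    return dict(pages_by_entity)


def co_occurrence_relations(mentions, min_pages=1):
    """Hypothetical relation operator: relate entity pairs sharing >= min_pages pages."""
    entities_by_page = defaultdict(set)
    for page_id, entity_id in mentions:
        entities_by_page[page_id].add(entity_id)
    counts = Counter()
    for entities in entities_by_page.values():
        for pair in combinations(sorted(entities), 2):
            counts[pair] += 1
    return {pair: n for pair, n in counts.items() if n >= min_pages}


def run_plan(pages, dictionary, min_pages=1):
    """Compose the operators into one plan: extract -> match -> relate."""
    mentions = extract_mentions(pages, dictionary)
    return match_mentions(mentions), co_occurrence_relations(mentions, min_pages)


def rank_new_sources(candidates, dictionary, min_entities=2):
    """Hypothetical expansion heuristic: keep candidate pages that mention at
    least min_entities known entities, most entity-rich first (ties by page id)."""
    mentions = extract_mentions(candidates, dictionary)
    per_page = Counter(page_id for page_id, _ in mentions)
    ranked = sorted(per_page.items(), key=lambda kv: (-kv[1], kv[0]))
    return [page_id for page_id, n in ranked if n >= min_entities]


# Illustrative data only: a tiny hand-built dictionary and two "community sources".
dictionary = {
    "Jeffrey Ullman": "ullman",
    "Jeff Ullman": "ullman",
    "Jennifer Widom": "widom",
    "VLDB": "vldb",
}
pages = {
    "p1": "Jeff Ullman and Jennifer Widom will speak at VLDB.",
    "p2": "Jeffrey Ullman joined the VLDB program committee.",
}
entities, relations = run_plan(pages, dictionary)

candidates = {
    "c1": "Jennifer Widom gave a keynote at VLDB.",
    "c2": "Local weather for today.",
    "c3": "A note from Jeff Ullman.",
}
new_sources = rank_new_sources(candidates, dictionary)
print(entities)
print(relations)
print(new_sources)
```

Keeping each operator as a separate function is meant to echo the abstract's argument that compositional plans are easier to debug and optimize than a monolithic extractor: in this sketch, a bad match can be traced to one operator and that operator swapped out without touching the rest of the plan.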