Proceedings of the 22nd ACM conference on Hypertext and hypermedia
Internet users regularly need to find biographies of, and facts about, people of interest. Wikipedia has become the first stop for celebrity biographies and facts. However, Wikipedia can only provide information for celebrities because of its neutral point of view (NPOV) editorial policy. In this paper we propose BioSnowball, an integrated bootstrapping framework that automatically summarizes the Web to generate Wikipedia-style pages for any person with a modest web presence. In BioSnowball, biography ranking and fact extraction are performed together in a single integrated training and inference process, with Markov Logic Networks (MLNs) as the underlying statistical model. The bootstrapping framework starts with only a small number of seeds and iteratively finds new facts and biographies. Because biography paragraphs on the Web are composed of the most important facts, our joint summarization model can improve the accuracy of both fact extraction and biography ranking compared to the decoupled methods in the literature. Empirical results on both a small labeled data set and a real Web-scale data set show the effectiveness of BioSnowball and show that it outperforms the decoupled methods.
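To make the seed-driven iteration concrete, here is a minimal sketch of a generic pattern-bootstrapping loop in Python. It is an illustration only and not the BioSnowball model: it leaves out the Markov Logic Network and the joint biography ranking, and it handles a single relation with plain regular expressions. The function name `bootstrap`, the `max_iters` parameter, and the toy sentences (which use invented people and places) are all assumptions made for this example.

```python
import re


def bootstrap(sentences, seeds, max_iters=3):
    """Toy bootstrapping loop: seed facts -> patterns -> new facts, repeated.

    Facts are (person, value) pairs for one relation. A pattern is the
    literal text found between a known person and a known value.
    This is a hypothetical simplification, not the paper's joint model.
    """
    facts = set(seeds)
    patterns = set()
    for _ in range(max_iters):
        # Step 1: induce patterns from sentences that mention a known fact.
        for person, value in facts:
            for s in sentences:
                m = re.search(re.escape(person) + r"(.+?)" + re.escape(value), s)
                if m:
                    patterns.add(m.group(1))
        # Step 2: apply every pattern to propose (person, value) pairs.
        new_facts = set()
        for p in patterns:
            rx = re.compile(r"([A-Z]\w+ [A-Z]\w+)" + re.escape(p) + r"([A-Z]\w+)")
            for s in sentences:
                new_facts.update(rx.findall(s))
        if new_facts <= facts:  # nothing new was found, so stop early
            break
        facts |= new_facts
    return facts, patterns


# Invented example data: one seed fact and four sentences.
sentences = [
    "Jane Doe was born in Springfield.",
    "John Roe was born in Shelbyville.",
    "John Roe, a native of Shelbyville, taught physics.",
    "Mary Major, a native of Ogdenville, wrote novels.",
]
seeds = {("Jane Doe", "Springfield")}
facts, patterns = bootstrap(sentences, seeds)
```

In this toy run the single seed yields the pattern " was born in ", which finds a second fact. That fact in turn exposes the pattern ", a native of ", which finds a third. A real system would also need to score patterns and facts to limit the drift that unfiltered bootstrapping tends to produce, which this sketch does not attempt.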