Mining data records in Web pages
Proceedings of the ninth ACM SIGKDD international conference on Knowledge discovery and data mining
Web-scale information extraction in knowitall: (preliminary results)
Proceedings of the 13th international conference on World Wide Web
Web data extraction based on partial tree alignment
WWW '05 Proceedings of the 14th international conference on World Wide Web
Simultaneous record detection and attribute labeling in web data extraction
Proceedings of the 12th ACM SIGKDD international conference on Knowledge discovery and data mining
Semi-supervised conditional random fields for improved sequence segmentation and labeling
ACL-44 Proceedings of the 21st International Conference on Computational Linguistics and the 44th annual meeting of the Association for Computational Linguistics
KnowItNow: fast, scalable information extraction from the web
HLT '05 Proceedings of the conference on Human Language Technology and Empirical Methods in Natural Language Processing
Proceedings of the 16th international conference on World Wide Web
Autonomously semantifying wikipedia
Proceedings of the sixteenth ACM conference on Conference on information and knowledge management
Weakly-supervised discovery of named entities using web search queries
Proceedings of the sixteenth ACM conference on Conference on information and knowledge management
Freebase: a collaboratively created graph database for structuring human knowledge
Proceedings of the 2008 ACM SIGMOD international conference on Management of data
YAGO: A Large Ontology from Wikipedia and WordNet
Web Semantics: Science, Services and Agents on the World Wide Web
Language-Independent Set Expansion of Named Entities Using the Web
ICDM '07 Proceedings of the 2007 Seventh IEEE International Conference on Data Mining
SOFIE: a self-organizing framework for information extraction
Proceedings of the 18th international conference on World wide web
Open information extraction from the web
IJCAI'07 Proceedings of the 20th international joint conference on Artifical intelligence
Answering table augmentation queries from unstructured lists on the web
Proceedings of the VLDB Endowment
Harvesting relational tables from lists on the web
Proceedings of the VLDB Endowment
Entity extraction via ensemble semantics
EMNLP '09 Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing: Volume 1 - Volume 1
Web-scale distributional similarity and entity set expansion
EMNLP '09 Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing: Volume 2 - Volume 2
Character-level analysis of semi-structured documents for set expansion
EMNLP '09 Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing: Volume 3 - Volume 3
Coupled semi-supervised learning for information extraction
Proceedings of the third ACM international conference on Web search and data mining
IEEE Transactions on Knowledge and Data Engineering
DBpedia: a nucleus for a web of open data
ISWC'07/ASWC'07 Proceedings of the 6th international The semantic web and 2nd Asian conference on Asian semantic web conference
Open information extraction using Wikipedia
ACL '10 Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics
Learning 5000 relational extractors
ACL '10 Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics
Distributional similarity vs. PU learning for entity set expansion
ACLShort '10 Proceedings of the ACL 2010 Conference Short Papers
Efficient graph-based semi-supervised learning of structured tagging models
EMNLP '10 Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing
Annotating and searching web tables using entities, types and relationships
Proceedings of the VLDB Endowment
Web-scale table census and classification
Proceedings of the fourth ACM international conference on Web search and data mining
Recovering semantics of tables on the web
Proceedings of the VLDB Endowment
Towards a unified solution: data record region detection and segmentation
Proceedings of the 20th ACM international conference on Information and knowledge management
Open information extraction: the second generation
IJCAI'11 Proceedings of the Twenty-Second international joint conference on Artificial Intelligence - Volume Volume One
Towards an enhanced and adaptable ontology by distilling and assembling online encyclopedias
Proceedings of the 22nd ACM international conference on Conference on information & knowledge management
Structured positional entity language model for enterprise entity retrieval
Proceedings of the 22nd ACM international conference on Conference on information & knowledge management
Robust detection of semi-structured web records using a DOM structure-knowledge-driven model
ACM Transactions on the Web (TWEB)
Hi-index | 0.00 |
We develop a new framework to achieve the goal of Wikipedia entity expansion and attribute extraction from the Web. Our framework takes a few existing entities that are automatically collected from a particular Wikipedia category as seed input and explores their attribute infoboxes to obtain clues for the discovery of more entities for this category and the attribute content of the newly discovered entities. One characteristic of our framework is to conduct discovery and extraction from desirable semi-structured data record sets which are automatically collected from the Web. A semi-supervised learning model with Conditional Random Fields is developed to deal with the issues of extraction learning and limited number of labeled examples derived from the seed entities. We make use of a proximate record graph to guide the semi-supervised learning process. The graph captures alignment similarity among data records. Then the semi-supervised learning process can leverage the unlabeled data in the record set by controlling the label regularization under the guidance of the proximate record graph. Extensive experiments on different domains have been conducted to demonstrate its superiority for discovering new entities and extracting attribute content.