In this paper, we propose an attribute retrieval approach that extracts and ranks attributes from HTML tables. Given an instance (e.g., Tower of Pisa), we want to retrieve its attributes (e.g., height, architect) from the Web. Our approach uses HTML tables, which are probably the largest source for attribute retrieval. Three recall-oriented filters are applied to each table to check three properties: (i) whether the table is relational, (ii) whether it has a header, and (iii) whether its attributes and values are conformant. Candidate attributes are then extracted from the tables and ranked with a combination of relevance features. Our approach can be applied to all instances and is shown to have high recall and reasonable precision. Moreover, it outperforms state-of-the-art techniques.
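
The filter-then-rank pipeline described above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration, not the method from the paper: tables are assumed to be already parsed into rows of cell strings, the three filter heuristics (`is_relational`, `has_header`, `conforming_attributes`) are simple stand-ins for the paper's filters, and the two ranking features (attribute frequency across tables and whether a table mentions the instance) along with their weights `w_freq` and `w_match` are hypothetical.

```python
from collections import Counter


def _is_number(cell):
    """Return True if the cell parses as a number (commas ignored)."""
    try:
        float(cell.replace(",", ""))
        return True
    except ValueError:
        return False


def is_relational(table):
    """Hypothetical filter (i): at least 2 rows and 2 columns, all rows equally wide."""
    if len(table) < 2:
        return False
    width = len(table[0])
    return width >= 2 and all(len(row) == width for row in table)


def has_header(table):
    """Hypothetical filter (ii): every first-row cell is non-empty and non-numeric."""
    return all(c.strip() and not _is_number(c.strip()) for c in table[0])


def conforming_attributes(table, max_words=4):
    """Hypothetical filter (iii): keep short header cells whose column has a value."""
    kept = []
    for col, name in enumerate(table[0]):
        values = [row[col].strip() for row in table[1:]]
        if len(name.split()) <= max_words and any(values):
            kept.append(name.strip().lower())
    return kept


def rank_attributes(instance, tables, w_freq=1.0, w_match=2.0):
    """Score candidate attributes with an assumed linear mix of two features."""
    scores = Counter()
    for table in tables:
        if not (is_relational(table) and has_header(table)):
            continue
        # Feature: does any cell of this table mention the instance?
        mentions = any(instance.lower() in cell.lower()
                       for row in table for cell in row)
        for attr in conforming_attributes(table):
            scores[attr] += w_freq + (w_match if mentions else 0.0)
    # Highest score first; ties broken alphabetically for determinism.
    return sorted(scores, key=lambda a: (-scores[a], a))


tables = [
    [["Name", "Height", "Architect"],
     ["Tower of Pisa", "56 m", "Bonanno Pisano"]],
    [["Name", "Height"], ["Eiffel Tower", "330 m"]],
    [["1", "2"], ["3", "4"]],   # rejected: no header
    [["only one row"]],         # rejected: not relational
]
ranked = rank_attributes("Tower of Pisa", tables)
```

The filters here are deliberately permissive, in the spirit of the recall-oriented design: they discard only tables that are clearly unusable and leave the ordering of the surviving candidates to the ranking step.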