Towards a framework for attribute retrieval

Authors:
Arlind Kopliku;Mohand Boughanem;Karen Pinel-Sauvagnat
Affiliations:
IRIT, University of Toulouse, Toulouse, France;IRIT, University of Toulouse, Toulouse, France;IRIT, University of Toulouse, Toulouse, France
Venue:
Proceedings of the 20th ACM international conference on Information and knowledge management
Year:
2011

Abstract

In this paper, we propose an attribute retrieval approach that extracts and ranks attributes from HTML tables. We distinguish between class attribute retrieval and instance attribute retrieval. On the one hand, given an instance (e.g., University of Strathclyde), we retrieve its attributes from the Web (e.g., principal, location, number of students). On the other hand, given a class (e.g., universities) represented by a set of instances, we retrieve the attributes common to its instances. Furthermore, we show that instance attribute retrieval can be reinforced when similar instances are available. Our approach uses HTML tables, which are probably the largest source for attribute retrieval. Three recall-oriented filters are applied to tables to check three properties: (i) whether the table is relational, (ii) whether the table has a header, and (iii) whether its attributes and values are in conformity. Candidate attributes are then extracted from the tables and ranked with a combination of relevance features. Our approach is shown to achieve high recall and reasonable precision. Moreover, it outperforms state-of-the-art techniques.
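
The pipeline outlined in the abstract (filter tables, extract candidate attributes, rank them) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the three filter heuristics, the assumption that the first column holds instance names, the two scoring features, and all function names are hypothetical stand-ins chosen only to show the overall shape of such a pipeline.

```python
from collections import Counter

# A table is a list of rows; each row is a list of cell strings.
Table = list[list[str]]


def is_numeric(cell: str) -> bool:
    """Return True if the cell parses as a number once commas are removed."""
    try:
        float(cell.replace(",", ""))
        return True
    except ValueError:
        return False


def is_relational(table: Table) -> bool:
    # Stand-in heuristic: at least 2 rows and 2 columns, all rows equally long.
    return (
        len(table) >= 2
        and len(table[0]) >= 2
        and len({len(row) for row in table}) == 1
    )


def has_header(table: Table) -> bool:
    # Stand-in heuristic: every first-row cell is non-empty, non-numeric text.
    return all(c.strip() and not is_numeric(c) for c in table[0])


def is_conform(table: Table) -> bool:
    # Stand-in heuristic: header cells are short labels and every column
    # contains at least one non-empty value below the header.
    short = all(len(c.split()) <= 4 for c in table[0])
    filled = all(
        any(row[i].strip() for row in table[1:]) for i in range(len(table[0]))
    )
    return short and filled


def rank_attributes(tables: list[Table], instance: str) -> list[str]:
    """Rank header cells as candidate attributes for the given instance."""
    scores: Counter = Counter()
    for table in tables:
        # Apply the three filters in order; skip tables that fail any of them.
        if not (is_relational(table) and has_header(table) and is_conform(table)):
            continue
        # Hypothetical relevance feature: does the table mention the instance?
        mentions = any(
            instance.lower() in cell.lower() for row in table[1:] for cell in row
        )
        # Assumption: the first column names the instances, so it is skipped.
        for attr in table[0][1:]:
            # Combine two features: table frequency plus an instance-match bonus.
            scores[attr.strip().lower()] += 1.0 + (1.0 if mentions else 0.0)
    return [attr for attr, _ in scores.most_common()]


tables = [
    [["University", "Location", "Students"],
     ["University of Strathclyde", "Glasgow", "22,000"],
     ["University of Toulouse", "Toulouse", "100,000"]],
    [["Name", "Location"], ["University of Oxford", "Oxford"]],
    [["1", "2"], ["3", "4"]],   # rejected: no textual header
    [["only one column"]],       # rejected: not relational
]
ranked = rank_attributes(tables, "University of Strathclyde")
print(ranked)
```

In this toy input, "location" appears in two accepted tables, one of which mentions the instance, so it ranks above "students". A real system would replace the simple additive score with a learned or tuned combination of relevance features.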