Structured retrieval exploits both the content and the structure of documents to improve information retrieval. Its success therefore depends on the availability of semantic structure in the documents. However, most documents on the Web still lack semantically rich structure. This motivates us to identify the semantic information in web documents automatically and to annotate it explicitly with semantic tags. Using the Wikipedia corpus, this paper describes an unsupervised learning approach that identifies the conceptual information and the descriptive information about the entity described in a Wikipedia article. Our approach uses the Wikipedia link structure and Infobox information to learn the semantic structure of Wikipedia articles. We also describe a lazy approach to the learning process: by exploiting the Wikipedia categories assigned by contributors, only a subset of the entities in a category is used as training data, and the learned results are applied to the remaining entities in that category.
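The lazy, category-level training step can be illustrated with a small Python sketch. It is not the authors' method. The abstract does not specify the learner, so the sketch assumes a deliberately simple stand-in: the "semantic structure" learned for a category is the set of Infobox attribute names shared by most of its training entities. Link structure is ignored entirely. The entity record layout (`title`, `category`, `infobox`), the functions `learn_category_templates` and `annotate`, and the parameters `sample_size` and `min_support` are all hypothetical.

```python
from collections import Counter, defaultdict


def learn_category_templates(entities, sample_size=5, min_support=0.5):
    """Hypothetical sketch of lazy, per-category learning.

    Each entity is assumed to be a dict with 'title', 'category' and
    'infobox' (attribute name -> value). Only the first `sample_size`
    entities of each category (sorted by title) are used for training;
    `sample_size` is assumed to be >= 1.
    """
    by_category = defaultdict(list)
    for entity in entities:
        by_category[entity["category"]].append(entity)

    templates = {}
    training_titles = set()
    for category, members in by_category.items():
        # Deterministic training subset, for illustration only.
        sample = sorted(members, key=lambda e: e["title"])[:sample_size]
        training_titles.update(e["title"] for e in sample)
        # Count each attribute name at most once per training entity.
        counts = Counter(attr for e in sample for attr in set(e["infobox"]))
        # Keep attributes present in at least `min_support` of the sample.
        templates[category] = {
            attr for attr, n in counts.items() if n / len(sample) >= min_support
        }
    return templates, training_titles


def annotate(entity, templates):
    """Tag the Infobox values whose attribute names are in the category template."""
    template = templates.get(entity["category"], set())
    return {attr: value for attr, value in entity["infobox"].items() if attr in template}


# Usage: train on two entities of a category, then annotate the untrained one.
entities = [
    {"title": "Alan Turing", "category": "Computer scientists",
     "infobox": {"born": "1912", "field": "Mathematics", "alma_mater": "Cambridge"}},
    {"title": "Ada Lovelace", "category": "Computer scientists",
     "infobox": {"born": "1815", "field": "Mathematics"}},
    {"title": "Grace Hopper", "category": "Computer scientists",
     "infobox": {"born": "1906", "rank": "Rear admiral", "field": "Computer science"}},
]
templates, trained = learn_category_templates(entities, sample_size=2, min_support=0.75)
unseen = [e for e in entities if e["title"] not in trained]
annotations = {e["title"]: annotate(e, templates) for e in unseen}
print(annotations)  # {'Grace Hopper': {'born': '1906', 'field': 'Computer science'}}
```

In this sketch, training cost grows with `sample_size` per category rather than with the size of the category, which reflects the motivation for the lazy strategy. The trade-off is that an untrained entity is annotated only with attributes that the sampled entities happened to share, so unusual Infobox fields (such as `rank` above) are left untagged.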