A novel semantic information retrieval system based on a three-level domain model

Authors:
Licia Sbattella; Roberto Tedesco
Affiliations:
Politecnico di Milano, Dipartimento di Elettronica e Informazione, Via Ponzio 34/5, 20133 Milan, Italy; Politecnico di Milano, MultiChancePoliTeam, P.zza Leonardo da Vinci 32, 20133 Milan, Italy
Venue:
Journal of Systems and Software
Year:
2013



Abstract

This paper presents a methodology and a prototype for extracting and indexing knowledge from natural language documents. The underlying domain model comprises a conceptual level, described by a domain ontology, which represents the domain knowledge, and a lexical level, based on WordNet, which represents the domain vocabulary. A stochastic model, the ME-2L-HMM2, which combines hidden Markov models (HMMs) and maximum entropy models in a novel way, stores the mapping between these two levels, taking into account the linguistic context of words. This context contains not only the surrounding words but also morphological and syntactic information extracted with natural language processing tools. During the document indexing phase, the stochastic model is used to disambiguate word meanings. The semantic information retrieval engine we developed supports both simple keyword-based queries and natural language queries. The engine can also extend the domain knowledge by discovering new, relevant concepts to add to the domain model. Validation tests indicate that the system disambiguates and extracts concepts with good accuracy, and a comparison between our prototype and a classic search engine shows that the proposed approach yields better accuracy.
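
The indexing idea summarized in the abstract (map each word to a concept using its context, then index documents by concept rather than by surface word) can be illustrated with a minimal sketch. This is not the ME-2L-HMM2: it is a plain first-order, HMM-style Viterbi decoder over a toy lexicon, with no maximum entropy component and no morphological or syntactic features. All names (`LEXICON`, `TRANSITION`, `disambiguate`, `build_index`), the concepts, and the probabilities are hypothetical values chosen for illustration.

```python
from collections import defaultdict

# Hypothetical lexical level: each word lists candidate concepts with a weight.
LEXICON = {
    "bank": {"FinancialInstitution": 0.5, "RiverSide": 0.5},
    "loan": {"Loan": 1.0},
    "river": {"River": 1.0},
}

# Hypothetical concept-to-concept affinities, standing in for a learned
# transition model between hidden states (concepts).
TRANSITION = {
    ("Loan", "FinancialInstitution"): 0.9,
    ("Loan", "RiverSide"): 0.1,
    ("River", "FinancialInstitution"): 0.1,
    ("River", "RiverSide"): 0.9,
}
DEFAULT_TRANSITION = 0.5  # assumed fallback for unseen concept pairs


def disambiguate(words):
    """Viterbi decoding: pick the best-scoring concept sequence for known words."""
    known = [w for w in words if w in LEXICON]
    if not known:
        return []
    # best[c] = (score of the best path ending in concept c, that path)
    best = {c: (p, [c]) for c, p in LEXICON[known[0]].items()}
    for word in known[1:]:
        step = {}
        for concept, weight in LEXICON[word].items():
            score, path = max(
                (
                    prev_score
                    * TRANSITION.get((prev, concept), DEFAULT_TRANSITION)
                    * weight,
                    prev_path,
                )
                for prev, (prev_score, prev_path) in best.items()
            )
            step[concept] = (score, path + [concept])
        best = step
    _, path = max(best.values())
    return list(zip(known, path))


def build_index(docs):
    """Build an inverted index keyed by concept instead of by surface word."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for _, concept in disambiguate(text.lower().split()):
            index[concept].add(doc_id)
    return index


docs = {"d1": "loan from the bank", "d2": "river bank"}
index = build_index(docs)
print(sorted(index["FinancialInstitution"]))
print(sorted(index["RiverSide"]))
```

In this toy setting both documents contain the word "bank", yet a query for the concept `FinancialInstitution` retrieves only the first one, which is the kind of gain over keyword matching that the abstract describes.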