Foundations of statistical natural language processing
Foundations of statistical natural language processing
A vector space model for automatic indexing
Communications of the ACM
Automatic Ontology-Based Knowledge Extraction from Web Documents
IEEE Intelligent Systems
Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data
ICML '01 Proceedings of the Eighteenth International Conference on Machine Learning
Maximum Entropy Markov Models for Information Extraction and Segmentation
ICML '00 Proceedings of the Seventeenth International Conference on Machine Learning
The kappa statistic: a second look
Computational Linguistics
Sequential conditional Generalized Iterative Scaling
ACL '02 Proceedings of the 40th Annual Meeting on Association for Computational Linguistics
Feature-rich part-of-speech tagging with a cyclic dependency network
NAACL '03 Proceedings of the 2003 Conference of the North American Chapter of the Association for Computational Linguistics on Human Language Technology - Volume 1
Coarse-to-fine n-best parsing and MaxEnt discriminative reranking
ACL '05 Proceedings of the 43rd Annual Meeting on Association for Computational Linguistics
Incorporating non-local information into information extraction systems by Gibbs sampling
ACL '05 Proceedings of the 43rd Annual Meeting on Association for Computational Linguistics
A review of ontology based query expansion
Information Processing and Management: an International Journal
Word sense disambiguation: A survey
ACM Computing Surveys (CSUR)
Search Engines: Information Retrieval in Practice
Search Engines: Information Retrieval in Practice
An ontology-driven approach for semantic information retrieval on the Web
ACM Transactions on Internet Technology (TOIT)
Using domain knowledge for ontology-guided entity extraction from noisy, unstructured text data
Proceedings of The Third Workshop on Analytics for Noisy Unstructured Text Data
Named entity recognition in query
Proceedings of the 32nd international ACM SIGIR conference on Research and development in information retrieval
Design challenges and misconceptions in named entity recognition
CoNLL '09 Proceedings of the Thirteenth Conference on Computational Natural Language Learning
Fast sequential decoding algorithm using a stack
IBM Journal of Research and Development
A framework for understanding Latent Semantic Indexing (LSI) performance
Information Processing and Management: an International Journal - Special issue: Formal methods for information retrieval
Ontology-based information extraction: An introduction and a survey of current approaches
Journal of Information Science
Another look at the data sparsity problem
TSD'06 Proceedings of the 9th international conference on Text, Speech and Dialogue
Hi-index | 0.00 |
This paper presents a methodology and a prototype for extracting and indexing knowledge from natural language documents. The underlying domain model relies on a conceptual level (described by means of a domain ontology), which represents the domain knowledge, and a lexical level (based on WordNet), which represents the domain vocabulary. A stochastic model (the ME-2L-HMM2, which mixes - in a novel way - HMM and maximum entropy models) stores the mapping between such levels, taking into account the linguistic context of words. Not only does such a context contain the surrounding words; it also contains morphologic and syntactic information extracted using natural language processing tools. The stochastic model is then used, during the document indexing phase, to disambiguate word meanings. The semantic information retrieval engine we developed supports simple keyword-based queries, as well as natural language-based queries. The engine is also able to extend the domain knowledge, discovering new and relevant concepts to add to the domain model. The validation tests indicate that the system is able to disambiguate and extract concepts with good accuracy. A comparison between our prototype and a classic search engine shows that the proposed approach is effective in providing better accuracy.