Some advances in transformation-based part of speech tagging. AAAI '94: Proceedings of the twelfth national conference on Artificial intelligence (vol. 1)
Foundations of statistical natural language processing
Managing gigabytes (2nd ed.): compressing and indexing documents and images
Synchronizing a database to improve freshness. SIGMOD '00: Proceedings of the 2000 ACM SIGMOD international conference on Management of data
An introduction to Support Vector Machines: and other kernel-based learning methods
Text Classification from Labeled and Unlabeled Documents using EM. Machine Learning, special issue on information retrieval
Modern Information Retrieval
Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition
Introduction to Modern Information Retrieval
The Evolution of the Web and Implications for an Incremental Crawler. VLDB '00: Proceedings of the 26th International Conference on Very Large Data Bases
Crawling the web: discovery and maintenance of large-scale web data
A probabilistic, integrative approach for improved natural language disambiguation
LiveClassifier: creating hierarchical text classifiers through web corpora. Proceedings of the 13th international conference on World Wide Web
Learning to find answers to questions on the Web. ACM Transactions on Internet Technology (TOIT)
A second-order Hidden Markov Model for part-of-speech tagging. ACL '99: Proceedings of the 37th annual meeting of the Association for Computational Linguistics on Computational Linguistics
Maximum entropy models for word sense disambiguation. COLING '02: Proceedings of the 19th international conference on Computational linguistics, Volume 1
Introduction to the CoNLL-2000 shared task: chunking. CoNLL '00: Proceedings of the 2nd workshop on Learning language in logic and the 4th conference on Computational natural language learning, Volume 7
Effective change detection using sampling. VLDB '02: Proceedings of the 28th international conference on Very Large Data Bases
A new algorithm for clustering search results. Data & Knowledge Engineering
Improve web search using image snippets. AAAI'06: Proceedings of the 21st national conference on Artificial intelligence, Volume 2
Exploiting image contents in web search. IJCAI'07: Proceedings of the 20th international joint conference on Artificial intelligence
Web page classification on child suitability. CIKM '10: Proceedings of the 19th ACM international conference on Information and knowledge management
In this paper we present the Infocious Web search engine [23]. Our goal in creating Infocious is to improve the way people find information on the Web by resolving ambiguities present in natural language text. Infocious achieves this by performing linguistic analysis on the content of the Web pages it indexes, a departure from existing Web search engines, which return results based mainly on keyword matching. This additional linguistic processing gives Infocious two main advantages. First, Infocious gains a deeper understanding of the content of Web pages, so it can better match users' queries with indexed documents and thereby improve the relevance of the returned results. Second, based on its linguistic processing, Infocious can organize and present results to the user in more intuitive ways. We describe the linguistic processing technologies incorporated in Infocious and how they are applied to help users find information on the Web more efficiently. We discuss the various components of the Infocious architecture and how each of them benefits from the added linguistic processing. Finally, we experimentally evaluate the performance of a component that uses linguistic information to categorize Web pages.
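To make the contrast between keyword matching and linguistically informed matching concrete, here is a minimal sketch of an inverted index whose postings are keyed on (word, part-of-speech) pairs. It is not the Infocious implementation. The documents, tag names (`NOUN`, `VERB`, `DET`, `ADJ`), and function names are hypothetical, and the tags are supplied by hand where a real system would run a part-of-speech tagger.

```python
from collections import defaultdict

# Hypothetical, hand-tagged documents: lists of (word, part-of-speech) pairs.
# A real system would obtain these tags from a part-of-speech tagger.
DOCS = {
    "d1": [("book", "VERB"), ("a", "DET"), ("flight", "NOUN")],
    "d2": [("a", "DET"), ("good", "ADJ"), ("book", "NOUN")],
    "d3": [("read", "VERB"), ("this", "DET"), ("book", "NOUN")],
}


def build_keyword_index(docs):
    """Map each surface word to the set of documents containing it."""
    index = defaultdict(set)
    for doc_id, tokens in docs.items():
        for word, _pos in tokens:
            index[word].add(doc_id)
    return index


def build_tagged_index(docs):
    """Map each (word, part-of-speech) pair to the documents containing it."""
    index = defaultdict(set)
    for doc_id, tokens in docs.items():
        for word, pos in tokens:
            index[(word, pos)].add(doc_id)
    return index


keyword_index = build_keyword_index(DOCS)
tagged_index = build_tagged_index(DOCS)

# Keyword matching cannot tell the noun "book" from the verb "book".
print(sorted(keyword_index["book"]))           # ['d1', 'd2', 'd3']
# A query interpreted as the noun "book" matches only noun occurrences.
print(sorted(tagged_index[("book", "NOUN")]))  # ['d2', 'd3']
```

Keying postings on the disambiguated form is only one possible way to use linguistic analysis in retrieval; it is shown here because it is the simplest illustration of how resolving an ambiguity changes which documents a query matches.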