The Infocious web search engine: improving web searching through linguistic analysis

Authors:
Alexandros Ntoulas;Gerald Chao;Junghoo Cho
Affiliations:
Infocious Inc.;Infocious Inc.;University of California at Los Angeles
Venue:
WWW '05 Special interest tracks and posters of the 14th international conference on World Wide Web
Year:
2005

References (18):
Some advances in transformation-based part of speech tagging. AAAI '94 Proceedings of the twelfth national conference on Artificial intelligence (vol. 1)
Foundations of statistical natural language processing
Managing gigabytes (2nd ed.): compressing and indexing documents and images
Synchronizing a database to improve freshness. SIGMOD '00 Proceedings of the 2000 ACM SIGMOD international conference on Management of data
An introduction to Support Vector Machines: and other kernel-based learning methods
Text Classification from Labeled and Unlabeled Documents using EM. Machine Learning - Special issue on information retrieval
Modern Information Retrieval
Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition
Introduction to Modern Information Retrieval
The Evolution of the Web and Implications for an Incremental Crawler. VLDB '00 Proceedings of the 26th International Conference on Very Large Data Bases
Crawling the web: discovery and maintenance of large-scale web data
A probabilistic, integrative approach for improved natural language disambiguation
Liveclassifier: creating hierarchical text classifiers through web corpora. Proceedings of the 13th international conference on World Wide Web
Learning to find answers to questions on the Web. ACM Transactions on Internet Technology (TOIT)
A second-order Hidden Markov Model for part-of-speech tagging. ACL '99 Proceedings of the 37th annual meeting of the Association for Computational Linguistics on Computational Linguistics
Maximum entropy models for word sense disambiguation. COLING '02 Proceedings of the 19th international conference on Computational linguistics - Volume 1
Introduction to the CoNLL-2000 shared task: chunking. CoNLL '00 Proceedings of the 2nd workshop on Learning language in logic and the 4th conference on Computational natural language learning - Volume 7
Effective change detection using sampling. VLDB '02 Proceedings of the 28th international conference on Very Large Data Bases

Cited by (4):
A new algorithm for clustering search results. Data & Knowledge Engineering
Improve web search using image snippets. AAAI'06 Proceedings of the 21st national conference on Artificial intelligence - Volume 2
Exploiting image contents in web search. IJCAI'07 Proceedings of the 20th international joint conference on Artificial intelligence
Web page classification on child suitability. CIKM '10 Proceedings of the 19th ACM international conference on Information and knowledge management

Abstract

In this paper we present the Infocious Web search engine [23]. Our goal in creating Infocious is to improve the way people find information on the Web by resolving ambiguities present in natural language text. We achieve this by performing linguistic analysis on the content of the Web pages we index, a departure from existing Web search engines, which return results mainly on the basis of keyword matching. This additional linguistic processing gives Infocious two main advantages. First, Infocious gains a deeper understanding of the content of Web pages, so it can better match users' queries with indexed documents and thereby improve the relevance of the returned results. Second, based on its linguistic processing, Infocious can organize and present the results to the user in more intuitive ways. We describe the linguistic processing technologies incorporated in Infocious and how they help users find information on the Web more efficiently. We discuss the components of the Infocious architecture and how each of them benefits from the added linguistic processing. Finally, we experimentally evaluate the performance of a component that leverages linguistic information to categorize Web pages.