Search engine indexing storage optimisation using Hamming distance

Authors:
Anirban Kundu;Siddhartha Sett;Subhajit Kumar;Shruti Sengupta;Srayan Chaudhury
Affiliations:
Netaji Subhash Engineering College, West Bengal University of Technology, Calcutta 700152, India / Innovation Research Lab (IRL), Capex Technologies, West Bengal 711103, India (all five authors)
Venue:
International Journal of Intelligent Information and Database Systems
Year:
2012

Abstract

We propose an indexing algorithm for search engines that aims to reduce both time and space complexity. Existing indexing algorithms have higher space requirements because they store every word of a web page except the stop words. In this paper, we present a theory of the indexing mechanism of a search engine. Here, time complexity is the time the search engine takes to retrieve information, and space complexity is the space required to store the indices on the hard disk. Reducing time complexity leads to faster retrieval of information, and reducing space complexity leads to more efficient utilisation of space. We deal only with the textual part of web pages. The concept of Hamming distance forms the basis of our approach to achieving better space complexity.
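
The abstract does not describe how Hamming distance is applied to the index, so the following is only a minimal sketch of the two ideas it names: the standard Hamming distance between equal-length strings, and a baseline index that stores every word except the stop words. The stop-word list, the sample pages and the function names are assumptions made for illustration and are not taken from the paper.

```python
from collections import defaultdict
from typing import Dict, Set

# Hypothetical stop-word list, for illustration only.
STOP_WORDS = {"a", "an", "the", "of", "and", "is", "to", "in"}


def hamming_distance(a: str, b: str) -> int:
    """Count the positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("Hamming distance requires equal-length inputs")
    return sum(x != y for x, y in zip(a, b))


def build_baseline_index(pages: Dict[str, str]) -> Dict[str, Set[str]]:
    """Map every non-stop word to the set of page IDs that contain it.

    This is the space-hungry baseline the abstract refers to,
    not the indexing scheme proposed in the paper.
    """
    index = defaultdict(set)
    for page_id, text in pages.items():
        for word in text.lower().split():
            if word not in STOP_WORDS:
                index[word].add(page_id)
    return dict(index)


# Hypothetical sample pages (textual content only).
pages = {
    "p1": "the index of a search engine",
    "p2": "search engine indexing",
}
index = build_baseline_index(pages)

print(sorted(index))                       # ['engine', 'index', 'indexing', 'search']
print(hamming_distance("10110", "10011"))  # 2
```

In the baseline, the index grows by one entry for every distinct non-stop word, which is the space cost the abstract aims to reduce.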