READFAST: high-relevance search-engine for big text

  • Authors: Michael Gubanov; Anna Pyayt

  • Affiliations: Massachusetts Institute of Technology, Cambridge, MA, USA; University of South Florida, Tampa, FL, USA

  • Venue: Proceedings of the 22nd ACM International Conference on Information & Knowledge Management
  • Year: 2013

Abstract

The relevance of search results is a key factor for any search engine. To return and rank the Web pages most relevant to a query, contemporary search engines use complex ranking functions that depend on hundreds of features: the presence or absence of the query keywords on the page, their proximity and frequencies, and HTML markup, to name just a few. Additional features might include fonts, tags, hyperlinks, metadata, and parts of the Web-page description. The search engine uses all of this information to rank the HTML Web pages returned to the user, but none of it is available in free text, which has no HTML markup, tags, hyperlinks, or other metadata, only implicit natural-language structure. Here we demonstrate one of the first Big text search engines that leverages the hidden structure of natural-language sentences to process user queries and return more relevant search results than standard keyword search. It provides a structured index, extracted from the text using Natural Language Processing (NLP), that can be used to browse and query free text.
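
The abstract does not say how the structured index is built, so the following is only a toy sketch of the general idea, not the READFAST implementation. It assumes a hypothetical two-document corpus and uses a single hard-coded "subject treats object" pattern as a stand-in for a real NLP parser. All names (`DOCS`, `PATTERN`, `keyword_search`, `build_structured_index`, `structured_search`) are illustrative assumptions. The sketch contrasts a plain keyword match with a lookup in an index of sentence-level triples.

```python
import re
from collections import defaultdict

# Hypothetical toy corpus; not taken from the paper.
DOCS = {
    "d1": "Aspirin treats headache. Ibuprofen is sold widely.",
    "d2": "Headache is common. Aspirin is cheap. Rest treats fatigue.",
}

# Assumption: one fixed "<subject> treats <object>" pattern stands in
# for a real NLP parser that would recover sentence structure.
PATTERN = re.compile(r"^(\w+) (treats) (\w+)$")


def sentences(text):
    """Split on periods; a crude stand-in for a sentence segmenter."""
    return [s.strip().lower() for s in text.split(".") if s.strip()]


def keyword_search(docs, terms):
    """Return ids of documents containing every term anywhere in the text."""
    hits = []
    for doc_id, text in docs.items():
        words = set(re.findall(r"\w+", text.lower()))
        if all(t in words for t in terms):
            hits.append(doc_id)
    return sorted(hits)


def build_structured_index(docs):
    """Map (subject, relation, object) triples to the documents stating them."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for sent in sentences(text):
            m = PATTERN.match(sent)
            if m:
                index[m.groups()].add(doc_id)
    return index


def structured_search(index, subject=None, relation=None, obj=None):
    """Return ids of documents with a triple matching all given fields."""
    hits = set()
    for (s, r, o), doc_ids in index.items():
        if subject in (None, s) and relation in (None, r) and obj in (None, o):
            hits |= doc_ids
    return sorted(hits)


index = build_structured_index(DOCS)
print(keyword_search(DOCS, ["aspirin", "headache"]))
print(structured_search(index, subject="aspirin", obj="headache"))
```

In this toy corpus the keyword query matches both documents, because both mention the two words somewhere, while the structured query returns only the document in which a single sentence actually relates them. A real system would replace the fixed pattern with a parser and would need ranking on top of matching.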