OCELOT: a system for summarizing Web pages

Authors:
Adam L. Berger; Vibhu O. Mittal
Affiliations:
School of Computer Science, Carnegie Mellon University, Pittsburgh, PA; Just Research, 4616 Henry Street, Pittsburgh, PA
Venue:
SIGIR '00 Proceedings of the 23rd annual international ACM SIGIR conference on Research and development in information retrieval
Year:
2000

Abstract

We introduce OCELOT, a prototype system for automatically generating the “gist” of a web page by summarizing it. Although most text summarization research to date has focused on news articles, web pages are quite different in both structure and content. Instead of coherent text with a well-defined discourse structure, they are more likely to be a chaotic jumble of phrases, links, graphics and formatting commands. Such text provides little foothold for extractive summarization techniques, which attempt to generate a summary of a document by excerpting a contiguous, coherent span of text from it. This paper builds upon recent work in non-extractive summarization, producing the gist of a web page by “translating” it into a more concise representation rather than attempting to extract a text span verbatim. OCELOT uses probabilistic models to guide it in selecting and ordering words into a gist. This paper describes a technique for learning these models automatically from a collection of human-summarized web pages.
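
The idea of selecting and then ordering gist words with models learned from human-summarized pages can be illustrated with a toy sketch. This is not OCELOT's actual model: the co-occurrence counts, the greedy bigram ordering, the function names (`train`, `select_words`, `order_words`, `gist`) and the tiny training set are all assumptions made up for illustration.

```python
from collections import Counter, defaultdict

def train(pairs):
    """Learn two toy models from (page_tokens, gist_tokens) pairs.

    cooc[d][g]   : how often gist word g appears alongside page word d
    bigram[a][b] : how often gist word b follows gist word a
    """
    cooc = defaultdict(Counter)
    bigram = defaultdict(Counter)
    for page, gist_tokens in pairs:
        for d in set(page):
            for g in set(gist_tokens):
                cooc[d][g] += 1
        prev = "<s>"
        for g in gist_tokens + ["</s>"]:
            bigram[prev][g] += 1
            prev = g
    return cooc, bigram

def select_words(page, cooc, k):
    """Score candidate gist words by summed relative co-occurrence."""
    scores = Counter()
    for d in page:
        counts = cooc.get(d, {})
        total = sum(counts.values())
        for g, c in counts.items():
            scores[g] += c / total
    # Deterministic tie-break: higher score first, then alphabetical.
    ranked = sorted(scores, key=lambda w: (-scores[w], w))
    return ranked[:k]

def order_words(words, bigram):
    """Greedily order words using bigram counts from training gists."""
    remaining = list(words)
    prev, out = "<s>", []
    while remaining:
        # sorted() makes ties resolve alphabetically.
        nxt = max(sorted(remaining),
                  key=lambda w: bigram.get(prev, {}).get(w, 0))
        out.append(nxt)
        remaining.remove(nxt)
        prev = nxt
    return out

def gist(page, cooc, bigram, k=2):
    return order_words(select_words(page, cooc, k), bigram)

# Hypothetical training data: (web page tokens, human-written gist tokens).
pairs = [
    ("cheap flights book hotel deals flights".split(), "travel deals".split()),
    ("hotel deals cheap rooms".split(), "travel deals".split()),
    ("python tutorial code examples".split(), "programming tutorial".split()),
]
cooc, bigram = train(pairs)
print(" ".join(gist("cheap hotel rooms".split(), cooc, bigram)))
```

Note that the selected words need not occur in the page itself, which is the sense in which such a gist is non-extractive. A real system would replace the raw counts with properly estimated probabilistic models.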