What type of page is this? Genre as web descriptor

  • Authors: Mark A. Rosso
  • Affiliations: Meredith College, Raleigh, NC
  • Venue: Proceedings of the 5th ACM/IEEE-CS joint conference on Digital libraries
  • Year: 2005

Abstract

Many have suggested the use of genres to ameliorate the problem of web search, e.g., [1,3,4,5,6,7]. A central issue in the implementation of this idea is the choice of genres to be used as web page descriptors. Several studies have explored user terminology for and recognition of several types of digital documents, e.g., various types of office documents [8], personal homepages [2], and pages returned by user web searches [4,6]. This poster reports on a series of three user studies with the purpose of developing a genre "palette" for use in web retrieval. Pages viewed by participants in these studies were limited to the edu domain, as in [5].

In the first study, three participants (an information technology professional, an oncology social worker, and a computer science professor) were each given, in separate sessions, a stack of 102 web page printouts and asked to separate the pages into piles according to genre. They were also asked to name the genres by writing the names on sticky notes and placing them on the piles. After the piles were complete, participants were asked to provide a short, one- or two-sentence description of each genre, and then to describe the page characteristics that led them to place a page in that genre.

A list of 49 genre names and definitions was developed from the work of the three participants, keeping the terminology as similar as possible to the original while combining definitions that were nearly identical in wording. In a second user study, each of ten participants was given this list of genre name/definition pairs, the same stack of 102 printed web pages (arranged in a different random order for each participant), and a data collection form on which he/she recorded a genre for each web page. For each of the 102 web pages, the participant could either write the number of the genre/definition pair from the list that best described the page, or provide his/her own suggestion for a genre name and definition if none of those in the list seemed adequate. The participants were drawn from a convenience sample of approximately 10 college graduates of various occupations. Given that participants chose genres from a list of 48, many of which were extremely similar in nature, the resulting level of agreement (half or more of the participants agreeing on one genre for a given page in 60% of the instances) is quite acceptable. A set of five principles for creating a genre palette from individuals' sortings was developed. Based on those principles, the original list was trimmed down to 18 genres.

The third study was an online experiment in which 257 college faculty, students, and staff from two schools categorized a new set of 55 pages using the 18 genres. On average, over 70% agreed on the genre of each page. No study of this scale is known to report user recognition of web genres. This user validation is necessary to set upper bounds for machine categorization efforts. Also, because genre is usually considered to be "socially defined", genre studies using researcher-defined a priori categories (e.g., [5]) may not be able to show genres' usefulness for web search.

Interestingly, the genres in this palette, although developed independently, are similar to 7 of 8 Internet-wide genres based on user input reported in [7], and similar to 8 of 11 Internet-wide genres as reported in [3]. Based on these observations, one might infer that some substantial amount of genre knowledge exists among users, even from different cultures (in this case, the United States, Germany, and Sweden).
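
The two agreement figures quoted above (the share of pages on which half or more of the participants chose the same genre, and the average per-page agreement) can be illustrated with a minimal sketch. It is not the author's analysis code: the function names, the 0.5 threshold parameter, the page identifiers, and the genre labels in the toy data are all invented for illustration, and the abstract does not say exactly how the averages were computed.

```python
from collections import Counter


def modal_agreement(labels):
    """Share of raters who chose the most common genre for one page."""
    counts = Counter(labels)
    top_count = counts.most_common(1)[0][1]
    return top_count / len(labels)


def summarize(pages, threshold=0.5):
    """Summarize agreement over pages.

    pages: dict mapping a page id to the list of genre labels it received,
           one label per participant (hypothetical data layout).
    Returns (per-page agreement, share of pages at or above the threshold,
    mean per-page agreement).
    """
    per_page = {pid: modal_agreement(lbls) for pid, lbls in pages.items()}
    majority_share = sum(a >= threshold for a in per_page.values()) / len(per_page)
    mean_agreement = sum(per_page.values()) / len(per_page)
    return per_page, majority_share, mean_agreement


# Toy data: page ids and genre labels are invented, not taken from the study.
toy = {
    "page_a": ["homepage", "homepage", "homepage", "course info"],
    "page_b": ["article", "index", "homepage", "article"],
    "page_c": ["index", "article", "homepage", "course info"],
}

per_page, majority_share, mean_agreement = summarize(toy)
print(per_page)
print(majority_share, mean_agreement)
```

In this toy data, two of the three pages reach the "half or more" threshold, which corresponds to the kind of statistic reported for the second study, while the mean of the per-page values corresponds to the "on average" figure reported for the third.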