Automatic Extraction of Useful Facet Hierarchies from Text Databases

Authors:
Wisam Dakka;Panagiotis G. Ipeirotis
Affiliations:
Computer Science Department, Columbia University, 1214 Amsterdam Avenue, New York, NY 10027, USA. wisam@cs.columbia.edu;Department of Information, Operations, and Management Sciences, New York University, 44 West 4th Street, New York, NY 10012, USA. panos@nyu.edu
Venue:
ICDE '08 Proceedings of the 2008 IEEE 24th International Conference on Data Engineering
Year:
2008

Citing 0
Cited 25

Semantically driven snippet selection for supporting focused web searches

Data & Knowledge Engineering
Cheap and fast---but is it good?: evaluating non-expert annotations for natural language tasks

EMNLP '08 Proceedings of the Conference on Empirical Methods in Natural Language Processing
Reinventing the Web Browser for the Semantic Web

WI-IAT '09 Proceedings of the 2009 IEEE/WIC/ACM International Joint Conference on Web Intelligence and Intelligent Agent Technology - Volume 03
Beyond hyperlinks: organizing information footprints in search logs to support effective browsing

Proceedings of the 18th ACM conference on Information and knowledge management
Probabilistic models of ranking novel documents for faceted topic retrieval

Proceedings of the 18th ACM conference on Information and knowledge management
STC+ and NM-STC: Two Novel Online Results Clustering Methods for Web Searching

WISE '09 Proceedings of the 10th International Conference on Web Information Systems Engineering
NLP support for faceted navigation in scholarly collections

NLPIR4DL '09 Proceedings of the 2009 Workshop on Text and Citation Analysis for Scholarly Digital Libraries
Facetedpedia: dynamic generation of query-dependent faceted interfaces for wikipedia

Proceedings of the 19th international conference on World wide web
Exploratory web searching with dynamic taxonomies and results clustering

ECDL'09 Proceedings of the 13th European conference on Research and advanced technology for digital libraries
Semantic annotation based exploratory search for information analysts

Information Processing and Management: an International Journal
Exploring repositories of scientific workflows

Proceedings of the 1st International Workshop on Workflow Approaches to New Data-centric Science
WikiAnalytics: disambiguation of keyword search results on highly heterogeneous structured data

Proceedings of the 13th International Workshop on the Web and Databases
Eddi: interactive topic-based browsing of social status streams

UIST '10 Proceedings of the 23rd annual ACM symposium on User interface software and technology
ImageSieve: exploratory search of museum archives with named entity-based faceted browsing

Proceedings of the 73rd ASIS&T Annual Meeting on Navigating Streams in an Information Ecosystem - Volume 47
Facet discovery for structured web search: a query-log mining approach

Proceedings of the 2011 ACM SIGMOD International Conference on Management of data
Finding dimensions for queries

Proceedings of the 20th ACM international conference on Information and knowledge management
Identifying content for planned events across social media sites

Proceedings of the fifth ACM international conference on Web search and data mining
Evaluation and user preference study on spatial diversity

ECIR'2010 Proceedings of the 32nd European conference on Advances in Information Retrieval
Multi-select faceted navigation based on minimum description length principle

IJCAI'11 Proceedings of the Twenty-Second international joint conference on Artificial Intelligence - Volume Two
A distributed recommender system architecture

International Journal of Web Engineering and Technology
A survey of faceted search

Journal of Web Engineering
DFT-extractor: a system to extract domain-specific faceted taxonomies from wikipedia

Proceedings of the 22nd international conference on World Wide Web companion
Automated faceted reporting for web analytics

Proceedings of the 4th international workshop on Web-scale knowledge representation retrieval and reasoning
A framework for automated construction of resource space based on background knowledge

Future Generation Computer Systems
NLP-based faceted search: Experience in the development of a science and technology search engine

Expert Systems with Applications: An International Journal

Abstract

Databases of text and text-annotated data constitute a significant fraction of the information available in electronic form. Searching and browsing are the typical ways that users locate items of interest in such databases. Faceted interfaces represent a powerful new paradigm that has proved to be a successful complement to searching. Thus far, the identification of the facets was either a manual procedure or relied on a priori knowledge of the facets that can potentially appear in the underlying collection. In this paper, we present an unsupervised technique for the automatic extraction of facets useful for browsing text databases. In particular, we observe through a pilot study that facet terms rarely appear in text documents, which shows that we need external resources to identify useful facet terms. To this end, we first identify important phrases in each document. Then, we expand each phrase with "context" phrases using external resources such as WordNet and Wikipedia, which causes facet terms to appear in the expanded database. Finally, we compare the term distributions in the original database and the expanded database to identify the terms that can be used to construct browsing facets. Our extensive user studies, conducted with the Amazon Mechanical Turk service, show that our techniques produce facets with high precision and recall, that these facets are superior to those of existing approaches, and that they help users locate interesting items faster.
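
The three steps described in the abstract (identify important phrases, expand them with context terms, compare term distributions) can be illustrated with a minimal sketch. This is not the authors' implementation: the `CONTEXT` dictionary is a hypothetical stand-in for an external resource such as WordNet or Wikipedia, the phrase extractor is a plain substring match, and the score is a simple document-frequency difference chosen for illustration, since the abstract does not name the statistic used.

```python
from collections import Counter

# Hypothetical stand-in for an external resource (e.g. WordNet or Wikipedia):
# maps an important phrase to broader "context" terms. Illustrative data only.
CONTEXT = {
    "jacques chirac": ["political leaders", "france"],
    "angela merkel": ["political leaders", "germany"],
    "paris": ["france", "cities"],
}

def important_phrases(doc, context):
    """Toy extractor: return the known phrases that occur in the document."""
    text = doc.lower()
    return [phrase for phrase in context if phrase in text]

def expand(doc, context):
    """Return the term set of the expanded document: phrases plus context terms."""
    terms = set()
    for phrase in important_phrases(doc, context):
        terms.add(phrase)
        terms.update(context[phrase])
    return terms

def candidate_facet_terms(docs, context, min_shift=2):
    """Rank terms by how much more often they occur after expansion."""
    expanded = [expand(doc, context) for doc in docs]
    # Document frequency in the expanded database.
    df_expanded = Counter(term for terms in expanded for term in terms)
    # Document frequency in the original text (substring match, an assumption).
    df_original = {
        term: sum(term in doc.lower() for doc in docs) for term in df_expanded
    }
    shifts = {term: df_expanded[term] - df_original[term] for term in df_expanded}
    ranked = sorted(shifts.items(), key=lambda item: (-item[1], item[0]))
    return [(term, shift) for term, shift in ranked if shift >= min_shift]

docs = [
    "Jacques Chirac spoke in Paris on Monday.",
    "Angela Merkel met Jacques Chirac.",
    "Paris hosted the summit.",
]
facets = candidate_facet_terms(docs, CONTEXT)
print(facets)
```

In this toy collection, terms such as "france" never occur in the original text but become frequent after expansion, which is the effect the abstract relies on to surface facet terms. The `min_shift` threshold is an arbitrary illustrative cutoff.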