Generation of Specifications Forms through Statistical Learning for a Universal Services Marketplace
WISE '09 Proceedings of the 10th International Conference on Web Information Systems Engineering
A growing number of service providers let consumers interact with them over the web to learn their service terms. The service terms, such as price and time to completion, depend on the consumer's particular specifications. For instance, a printing services provider needs specifications from its customers such as paper size, type of ink, proofing, and perforation. In a few sectors, marketplace sites provide consumers with specifications forms, which the consumer can fill out to learn the service terms of multiple service providers. Unfortunately, there are only a few such marketplace sites, and they cover only a few sectors. At HP Labs, we are working towards building a universal marketplace site, i.e., a marketplace site that covers thousands of sectors and hundreds of providers per sector. One issue in this domain is the automated discovery and retrieval of the specifications for each sector. We address it by extracting and analyzing content from the websites of the service providers listed in business directories. The challenge is that each service provider is often listed under multiple service categories in a business directory, making it infeasible to apply standard supervised learning techniques. We address this challenge by employing a multilabel statistical clustering approach within an expectation-maximization framework. We implement our solution to retrieve specifications for 3000 sectors, representing more than 300,000 service providers. We discuss our results in the context of the services needed to design a marketing campaign for a small business.
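The abstract names a multilabel clustering approach within an expectation-maximization framework but gives no algorithmic detail. The following is only a minimal sketch of one way such a scheme could look, not the authors' method. It assumes a mixture of smoothed multinomial word distributions, one per directory category, with a uniform prior over categories, and it restricts each provider's responsibilities to the categories under which it is listed. The function name `constrained_em`, the smoothing parameter `alpha`, and the toy data are all hypothetical.

```python
import math
from collections import defaultdict


def constrained_em(docs, n_iter=20, alpha=0.1):
    """Illustrative label-constrained EM (an assumption, not the paper's algorithm).

    docs: list of (tokens, listed_categories) pairs, one per provider.
    Returns (word_probs, resp): per-category word distributions and
    per-provider responsibilities over that provider's listed categories.
    """
    vocab = sorted({w for tokens, _ in docs for w in tokens})
    labels = sorted({c for _, cands in docs for c in cands})
    # Start with responsibilities spread evenly over each provider's listed categories.
    resp = [{c: 1.0 / len(cands) for c in cands} for _, cands in docs]
    word_probs = {}
    for _ in range(n_iter):
        # M-step: re-estimate each category's word distribution from weighted counts.
        counts = {c: defaultdict(float) for c in labels}
        for (tokens, _), r in zip(docs, resp):
            for c, weight in r.items():
                for w in tokens:
                    counts[c][w] += weight
        word_probs = {}
        for c in labels:
            total = sum(counts[c].values()) + alpha * len(vocab)
            word_probs[c] = {w: (counts[c][w] + alpha) / total for w in vocab}
        # E-step: posterior over listed categories only (uniform category prior assumed).
        new_resp = []
        for tokens, cands in docs:
            log_scores = {
                c: sum(math.log(word_probs[c][w]) for w in tokens) for c in cands
            }
            top = max(log_scores.values())
            scores = {c: math.exp(s - top) for c, s in log_scores.items()}
            norm = sum(scores.values())
            new_resp.append({c: v / norm for c, v in scores.items()})
        resp = new_resp
    return word_probs, resp


# Hypothetical toy data: two providers with one listing each, two listed under both.
toy_docs = [
    (["paper", "ink", "perforation", "paper"], ["printing"]),
    (["logo", "branding", "color"], ["design"]),
    (["paper", "ink", "proofing"], ["printing", "design"]),
    (["logo", "color", "branding", "layout"], ["printing", "design"]),
]
toy_word_probs, toy_resp = constrained_em(toy_docs)
```

In this sketch, the highest-probability words of a category's distribution would play the role of candidate specification terms for that sector, and the responsibilities indicate which of a provider's listed categories its website content most resembles.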