Mining employment market via text block detection and adaptive cross-domain information extraction

Authors:
Tak-Lam Wong;Wai Lam;Bo Chen
Affiliations:
The Chinese University of Hong Kong, Hong Kong, Hong Kong;The Chinese University of Hong Kong, Hong Kong, Hong Kong;The Chinese University of Hong Kong, Hong Kong, Hong Kong
Venue:
Proceedings of the 32nd international ACM SIGIR conference on Research and development in information retrieval
Year:
2009

Citing 13
Cited 5

Information Extraction with HMM Structures Learned by Stochastic Optimization

Proceedings of the Seventeenth National Conference on Artificial Intelligence and Twelfth Conference on Innovative Applications of Artificial Intelligence
Table extraction using conditional random fields

Proceedings of the 26th annual international ACM SIGIR conference on Research and development in informaion retrieval
Bottom-up relational learning of pattern matching rules for information extraction

The Journal of Machine Learning Research
Mining data records in Web pages

Proceedings of the ninth ACM SIGKDD international conference on Knowledge discovery and data mining
Sources of Success for Boosted Wrapper Induction

The Journal of Machine Learning Research
Block-based web search

Proceedings of the 27th annual international ACM SIGIR conference on Research and development in information retrieval
Adaptive information extraction

ACM Computing Surveys (CSUR)
A Framework for Learning Predictive Structures from Multiple Tasks and Unlabeled Data

The Journal of Machine Learning Research
Self-taught learning: transfer learning from unlabeled data

Proceedings of the 24th international conference on Machine learning
Extracting discriminative concepts for domain adaptation in text mining

Proceedings of the 15th ACM SIGKDD international conference on Knowledge discovery and data mining
Domain adaptation with structural correspondence learning

EMNLP '06 Proceedings of the 2006 Conference on Empirical Methods in Natural Language Processing
Transfer learning via dimensionality reduction

AAAI'08 Proceedings of the 23rd national conference on Artificial intelligence - Volume 2
Learning with scope, with application to information extraction and classification

UAI'02 Proceedings of the Eighteenth conference on Uncertainty in artificial intelligence

From one tree to a forest: a unified solution for structured web data extraction

Proceedings of the 34th international ACM SIGIR conference on Research and development in Information Retrieval
News information extraction based on adaptive weighting using unsupervised Bayesian algorithm

WISM'11 Proceedings of the 2011 international conference on Web information systems and mining - Volume Part II
Domain driven data mining in human resource management: A review of current research

Expert Systems with Applications: An International Journal
Unsupervised approach to generate informative structured snippets for job search engines

Proceedings of the 22nd international conference on World Wide Web companion
Robust detection of semi-structured web records using a DOM structure-knowledge-driven model

ACM Transactions on the Web (TWEB)

Quantified Score

Hi-index	0.00

Visualization

Abstract

We have developed an approach for analyzing online job advertisements in different domains (industries) from different regions worldwide. Our approach is able to extract precise information from the text content supporting useful employment market analysis locally and globally. A major component in our approach is an information extraction framework which is composed of two challenging tasks. The first task is to detect unformatted text blocks automatically based on an unsupervised learning model. Identifying these useful text blocks through this learning model allows the generation of highly effective features for the next task which is text fragment extraction learning. The task of text fragment extraction learning is formulated as a domain adaptation model for text fragment classification. One advantage of our approach is that it can easily adapt to a large number of online job advertisements in different and new domains. Extensive experiments have been conducted to demonstrate the effectiveness and flexibility of our approach.