We develop an unsupervised learning framework that jointly extracts information and mines features from a set of Web pages across different sites. A key characteristic of our model is that it allows tight interaction between the information extraction and feature mining tasks. Decisions for both tasks are made coherently, which yields solutions that satisfy both tasks and eliminates potential conflicts between them. Our approach is based on an undirected graphical model that captures the interdependence among text fragments within the same Web page as well as among text fragments in different Web pages. Because Web pages from different sites are considered simultaneously, information from different sources can be leveraged effectively. We develop an approximate learning algorithm that performs inference over the graphical model to carry out the information extraction and feature mining tasks. We apply the framework to two applications: mining important product features from vendor sites, and mining hot item features from auction sites. Extensive experiments on real-world data demonstrate the effectiveness of our framework.
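The abstract does not specify which approximate inference procedure is used. The sketch below illustrates how evidence can flow between text fragments in an undirected graphical model. It assumes that each fragment is a node in a pairwise Markov random field with a small label set, and that a single symmetric "agreement" potential is shared by within-page and cross-site edges. It uses sum-product loopy belief propagation, a standard approximate inference method substituted here for illustration. It is not the authors' algorithm and covers inference only, not learning. The names `loopy_bp` and `potts`, the label meanings, and the fragment IDs are all hypothetical.

```python
from collections import defaultdict


def loopy_bp(unary, edges, pairwise, n_iters=50, tol=1e-6):
    """Sum-product loopy belief propagation on a pairwise MRF (illustrative sketch).

    unary:    dict node -> list of non-negative potentials, one per label
    edges:    list of undirected (i, j) edges (within-page or cross-site links)
    pairwise: symmetric function (label_a, label_b) -> non-negative potential
    Returns a dict node -> list of approximate marginal probabilities.
    """
    n_labels = len(next(iter(unary.values())))
    neighbors = defaultdict(list)
    for i, j in edges:
        neighbors[i].append(j)
        neighbors[j].append(i)

    # messages[(i, j)] is the message sent from node i to node j; start uniform.
    messages = {}
    for i, j in edges:
        messages[(i, j)] = [1.0 / n_labels] * n_labels
        messages[(j, i)] = [1.0 / n_labels] * n_labels

    for _ in range(n_iters):
        new_messages = {}
        max_delta = 0.0
        for (i, j) in messages:
            out = []
            for xj in range(n_labels):
                total = 0.0
                for xi in range(n_labels):
                    # Local evidence at i times compatibility with j's label...
                    prod = unary[i][xi] * pairwise(xi, xj)
                    # ...times incoming messages from every neighbour except j.
                    for k in neighbors[i]:
                        if k != j:
                            prod *= messages[(k, i)][xi]
                    total += prod
                out.append(total)
            z = sum(out)
            out = [v / z for v in out]  # normalise for numerical stability
            max_delta = max(max_delta, max(abs(a - b) for a, b in zip(out, messages[(i, j)])))
            new_messages[(i, j)] = out
        messages = new_messages  # synchronous ("flooding") schedule
        if max_delta < tol:
            break

    # Belief at each node: unary potential times all incoming messages, normalised.
    beliefs = {}
    for i in unary:
        b = list(unary[i])
        for k in neighbors[i]:
            for x in range(n_labels):
                b[x] *= messages[(k, i)][x]
        z = sum(b)
        beliefs[i] = [v / z for v in b]
    return beliefs


def potts(a, b, same=2.0, diff=1.0):
    """Hypothetical agreement potential: linked fragments prefer the same label."""
    return same if a == b else diff


# Hypothetical labels: index 0 = "product feature", index 1 = "other".
unary = {
    "siteA/frag1": [0.9, 0.1],  # strong local evidence for "product feature"
    "siteA/frag2": [0.5, 0.5],  # no local evidence
    "siteB/frag1": [0.5, 0.5],  # no local evidence, on a different site
}
edges = [
    ("siteA/frag1", "siteA/frag2"),  # within-page dependency
    ("siteA/frag1", "siteB/frag1"),  # cross-site dependency
]
beliefs = loopy_bp(unary, edges, potts)
```

This example graph is a tree, so belief propagation returns exact marginals here. The confident fragment on site A raises the "product feature" probability of both linked fragments from 0.5 to 1.9/3, or about 0.63, including the fragment on site B. Cross-page links generally create cycles, and on a graph with cycles the same procedure only approximates the marginals and may need damping to converge.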