Fundamentals of speech recognition
Fundamentals of speech recognition
SIGMOD '98 Proceedings of the 1998 ACM SIGMOD international conference on Management of data
A hierarchical approach to wrapper induction
Proceedings of the third annual conference on Autonomous Agents
Record-boundary discovery in Web documents
SIGMOD '99 Proceedings of the 1999 ACM SIGMOD international conference on Management of data
Relational learning of pattern-match rules for information extraction
AAAI '99/IAAI '99 Proceedings of the sixteenth national conference on Artificial intelligence and the eleventh Innovative applications of artificial intelligence conference innovative applications of artificial intelligence
Automatic segmentation of text into structured records
SIGMOD '01 Proceedings of the 2001 ACM SIGMOD international conference on Management of data
Real-world Data is Dirty: Data Cleansing and The Merge/Purge Problem
Data Mining and Knowledge Discovery
Maximum Entropy Markov Models for Information Extraction and Segmentation
ICML '00 Proceedings of the Seventeenth International Conference on Machine Learning
Information Extraction: Techniques and Challenges
SCIE '97 International Summer School on Information Extraction: A Multidisciplinary Approach to an Emerging Information Technology
RoadRunner: Towards Automatic Data Extraction from Large Web Sites
Proceedings of the 27th International Conference on Very Large Data Bases
Information Extraction with HMM Structures Learned by Stochastic Optimization
Proceedings of the Seventeenth National Conference on Artificial Intelligence and Twelfth Conference on Innovative Applications of Artificial Intelligence
Robust and efficient fuzzy match for online data cleaning
Proceedings of the 2003 ACM SIGMOD international conference on Management of data
Named Entity recognition without gazetteers
EACL '99 Proceedings of the ninth conference on European chapter of the Association for Computational Linguistics
An empirical study of smoothing techniques for language modeling
ACL '96 Proceedings of the 34th annual meeting on Association for Computational Linguistics
Proceedings of the tenth ACM SIGKDD international conference on Knowledge discovery and data mining
ICML '04 Proceedings of the twenty-first international conference on Machine learning
Probabilistic text structuring: experiments with sentence ordering
ACL '03 Proceedings of the 41st Annual Meeting on Association for Computational Linguistics - Volume 1
EMNLP '02 Proceedings of the ACL-02 conference on Empirical methods in natural language processing - Volume 10
Efficiently linking text documents with relevant structured information
VLDB '06 Proceedings of the 32nd international conference on Very large data bases
Creating probabilistic databases from information extraction models
VLDB '06 Proceedings of the 32nd international conference on Very large data bases
Duplicate Record Detection: A Survey
IEEE Transactions on Knowledge and Data Engineering
Reference metadata extraction using a hierarchical knowledge representation framework
Decision Support Systems
Example-driven design of efficient record matching queries
VLDB '07 Proceedings of the 33rd international conference on Very large data bases
Context-aware wrapping: synchronized data extraction
VLDB '07 Proceedings of the 33rd international conference on Very large data bases
A genetic algorithm for segmentation and information retrieval of SEC regulatory filings
dg.o '08 Proceedings of the 2008 international conference on Digital government research
Helping satisfy multiple objectives during a service desk conversation
Proceedings of the 2008 ACM SIGMOD international conference on Management of data
Boosting text segmentation via progressive classification
Knowledge and Information Systems
Data & Knowledge Engineering
Foundations and Trends in Databases
Address standardization with latent semantic association
Proceedings of the 15th ACM SIGKDD international conference on Knowledge discovery and data mining
Constraint-based entity matching
AAAI'05 Proceedings of the 20th national conference on Artificial intelligence - Volume 2
Creating relational data from unstructured and ungrammatical data sources
Journal of Artificial Intelligence Research
Semantic annotation of unstructured and ungrammatical text
IJCAI'05 Proceedings of the 19th international joint conference on Artificial intelligence
Answering table augmentation queries from unstructured lists on the web
Proceedings of the VLDB Endowment
Generalized expectation criteria for bootstrapping extractors using record-text alignment
EMNLP '09 Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing: Volume 1 - Volume 1
An incremental clustering scheme for data de-duplication
Data Mining and Knowledge Discovery
Unsupervised query segmentation using click data: preliminary results
Proceedings of the 19th international conference on World wide web
ONDUX: on-demand unsupervised learning for information extraction
Proceedings of the 2010 ACM SIGMOD International Conference on Management of data
Unsupervised strategies for information extraction by text segmentation
Proceedings of the Fourth SIGMOD PhD Workshop on Innovative Database Research
Minimally-supervised extraction of entities from text advertisements
HLT '10 Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics
Learning 5000 relational extractors
ACL '10 Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics
Exploiting content redundancy for web information extraction
Proceedings of the VLDB Endowment
Collective extraction from heterogeneous web lists
Proceedings of the fourth ACM international conference on Web search and data mining
Joint unsupervised structure discovery and information extraction
Proceedings of the 2011 ACM SIGMOD International Conference on Management of data
Web information extraction using markov logic networks
Proceedings of the 17th ACM SIGKDD international conference on Knowledge discovery and data mining
Semi-supervised multi-task learning of structured prediction models for web information extraction
Proceedings of the 20th ACM international conference on Information and knowledge management
Exploring the corporate ecosystem with a semi-supervised entity graph
Proceedings of the 20th ACM international conference on Information and knowledge management
Extracting and summarizing hot item features across different auction web sites
PAKDD'06 Proceedings of the 10th Pacific-Asia conference on Advances in Knowledge Discovery and Data Mining
Self-supervised learning approach for extracting citation information on the web
APWeb'12 Proceedings of the 14th Asia-Pacific international conference on Web Technologies and Applications
P-top-k queries in a probabilistic framework from information extraction models
Computers & Mathematics with Applications
Learning to predict from textual data
Journal of Artificial Intelligence Research
ADC '13 Proceedings of the Twenty-Fourth Australasian Database Conference - Volume 137
Entity extraction, linking, classification, and tagging for social media: a wikipedia-based approach
Proceedings of the VLDB Endowment
Hi-index | 0.00 |
Automatically segmenting unstructured text strings into structured records is necessary for importing the information contained in legacy sources and text collections into a data warehouse for subsequent querying, analysis, mining and integration. In this paper, we mine tables present in data warehouses and relational databases to develop an automatic segmentation system. Thus, we overcome limitations of existing supervised text segmentation approaches, which require comprehensive manually labeled training data. Our segmentation system is robust, accurate, and efficient, and requires no additional manual effort. Thorough evaluation on real datasets demonstrates the robustness and accuracy of our system, with segmentation accuracy exceeding state of the art supervised approaches.