A large amount of information on the Web is contained in regularly structured objects, which we call data records. Such data records are important because they often present the essential information of their host pages, e.g., lists of products or services. Mining these data records makes it possible to extract their information and provide value-added services. Existing automatic techniques are unsatisfactory because their accuracy is poor. In this paper, we propose a more effective technique for this task. The technique is based on two observations about data records on the Web and on a string matching algorithm. It can mine both contiguous and non-contiguous data records. Our experimental results show that the proposed technique substantially outperforms existing techniques.
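The string-matching idea can be illustrated with a small sketch. The following rests on several assumptions of our own; none of them is the paper's exact algorithm:

- Each child subtree of an HTML tag-tree node is represented as its preorder sequence of tag names.
- Two siblings are treated as similar records when their normalized edit distance is at most a threshold. The value 0.3 here is illustrative.
- Runs of two or more adjacent similar siblings are reported as candidate data-record regions.

The sketch compares only single adjacent children. It does not handle records spanning several sibling subtrees or non-contiguous records. The helper names `edit_distance`, `normalized_distance` and `find_record_regions` are hypothetical.

```python
from typing import List, Sequence, Tuple


def edit_distance(a: Sequence[str], b: Sequence[str]) -> int:
    """Levenshtein distance between two tag-name sequences (unit costs)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i] + [0] * len(b)
        for j, y in enumerate(b, 1):
            cur[j] = min(prev[j] + 1,             # delete x
                         cur[j - 1] + 1,          # insert y
                         prev[j - 1] + (x != y))  # match or substitute
        prev = cur
    return prev[-1]


def normalized_distance(a: Sequence[str], b: Sequence[str]) -> float:
    """Edit distance divided by the longer length; 0.0 for two empty sequences."""
    longest = max(len(a), len(b))
    return edit_distance(a, b) / longest if longest else 0.0


def find_record_regions(children: List[List[str]],
                        threshold: float = 0.3) -> List[Tuple[int, int]]:
    """Return (start, end) ranges, end exclusive, of >= 2 adjacent similar siblings.

    The threshold is an illustrative assumption, not a value taken from the paper.
    """
    regions: List[Tuple[int, int]] = []
    start = 0
    for i in range(1, len(children) + 1):
        similar = (i < len(children) and
                   normalized_distance(children[i - 1], children[i]) <= threshold)
        if not similar:
            if i - start >= 2:  # a run of at least two similar records
                regions.append((start, i))
            start = i
    return regions


# Hypothetical children of a <table> node, each flattened to a tag sequence.
children = [
    ["tr", "td", "img", "td", "a", "b"],       # product record 1
    ["tr", "td", "img", "td", "a", "b"],       # product record 2
    ["tr", "td", "img", "td", "a", "i", "b"],  # product record 3 (one extra tag)
    ["tr", "td", "form", "input", "input"],    # search box, not a record
]

print(find_record_regions(children))  # [(0, 3)]
```

Normalizing by the longer sequence keeps the threshold meaningful for records of different sizes. A single extra tag in a seven-tag record costs only 1/7, while structurally unrelated siblings score far above the threshold.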