In this paper, we propose a classification technique for Web pages based on the detection of structural similarities among semistructured documents, and we devise an architecture that exploits this technique for information extraction. Unlike standard methods based on graph-matching algorithms, our proposal represents the structure of a document as a time series in which each occurrence of a tag corresponds to an impulse. The degree of similarity between two documents is then assessed by analyzing the frequency components of the corresponding Fourier transforms. Experiments on real data show the effectiveness of the proposed technique.
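The abstract does not specify how tags are mapped to impulse values or how spectra are compared, so the following is only a minimal sketch of the general idea under our own assumptions, not the authors' method. It assumes that each distinct tag name receives an integer code from a vocabulary shared by the two documents being compared, that a start tag emits an impulse of +code and an end tag emits -code, that both signals are zero-padded to a common length so their DFT bins line up, and that dissimilarity is the Euclidean distance between unit-normalized DFT magnitude spectra. All function names are hypothetical. The DFT is computed naively for readability; a real implementation would use an FFT.

```python
import cmath
import math
from html.parser import HTMLParser


class _TagCollector(HTMLParser):
    """Records the sequence of start/end tags, ignoring text and attributes."""

    def __init__(self):
        super().__init__()
        self.events = []  # list of (kind, tag) with kind in {"start", "end"}

    def handle_starttag(self, tag, attrs):
        self.events.append(("start", tag))

    def handle_endtag(self, tag):
        self.events.append(("end", tag))


def tag_events(html):
    """Return the document's tag sequence, e.g. [("start", "p"), ("end", "p")]."""
    parser = _TagCollector()
    parser.feed(html)
    parser.close()
    return parser.events


def encode(events, codes):
    """Hypothetical encoding: one impulse per tag occurrence.

    A start tag emits +codes[tag]; an end tag emits -codes[tag].
    """
    return [codes[tag] if kind == "start" else -codes[tag] for kind, tag in events]


def magnitude_spectrum(signal, n):
    """|DFT| of `signal` zero-padded to length n, bins 0..n//2 (naive O(n^2)).

    Bins above n//2 are omitted: for a real signal they mirror the lower half.
    """
    padded = signal + [0] * (n - len(signal))
    spectrum = []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(padded))
        spectrum.append(abs(acc))
    return spectrum


def _unit(vec):
    """Scale a vector to unit Euclidean norm (all-zero vectors are returned as-is)."""
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm > 0 else vec


def structural_distance(html_a, html_b):
    """Euclidean distance between normalized magnitude spectra (0 = same spectrum)."""
    ev_a, ev_b = tag_events(html_a), tag_events(html_b)
    # Shared tag vocabulary so both documents use the same tag codes.
    vocab = sorted({tag for _, tag in ev_a + ev_b})
    codes = {tag: i + 1 for i, tag in enumerate(vocab)}
    sig_a, sig_b = encode(ev_a, codes), encode(ev_b, codes)
    # Zero-pad both signals to a common length so their DFT bins line up.
    n = max(len(sig_a), len(sig_b), 1)
    spec_a = _unit(magnitude_spectrum(sig_a, n))
    spec_b = _unit(magnitude_spectrum(sig_b, n))
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(spec_a, spec_b)))
```

Because only tags are encoded, two pages that differ solely in their text content, such as `"<ul><li>a</li></ul>"` and `"<ul><li>b</li></ul>"`, get a distance of `0.0`, while `"<p></p>"` and `"<div><p></p></div>"` get a positive distance. Comparing magnitude spectra rather than raw signals makes the measure less sensitive to where a repeated structure appears in the page, which is one reason a frequency-domain comparison can be cheaper and more forgiving than graph matching.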