Traditional methods for extracting unstructured data from expert pages require a great deal of human intervention. To address this problem, this paper proposes a method that automatically detects the data template from the similarities and differences between HTML tags and strings, and uses lattice theory to locate the data grid region that stores the unstructured expert data, from which the data is then extracted. First, with the help of a classifier for Chinese expert entity homepages, an expert web crawler acquires a large number of expert pages. Second, the expert pages are divided into two types, list type and document type, and the unstructured data is extracted from each type separately. Finally, extraction experiments are conducted on the different types of web pages using an improved version of the open-source RoadRunner code. Experimental results show that, in the unsupervised setting, the method extracts unstructured web data from Chinese expert pages effectively.
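The idea of detecting a template from the similarities and differences between HTML tags and strings can be illustrated with a minimal Python sketch. It is not the paper's method and not RoadRunner's actual matching algorithm, and it omits the lattice-based location of the data grid region entirely. It only assumes that two pages generated from the same template share their tags and label strings, so tokens that align are treated as template and tokens that differ are treated as data fields. The names `tokenize`, `infer_template`, the `#DATA` placeholder, and both sample pages are hypothetical, and the alignment uses the standard-library `difflib.SequenceMatcher` as a stand-in.

```python
from difflib import SequenceMatcher
from html.parser import HTMLParser


class _Tokenizer(HTMLParser):
    """Flatten an HTML page into a sequence of tag tokens and string tokens."""

    def __init__(self):
        super().__init__()
        self.tokens = []

    def handle_starttag(self, tag, attrs):
        # Attributes are ignored in this sketch; only the tag name is kept.
        self.tokens.append(("tag", f"<{tag}>"))

    def handle_endtag(self, tag):
        self.tokens.append(("tag", f"</{tag}>"))

    def handle_data(self, data):
        text = data.strip()
        if text:  # skip whitespace-only text nodes
            self.tokens.append(("str", text))


def tokenize(html):
    """Return the page as a list of (kind, value) tokens."""
    parser = _Tokenizer()
    parser.feed(html)
    parser.close()
    return parser.tokens


def infer_template(page_a, page_b):
    """Align two token sequences; shared runs form the template,
    differing runs become data slots (marked '#DATA')."""
    a, b = tokenize(page_a), tokenize(page_b)
    matcher = SequenceMatcher(a=a, b=b, autojunk=False)
    template, fields_a, fields_b = [], [], []
    for op, i1, i2, j1, j2 in matcher.get_opcodes():
        if op == "equal":
            template.extend(value for _, value in a[i1:i2])
        else:
            template.append("#DATA")
            fields_a.append([v for kind, v in a[i1:i2] if kind == "str"])
            fields_b.append([v for kind, v in b[j1:j2] if kind == "str"])
    return template, fields_a, fields_b


# Two hypothetical expert pages that share one layout.
PAGE_A = ("<html><body><h1>Expert</h1><p>Name:</p><b>Li Wei</b>"
          "<p>Field:</p><b>Data mining</b></body></html>")
PAGE_B = ("<html><body><h1>Expert</h1><p>Name:</p><b>Zhang Min</b>"
          "<p>Field:</p><b>Networking</b></body></html>")

template, fields_a, fields_b = infer_template(PAGE_A, PAGE_B)
print(template)
print(fields_a, fields_b)
```

Pairwise alignment of this kind only works when the pages really come from one template, which is presumably one reason the paper separates list-type and document-type pages before extraction.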