Cluster ensembles --- a knowledge reuse framework for combining multiple partitions
The Journal of Machine Learning Research
TextTiling: segmenting text into multi-paragraph subtopic passages
Computational Linguistics
A densitometric approach to web page segmentation
Proceedings of the 17th ACM conference on Information and knowledge management
Proceedings of the 22nd ACM conference on Hypertext and hypermedia
Slicepedia: providing customized reuse of open-web resources for adaptive hypermedia
Proceedings of the 23rd ACM conference on Hypertext and social media
Content slicing addresses the need of adaptive systems to reuse open corpus material by converting it into re-composable information objects. However, this conversion depends heavily on the ability to fragment pages correctly into structurally sound atomic pieces. A recently suggested approach to fragmentation, which relies on a densitometric page representation, claims to achieve high accuracy and time performance. Although it has been well received within the research community, a full evaluation of this approach, identifying its strengths and weaknesses across a range of characteristics, has not yet been performed. This paper presents an independent evaluation of the approach with respect to granularity control, accuracy, time performance, content diversity and linguistic dependency. It also contributes modifications that address important weaknesses discovered during the analysis, in order to improve the suitability and impact of the original algorithm within the context of content slicing.