We introduce a new method to improve web site text content by identifying the most relevant free text in the web pages. To capture variations in page text, we collect pages over a period of time. Each page's text content is then transformed into a feature vector and used as input to a clustering algorithm, a self-organizing feature map (SOFM), which groups the vectors by common text content. In each cluster, the centroid and its neighboring vectors are extracted. Then, using a reverse clustering analysis, the pages represented by each vector are reviewed in order to find the similar pages. The proposed method was tested on a real web site, and the results indicate that the approach is effective.
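
The pipeline summarized in the abstract (page text to feature vectors, SOFM clustering, then a reverse lookup from cluster centroids back to pages) might be sketched roughly as follows. This is only an illustrative sketch, not the authors' implementation: the TF-IDF weighting, the one-dimensional map, the learning-rate and radius schedules, the function names, and the four sample pages are all assumptions made for the example.

```python
import math
import random
from collections import Counter

def tfidf_vectors(pages):
    """Map {url: text} to {url: TF-IDF vector} over a shared vocabulary (assumed weighting)."""
    tokens = {url: text.lower().split() for url, text in pages.items()}
    vocab = sorted({w for words in tokens.values() for w in words})
    df = Counter(w for words in tokens.values() for w in set(words))
    n = len(pages)
    vectors = {}
    for url, words in tokens.items():
        tf = Counter(words)
        vectors[url] = [tf[w] * math.log(n / df[w]) for w in vocab]
    return vocab, vectors

def distance(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def train_som(data, n_units=2, epochs=50, seed=0):
    """Train a small one-dimensional SOFM and return one weight vector per unit."""
    rng = random.Random(seed)
    weights = [list(rng.choice(data)) for _ in range(n_units)]
    for epoch in range(epochs):
        decay = 1 - epoch / epochs
        rate = 0.5 * decay                 # learning rate shrinks over time
        radius = max(decay, 0.01)          # neighborhood radius shrinks too
        for vec in rng.sample(data, len(data)):
            # Best matching unit: the unit whose weights are closest to the input.
            bmu = min(range(n_units), key=lambda u: distance(weights[u], vec))
            for u in range(n_units):
                influence = math.exp(-((u - bmu) ** 2) / (2 * radius ** 2))
                weights[u] = [w + rate * influence * (x - w)
                              for w, x in zip(weights[u], vec)]
    return weights

def representative_pages(vectors, weights, k=2):
    """Reverse step: for each unit, return the k assigned pages nearest its centroid."""
    clusters = {u: [] for u in range(len(weights))}
    for url, vec in vectors.items():
        bmu = min(clusters, key=lambda u: distance(weights[u], vec))
        clusters[bmu].append(url)
    return {u: sorted(urls, key=lambda url: distance(weights[u], vectors[url]))[:k]
            for u, urls in clusters.items() if urls}

# Hypothetical page texts, invented purely for illustration.
pages = {
    "/loans": "personal loan interest rate loan application",
    "/mortgage": "mortgage loan interest rate house",
    "/cards": "credit card rewards annual fee",
    "/cards/gold": "gold credit card rewards travel",
}
vocab, vectors = tfidf_vectors(pages)
weights = train_som(list(vectors.values()), n_units=2)
reps = representative_pages(vectors, weights)
print(reps)
```

Here each unit's weight vector plays the role of the cluster centroid, and `representative_pages` performs the reverse lookup from centroids to the pages closest to them. A real site would need a two-dimensional map, proper HTML text extraction, and snapshots of the pages collected over time.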