This paper presents an approach to FAQ mining based on a list detection algorithm. List detection is important for data collection because lists are widely used to represent data and information on the Web. By analyzing how FAQs are rendered on the Web, we found that FAQs are always fully or partially presented in a list-like form. There are two ways to author a list on the Web. One is to use dedicated list tags, e.g. the list tags of HTML; lists authored this way can be detected easily by parsing those tags. The other is to use tags that are not list-specific. Unfortunately, many lists are authored in the second way. To detect such lists, we therefore present an algorithm that is independent of the Web language. By combining the algorithm with some domain knowledge, we detect and collect FAQs from the Web. The mining task achieved 72.54% recall and 80.16% precision.
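As an illustration of the first, easy case only (lists authored with dedicated tags), the following minimal sketch collects the items of explicitly tagged HTML lists. It is not the language-independent algorithm of this paper. The class name `TagListDetector`, the helper `detect_tagged_lists`, and the choice of `ul`/`ol`/`dl` as the list tags are assumptions made for the example.

```python
from html.parser import HTMLParser

# Assumed tag sets for this sketch: standard HTML list containers and items.
LIST_TAGS = {"ul", "ol", "dl"}
ITEM_TAGS = {"li", "dt", "dd"}


class TagListDetector(HTMLParser):
    """Collect the text of items found inside explicit HTML list elements."""

    def __init__(self):
        super().__init__()
        self.lists = []        # completed lists, each a list of item strings
        self._stack = []       # lists that are currently open (allows nesting)
        self._in_item = False  # True while inside an item of the innermost list

    def handle_starttag(self, tag, attrs):
        if tag in LIST_TAGS:
            self._stack.append([])
            self._in_item = False
        elif tag in ITEM_TAGS and self._stack:
            self._stack[-1].append("")
            self._in_item = True

    def handle_endtag(self, tag):
        if tag in LIST_TAGS and self._stack:
            self.lists.append(self._stack.pop())
            self._in_item = False
        elif tag in ITEM_TAGS:
            self._in_item = False

    def handle_data(self, data):
        # Text is attributed to the most recent item of the innermost open list.
        if self._in_item and self._stack and self._stack[-1]:
            self._stack[-1][-1] += data


def detect_tagged_lists(html):
    """Return one list of stripped item strings per tagged list in `html`."""
    parser = TagListDetector()
    parser.feed(html)
    parser.close()
    return [[item.strip() for item in items] for items in parser.lists]
```

A page that lays out its questions with paragraphs, tables, or line breaks yields nothing under this approach, which is the gap the second kind of list leaves open.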