Detecting near-duplicate documents using sentence-level features and supervised learning
Expert Systems with Applications: An International Journal
Support Vector Machines (SVMs) achieve high performance on classification tasks across many domains, and a great deal of work has been dedicated to developing computationally efficient training algorithms for linear SVMs. One approach [1] approximately minimizes risk using cutting planes, and was subsequently improved in [2], [3]. We build on this work and present a modification to the algorithm developed by Franc and Sonnenburg [2]. We demonstrate empirically that our changes can reduce cutting plane training time by up to 40 percent, and we discuss how the choice of data set and parameter settings affects the effectiveness of our method.
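The cutting-plane idea mentioned in the abstract can be illustrated with a minimal sketch. It assumes the risk is the average hinge loss of a linear classifier, which the abstract does not state explicitly; the function names `hinge_risk`, `cutting_plane` and `plane_value`, as well as the toy data, are hypothetical. The sketch only shows how a single cutting plane (a linear lower bound on the risk) can be built from a subgradient. It is not the training algorithm of [1] or [2], and it is not the modification proposed here.

```python
# Illustrative sketch only: the hinge-loss risk and all names below are
# assumptions made for this example, not details taken from the paper.

def hinge_risk(w, X, y):
    """Average hinge loss of the linear classifier w on samples (X, y)."""
    total = 0.0
    for xi, yi in zip(X, y):
        margin = yi * sum(wj * xj for wj, xj in zip(w, xi))
        total += max(0.0, 1.0 - margin)
    return total / len(X)


def cutting_plane(w, X, y):
    """Return (a, b) such that a.v + b <= hinge_risk(v) for every v,
    with equality at v = w. Here a is a subgradient of the risk at w."""
    n = len(X)
    a = [0.0] * len(w)
    for xi, yi in zip(X, y):
        margin = yi * sum(wj * xj for wj, xj in zip(w, xi))
        if margin < 1.0:  # margin violated: this sample contributes -y*x/n
            for j, xj in enumerate(xi):
                a[j] -= yi * xj / n
    # Choose the offset so the plane touches the risk at w.
    b = hinge_risk(w, X, y) - sum(aj * wj for aj, wj in zip(a, w))
    return a, b


def plane_value(a, b, v):
    """Evaluate the linear lower bound a.v + b at the point v."""
    return sum(aj * vj for aj, vj in zip(a, v)) + b


# Hypothetical toy data: three 2-D samples with labels in {+1, -1}.
X = [[1.0, 2.0], [-1.0, -1.5], [0.5, -0.5]]
y = [1, -1, 1]
```

Because the hinge risk is convex, every plane built this way lies on or below the risk everywhere. A cutting-plane solver would accumulate such planes over its iterations and repeatedly minimize the regularizer plus the maximum of the collected planes; that optimization step is omitted from this sketch.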