Scaffold filling is a new combinatorial optimization problem in genome sequencing. The one-sided scaffold filling problem can be stated as follows: given an incomplete genome $I$ and a complete (reference) genome $G$, fill the missing genes into $I$ such that the number of common (string) adjacencies between the resulting genome $I^{\prime}$ and $G$ is maximized. This problem is NP-complete for genomes with duplicated genes, and the best known approximation factor is 1.33, achieved by a greedy strategy. In this paper, we prove a better lower bound on the optimal solution and devise a new algorithm, based on maximum matching and a local improvement technique, that improves the approximation factor to 1.25. For genomes with gene repetitions, this is the only known NP-complete problem that admits an approximation with a small constant factor (less than 1.5).
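To make the objective concrete, the following is a minimal sketch of the quantity being maximized, together with an exhaustive search that is only usable on tiny inputs. It is not the 1.25-approximation algorithm described above (the matching and local improvement steps are not reproduced here). It assumes that an adjacency is an unordered pair of consecutive genes, that common adjacencies are counted as a multiset intersection, and that no end markers are added to the genomes; the abstract does not spell out these details. All function names are hypothetical.

```python
from collections import Counter


def adjacencies(genome):
    """Multiset of unordered pairs of consecutive genes (assumed definition)."""
    return Counter(tuple(sorted(pair)) for pair in zip(genome, genome[1:]))


def common_adjacencies(a, b):
    """Size of the multiset intersection of the two adjacency multisets."""
    return sum((adjacencies(a) & adjacencies(b)).values())


def missing_genes(incomplete, reference):
    """Genes, with multiplicity, present in the reference but absent from the incomplete genome."""
    return list((Counter(reference) - Counter(incomplete)).elements())


def fillings(genome, to_insert):
    """Yield every genome obtained by inserting the genes of to_insert into genome.

    The relative order of the genes already in genome is preserved.
    Some results are generated more than once; this is a sketch, not an optimized search.
    """
    if not to_insert:
        yield genome
        return
    first, rest = to_insert[0], to_insert[1:]
    for pos in range(len(genome) + 1):
        yield from fillings(genome[:pos] + [first] + genome[pos:], rest)


def brute_force_fill(incomplete, reference):
    """Exhaustively search all fillings; exponential time, tiny inputs only."""
    best, best_score = None, -1
    for candidate in fillings(list(incomplete), missing_genes(incomplete, reference)):
        score = common_adjacencies(candidate, reference)
        if score > best_score:
            best, best_score = candidate, score
    return best, best_score


if __name__ == "__main__":
    G = ["a", "b", "c", "a", "d"]  # reference genome with a duplicated gene
    I = ["a", "c", "d"]            # incomplete genome, missing one "a" and one "b"
    print(common_adjacencies(I, G))  # 1 (only the adjacency a-c is shared)
    filled, score = brute_force_fill(I, G)
    print(filled, score)  # score is 4: every adjacency of G can be recovered
```

The exhaustive search illustrates why approximation is of interest: the number of candidate fillings grows exponentially with the number of missing genes.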