What is Sequence Alignment?

As the cost of DNA sequencing continues to drop faster than Moore's Law, there is a growing need for tools that can efficiently analyze large bodies of sequence data. By mid-2013, sequencing a human genome is expected to cost $1000, at which point this technology will enter the realm of routine clinical practice. For example, each cancer patient is expected to have both their own genome and their cancer's genome sequenced.

However, current high-throughput sequencing technologies produce large numbers of short (~100 letter) reads from random locations in the genome. Assembling these reads into a coherent whole is a significant computational challenge, with current pipelines taking thousands of CPU-hours per genome. The first and most expensive step of this process is aligning each read to a known reference genome, so that differences between the patient's genome and the reference genome can be localized.
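
To make the alignment step concrete, here is a minimal Python sketch of placing one short read on a reference sequence. It is a brute-force illustration only, not SNAP's algorithm or that of any existing aligner: it tries every offset and keeps the one with the fewest substitutions. The function names, the toy sequences, and the max_mismatches parameter are all hypothetical, and the sketch ignores insertions and deletions.

```python
def hamming(a: str, b: str) -> int:
    """Count positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def align_read(reference: str, read: str, max_mismatches: int = 2):
    """Return (position, mismatches) for the best placement, or None.

    Brute force: every offset in the reference is tried in turn.
    Ties are resolved in favor of the leftmost position.
    """
    best = None
    for pos in range(len(reference) - len(read) + 1):
        d = hamming(reference[pos:pos + len(read)], read)
        if d <= max_mismatches and (best is None or d < best[1]):
            best = (pos, d)
    return best


# Toy data (hypothetical): the read matches the reference at offset 6
# except for a single substituted letter.
reference = "ACGTTAGCCGATAGGCTTAACG"
read = "GCCGTTAGG"
print(align_read(reference, read))
```

Scanning every offset costs time proportional to the reference length for each read, which is hopeless for a genome of billions of letters and hundreds of millions of reads. That cost is why practical aligners build an index over the reference before aligning.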

What is SNAP?

SNAP, the Scalable Nucleotide Alignment Program, is a new aligner that is 10-100x faster and simultaneously more accurate than existing tools like BWA, Bowtie2 and SOAP2. SNAP runs on commodity x86 processors, and supports a rich error model that lets it cheaply match reads with more differences from the reference than other tools. This gives it up to 2x lower error rates than current tools and lets it match large mutations that these tools miss. SNAP can align a human genome in 1.5 hours on a 16-core machine, compared to 1.5 days for BWA, while providing higher accuracy. In addition, the algorithm scales well to upcoming long-read technologies.

SNAP was developed at the AMP Lab at UC Berkeley. A paper about the algorithm is available on arXiv.

What do I need to run SNAP?

SNAP is available for non-commercial use on both Linux and Windows. Visit the downloads page to view the license and get a copy.