20 Nov 2010 \\ bioinformatics
Sequence tags can be attached to DNA reads of interest to let you track different pools of reads following a second-generation sequencing run. The best way to generate tags for these reads is a matter of some debate.
The problem with "just making" the tags is that a sequencing error can inadvertently turn one tag into another, for example when an erroneous base substitution falls within the tag portion of the read. Error-correcting tags based on Hamming distance attempt to counter this effect, but they are robust only to substitution errors, which is a problem when reads also contain insertions or deletions. Tags based on Levenshtein (edit) distance are robust to insertion, deletion, and substitution errors, but it is often hard to find available sets of them.
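To make the difference between the two distances concrete, here is a minimal Python sketch. The function names and the two example tags are made up for illustration and are not taken from edittag. It shows a pair of tags that differ at every position (Hamming distance 4) yet are only two edits apart once insertions and deletions are allowed.

```python
def hamming(a, b):
    """Count mismatched positions between two equal-length strings."""
    if len(a) != len(b):
        raise ValueError("Hamming distance needs equal-length strings")
    return sum(x != y for x, y in zip(a, b))


def levenshtein(a, b):
    """Minimum number of insertions, deletions, and substitutions
    needed to turn string a into string b (row-by-row dynamic programming)."""
    previous = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        current = [i]
        for j, y in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,              # delete x from a
                current[j - 1] + 1,           # insert y into a
                previous[j - 1] + (x != y),   # substitute, or match at no cost
            ))
        previous = current
    return previous[-1]


# Hypothetical tags: the second is the first with its leading base
# deleted and an extra base added at the end.
tag_a, tag_b = "ACGT", "CGTA"
print(hamming(tag_a, tag_b))      # 4
print(levenshtein(tag_a, tag_b))  # 2
```

A tag set chosen only for large Hamming distance can therefore still contain pairs that a single deletion plus a single insertion will confuse.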
With all of that in mind, I offer several sets of Levenshtein distance sequence tags. The tags range from 4 to 10 nt in length and from 3 to 9 in edit distance. The 10 nt tags are somewhat slow to create (70 or 80 hours on a multicore machine), so you might as well just use these rather than generate a set de novo. If you would like to check the tags, to ensure they are of the appropriate distance, you can.
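If you just want a rough idea of what such a check involves, the sketch below compares every pair of tags and reports any pair closer than the required edit distance. It is an illustration under assumptions, not the edittag checker itself: `check_tags`, `min_pairwise_distance`, and the toy tag list are hypothetical names and data.

```python
from itertools import combinations


def levenshtein(a, b):
    """Edit distance between a and b (insertions, deletions, substitutions)."""
    previous = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        current = [i]
        for j, y in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,              # deletion
                current[j - 1] + 1,           # insertion
                previous[j - 1] + (x != y),   # substitution or match
            ))
        previous = current
    return previous[-1]


def min_pairwise_distance(tags):
    """Smallest edit distance found between any two tags in the set."""
    return min(levenshtein(a, b) for a, b in combinations(tags, 2))


def check_tags(tags, required):
    """Return (tag, tag, distance) for every pair closer than `required`."""
    violations = []
    for a, b in combinations(tags, 2):
        d = levenshtein(a, b)
        if d < required:
            violations.append((a, b, d))
    return violations


# Toy data, not one of the published sets.
tags = ["AAAA", "CCCC", "GGGG", "TTTT"]
print(check_tags(tags, required=3))             # []
print(check_tags(tags + ["AAAC"], required=3))  # [('AAAA', 'AAAC', 1)]
```

The all-pairs comparison grows quadratically with the number of tags. That is fine for checking a finished set, and it hints at why building a large set from scratch takes so long.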
For those interested in the nitty-gritty details, see the code, which is one program within edittag. Now, here are the tags: