Sequence Alignments
Exon-DuckDB includes tools for aligning sequences and working with the alignment outputs. This page is an overview; see the API documentation for details on specific functions.
Sequence alignment tools are only available on Mac and Linux.
Scores and Strings
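Broadly, the alignment tools in Exon-DuckDB either compute a score for the alignment of a query sequence against a target, or produce a string that represents the alignment.
For example, given a set of query sequences, we can sort them by how well each one aligns to a target sequence. This uses the scoring function, alignment_score. In the output below, the exact match scores 0.0 and each imperfect alignment receives a negative score, so sorting in descending order puts the best alignments first.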
```sql
SELECT seq, alignment_score(seq, 'BANANA') AS score
FROM (
    SELECT 'BANANA' AS seq
    UNION ALL
    SELECT 'BANANAS'
    UNION ALL
    SELECT 'PANAMA'
)
ORDER BY score DESC;
```
seq | score |
---|---|
BANANA | 0.0 |
PANAMA | -8.0 |
BANANAS | -8.0 |
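Because alignment_score returns an ordinary numeric value, it can be combined with standard DuckDB aggregates. As a sketch, DuckDB's built-in arg_max aggregate picks the single best-scoring candidate. The candidate list here is a plain VALUES clause and the alias `candidates` is illustrative only.

```sql
-- Return the candidate with the highest (least negative) alignment score
-- against 'BANANA'. arg_max(value, key) is a standard DuckDB aggregate.
SELECT arg_max(seq, alignment_score(seq, 'BANANA')) AS best_match
FROM (VALUES ('BANANA'), ('BANANAS'), ('PANAMA')) AS candidates(seq);
```

Given the scores in the table above, this should return BANANA. If several candidates tie for the top score, which of them arg_max returns is not something to rely on.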
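We can also use the analogous string function, alignment_string, to get a CIGAR string for each alignment. In the output below, M marks matching positions, X marks mismatches, and I marks bases in the first argument that have no counterpart in the second (for example, the trailing S of BANANAS).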
```sql
SELECT seq, alignment_string(seq, 'BANANA') AS cigar
FROM (
    SELECT 'BANANA' AS seq
    UNION ALL
    SELECT 'BANANAS'
    UNION ALL
    SELECT 'PANAMA'
)
ORDER BY alignment_score(seq, 'BANANA') DESC;
```
seq | cigar |
---|---|
BANANA | 6M |
PANAMA | 1X3M1X1M |
BANANAS | 6M1I |
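A CIGAR string is a sequence of run lengths, each followed by a one-letter operation. As a sketch that uses only built-in DuckDB string functions (regexp_extract_all is available in recent DuckDB releases), the PANAMA alignment can be split into its runs, and the runs that consume query bases can be summed as a sanity check. The set of query-consuming operations (M, I, S, =, X) follows the SAM convention and is an assumption about how Exon-DuckDB's CIGAR strings should be read.

```sql
-- Split a CIGAR string into (run_length, operation) rows.
SELECT
    CAST(regexp_extract(op, '[0-9]+') AS INTEGER) AS run_length,
    right(op, 1) AS operation
FROM (
    SELECT unnest(regexp_extract_all('1X3M1X1M', '[0-9]+[MIDNSHP=X]')) AS op
);

-- Sum the runs that consume query bases; for PANAMA (6 letters)
-- this should give 6 (1 + 3 + 1 + 1).
SELECT sum(CAST(regexp_extract(op, '[0-9]+') AS INTEGER)) AS query_bases
FROM (
    SELECT unnest(regexp_extract_all('1X3M1X1M', '[0-9]+[MIDNSHP=X]')) AS op
)
WHERE right(op, 1) IN ('M', 'I', 'S', '=', 'X');
```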
Extracting Alignments
Quite often we want to align a sequence to a target and then extract the aligned subsequence. This can be done with extract_from_cigar.
For example, let's say we have a guide RNA that we want to locate within a longer sequence (both are written here as RNA strings, with U rather than T).
```sql
SELECT extract_from_cigar(seq, cigar) AS extracted, cigar, guide, seq
FROM (
    SELECT alignment_string(seq, guide) AS cigar, seq, guide
    FROM (
        SELECT 'AAUAAUAAAUUUUUAAAUAUAAUAGAAAAUUGAAGUUCAGUA' AS seq, 'UAAAUAUAA' AS guide
    )
);
```
extracted | cigar | guide | seq |
---|---|---|---|
{'sequence_start': 13, 'sequence_end': 22, 'sequence': UAAAUAUAA} | 13I9M20I | UAAAUAUAA | AAUAAUAAAUUUUUAAAUAUAAUAGAAAAUUGAAGUUCAGUA |
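Note that the extract_from_cigar function returns a struct containing the extracted sequence and its start and end positions in the original sequence. In this example the positions appear to be 0-based with an exclusive end: characters 13 through 21 of seq (counting from 0) spell out the guide, consistent with the 13 leading I operations in the CIGAR string.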
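The struct fields can be pulled out with DuckDB's dot notation for structs. The sketch below does that and also checks the coordinates against the original sequence with substring, which is 1-based in DuckDB, so 1 is added to the start. Treating sequence_start as 0-based with an exclusive sequence_end is an inference from the example output above, not documented behavior.

```sql
-- Unpack the struct returned by extract_from_cigar into ordinary columns
-- and verify that the reported coordinates select the extracted sequence.
SELECT
    extracted.sequence AS aligned,
    extracted.sequence_start AS start_pos,
    extracted.sequence_end AS end_pos,
    -- Assumes 0-based start and exclusive end (inferred, see above).
    substring(seq, extracted.sequence_start + 1,
              extracted.sequence_end - extracted.sequence_start)
        = extracted.sequence AS coords_match
FROM (
    SELECT extract_from_cigar(seq, alignment_string(seq, guide)) AS extracted, seq
    FROM (
        SELECT 'AAUAAUAAAUUUUUAAAUAUAAUAGAAAAUUGAAGUUCAGUA' AS seq, 'UAAAUAUAA' AS guide
    )
);
```

Based on the output shown above, this should report UAAAUAUAA, 13, 22 and a coords_match of true.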