Sequence Alignments

Exon-DuckDB includes tools for aligning sequences and working with the alignment outputs. This page serves as a guide; see the API documentation for more details on specific functions.

Note: Sequence alignment tools are only available on Mac and Linux.

Scores and Strings

Broadly, the alignment tools in Exon-DuckDB can create a score for a given alignment between a query and a target or create a string that represents the alignment.

For example, given a query sequence, we can sort by how well it aligns to a target sequence. This uses the scoring functionality.

SELECT seq, alignment_score(seq, 'BANANA') AS score
FROM (
    SELECT 'BANANA' AS seq
    UNION ALL
    SELECT 'BANANAS'
    UNION ALL
    SELECT 'PANAMA'
)
ORDER BY alignment_score(seq, 'BANANA') DESC;
seq      score
BANANA   0.0
PANAMA   -8.0
BANANAS  -8.0
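
To make these scores more concrete, here is a minimal Python sketch of a global alignment with affine gap penalties. The penalties (mismatch 4, gap open 6, gap extend 2) are assumptions chosen only because they reproduce the three scores shown above; this page does not state which scoring scheme Exon-DuckDB actually uses, and `affine_score` is a hypothetical helper, not a library function.

```python
import math

# Assumed penalties (not taken from the Exon-DuckDB docs): they merely
# reproduce the example scores 0.0, -8.0 and -8.0 shown above.
MISMATCH, GAP_OPEN, GAP_EXTEND = 4, 6, 2


def affine_score(query: str, target: str) -> float:
    """Return the negated minimum gap-affine cost of globally aligning query to target."""
    n, m = len(query), len(target)
    inf = math.inf
    # best[i][j]: cheapest alignment of query[:i] and target[:j]
    # ins[i][j]:  same, but ending in a gap that consumes a query character
    # dele[i][j]: same, but ending in a gap that consumes a target character
    best = [[inf] * (m + 1) for _ in range(n + 1)]
    ins = [[inf] * (m + 1) for _ in range(n + 1)]
    dele = [[inf] * (m + 1) for _ in range(n + 1)]
    best[0][0] = 0
    for i in range(1, n + 1):
        ins[i][0] = best[i][0] = GAP_OPEN + GAP_EXTEND * i
    for j in range(1, m + 1):
        dele[0][j] = best[0][j] = GAP_OPEN + GAP_EXTEND * j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            ins[i][j] = min(best[i - 1][j] + GAP_OPEN + GAP_EXTEND,
                            ins[i - 1][j] + GAP_EXTEND)
            dele[i][j] = min(best[i][j - 1] + GAP_OPEN + GAP_EXTEND,
                             dele[i][j - 1] + GAP_EXTEND)
            sub = 0 if query[i - 1] == target[j - 1] else MISMATCH
            best[i][j] = min(best[i - 1][j - 1] + sub, ins[i][j], dele[i][j])
    # Higher (less negative) scores mean better alignments.
    return 0.0 - best[n][m]


if __name__ == "__main__":
    for seq in ("BANANA", "PANAMA", "BANANAS"):
        print(seq, affine_score(seq, "BANANA"))
```

Under these assumed penalties, two mismatches (PANAMA) and a single one-character gap (BANANAS) cost the same, which would explain why both rows tie at -8.0.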

We can also use the analogous string function to get a CIGAR string for the alignment.

SELECT seq, alignment_string(seq, 'BANANA') AS cigar
FROM (
    SELECT 'BANANA' AS seq
    UNION ALL
    SELECT 'BANANAS'
    UNION ALL
    SELECT 'PANAMA'
)
ORDER BY alignment_score(seq, 'BANANA') DESC;
seq      cigar
BANANA   6M
PANAMA   1X3M1X1M
BANANAS  6M1I
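
A CIGAR string is a run-length encoding of the alignment: each token is a count followed by an operation letter. The Python sketch below splits the strings from this example into tokens. `parse_cigar` and `summarize` are hypothetical helpers for illustration, and the reading of the letters (M for match, X for mismatch, I for characters in the first argument with no counterpart in the second) is inferred from the example output rather than from the API documentation.

```python
import re
from collections import Counter

CIGAR_TOKEN = re.compile(r"(\d+)([A-Z=])")


def parse_cigar(cigar: str) -> list[tuple[int, str]]:
    """Split a CIGAR string such as '1X3M1X1M' into (length, operation) pairs."""
    # Reject anything that is not a sequence of <count><letter> tokens.
    if not re.fullmatch(r"(\d+[A-Z=])+", cigar):
        raise ValueError(f"malformed CIGAR string: {cigar!r}")
    return [(int(n), op) for n, op in CIGAR_TOKEN.findall(cigar)]


def summarize(cigar: str) -> dict[str, int]:
    """Total the number of positions covered by each operation."""
    totals = Counter()
    for n, op in parse_cigar(cigar):
        totals[op] += n
    return dict(totals)


if __name__ == "__main__":
    for cigar in ("6M", "1X3M1X1M", "6M1I"):
        print(cigar, parse_cigar(cigar), summarize(cigar))
```

For PANAMA, the summary shows four matching positions and two mismatches; for BANANAS, six matches followed by one extra character.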

Extracting Alignments

Quite often we want to align to a target sequence and then extract the aligned subsequence. This can be done with extract_from_cigar.

For example, suppose we have a guide RNA, represented here as a plain string, that we want to align to a longer sequence.

SELECT extract_from_cigar(seq, cigar) AS extracted, cigar, guide, seq
FROM (
    SELECT alignment_string(seq, guide) AS cigar, seq, guide
    FROM (
        SELECT 'AAUAAUAAAUUUUUAAAUAUAAUAGAAAAUUGAAGUUCAGUA' AS seq, 'UAAAUAUAA' AS guide
    )
);
extracted                                                          cigar     guide      seq
{'sequence_start': 13, 'sequence_end': 22, 'sequence': UAAAUAUAA}  13I9M20I  UAAAUAUAA  AAUAAUAAAUUUUUAAAUAUAAUAGAAAAUUGAAGUUCAGUA

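Note that extract_from_cigar returns a struct containing the extracted sequence together with its start and end positions in the original sequence. In the example above, the start is 0-based and the end is exclusive: characters 13 through 21 of seq make up the extracted sequence.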
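
The relationship between the CIGAR string and the extracted struct can be sketched in Python. This is only an illustration under a stated assumption: that the extracted region is whatever remains of the sequence after trimming the leading and trailing I runs, which is consistent with the example output but is not confirmed by this page. `extract_aligned` is a hypothetical helper, not the library's implementation.

```python
import re


def extract_aligned(seq: str, cigar: str) -> dict:
    """Trim leading and trailing 'I' runs and return the remaining slice of seq."""
    ops = [(int(n), op) for n, op in re.findall(r"(\d+)([A-Z=])", cigar)]
    # Assumption: a leading 'I' run counts unaligned characters before the match.
    start = ops[0][0] if ops and ops[0][1] == "I" else 0
    # Assumption: a trailing 'I' run counts unaligned characters after the match.
    trailing = ops[-1][0] if len(ops) > 1 and ops[-1][1] == "I" else 0
    end = len(seq) - trailing
    return {"sequence_start": start, "sequence_end": end, "sequence": seq[start:end]}


if __name__ == "__main__":
    seq = "AAUAAUAAAUUUUUAAAUAUAAUAGAAAAUUGAAGUUCAGUA"
    print(extract_aligned(seq, "13I9M20I"))
```

With the 42-character sequence from the example, 13 leading and 20 trailing characters are trimmed, leaving the 9 characters at positions 13 to 21.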