
Getting Started

danger

This extension is no longer maintained, though it's still straightforward to use Exon with DuckDB. See the biobear integration for more information.

Exon-DuckDB is a DuckDB extension that provides a set of functions for working with scientific data.

Installation

Exon-DuckDB is installed through DuckDB commands, so you first need a working DuckDB installation in whichever environment you use; the extension is then installed from within DuckDB.

Here we'll show how to install it via the command line, Python, and R. See the other DuckDB language libraries for other options like Julia, C++, etc. You'll find up-to-date instructions for all languages on the DuckDB website.

Command Line

On the command line, first start the DuckDB shell:

duckdb -unsigned

Once there, add the repository, and install the extension:

D SET custom_extension_repository='dbe.wheretrue.com/exon/latest';
D INSTALL exon;
D LOAD exon;

You should only need to install the extension once, but you'll need to load it each time you start DuckDB.

Assuming that all went well, you should be able to run the following command:

SELECT gc_content('ATCG');

Python

For Python, you'll follow roughly the same steps, but through the duckdb Python package.

import duckdb

con = duckdb.connect(
    config={
        "allow_unsigned_extensions": True,
    }
)

con.execute("SET custom_extension_repository='dbe.wheretrue.com/exon/latest'")
con.install_extension("exon")
con.load_extension("exon")

Similarly, you should only need to install the extension once, but you'll need to load it each time you start DuckDB. And if the loading went well, you should be able to run the following command:

# Requires pandas be installed
df = con.execute("SELECT gc_content('ATCG')").df()
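Because installation is needed only once while loading is needed in every session, the two steps can be wrapped in a small helper. The following is a minimal sketch, not part of Exon-DuckDB: the function name `ensure_extension` is hypothetical, and it assumes that a failed `load_extension` call means the extension has not been installed yet. It only uses the connection methods already shown above (`execute`, `install_extension`, `load_extension`).

```python
def ensure_extension(con, name="exon",
                     repository="dbe.wheretrue.com/exon/latest"):
    """Load an extension, installing it first only if loading fails.

    `con` is any object exposing execute(), install_extension() and
    load_extension(), such as the DuckDB connection created above.
    Returns True if an install was performed, False otherwise.
    """
    try:
        # Fast path: the extension was installed in an earlier session.
        con.load_extension(name)
        return False
    except Exception:
        # Assumption: the load failed because the extension is not
        # installed yet, so point DuckDB at the repository and install.
        con.execute(f"SET custom_extension_repository='{repository}'")
        con.install_extension(name)
        con.load_extension(name)
        return True
```

Calling `ensure_extension(con)` at the start of each session would then replace the three separate calls. Catching a broad `Exception` keeps the sketch short; real code may want to inspect the error before attempting an install.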

R

And finally, for R, you'll follow roughly the same steps, except through R.

library(DBI)
library(duckdb)

con <- dbConnect(
  duckdb::duckdb(config = list("allow_unsigned_extensions" = "true")),
  dbdir = ":memory:"
)

query <- "SET custom_extension_repository='dbe.wheretrue.com/exon/latest';"
dbExecute(con, query)

query <- "INSTALL exon;"
dbExecute(con, query)

query <- "LOAD 'exon';"
dbExecute(con, query)

res <- dbGetQuery(con, "SELECT gc_content('ATCG')")
print(res)

Usage

Once installed, you can use the provided table and/or scalar functions in your queries. For example:

SELECT *
FROM read_fasta('path/to/file.fasta')
WHERE sequence LIKE 'M%'
LIMIT 5;
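To make the semantics of this query concrete, here is a plain-Python sketch of the same logic: read FASTA records, keep those whose sequence starts with `M`, and stop after five. It is only an illustration and does not reflect how Exon implements `read_fasta`. The `sequence` field comes from the query above; the `id` and `description` fields, and all function names, are assumptions made for this example.

```python
from itertools import islice


def _make_record(header, chunks):
    # Split ">id optional description" into its two parts.
    parts = header.split(None, 1)
    return {
        "id": parts[0] if parts else "",
        "description": parts[1] if len(parts) > 1 else None,
        "sequence": "".join(chunks),
    }


def read_fasta_records(lines):
    """Yield one dict per FASTA record from an iterable of text lines."""
    header, chunks = None, []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if header is not None:
                yield _make_record(header, chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield _make_record(header, chunks)


def first_starting_with(records, prefix="M", limit=5):
    """Mirror of: WHERE sequence LIKE 'M%' LIMIT 5 (case-sensitive)."""
    matches = (r for r in records if r["sequence"].startswith(prefix))
    return list(islice(matches, limit))


example = [
    ">seq1 first protein", "MKT", "AYI",
    ">seq2", "ACGT",
    ">seq3 another", "MAA",
]
rows = first_starting_with(read_fasta_records(example))
```

With a real file, `read_fasta_records(open(path))` would feed the same filter. The generator reads lazily, so the limit stops the scan early, much as `LIMIT` does in the SQL version.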

You can see more information below.