Exon Session
While BioBear exposes functions to read various data formats directly, it also makes available an Exon engine to work with data in a more SQL native way.
Creating a Session
Getting started is easy, just import BioBear and "connect" to a local session:
import biobear as bb
session = bb.connect()
From there, you can start working with your data. For example, you can register an S3 bucket, then create external tables backed by objects in that bucket, and finally copy data from those tables to other locations.
session.register_object_store_from_url('s3://wtt-01-dist-prd')
# Create the external table, this could also be on S3
session.sql("""
CREATE EXTERNAL TABLE gene_annotations STORED AS GFF LOCATION 's3://wtt-01-dist-prd/TenflaDSM28944/IMG_Data/Ga0451106_prodigal.gff'
""")
df = session.sql("""
COPY (SELECT seqname, start, "end", score from gene_annotations)
TO 's3://wtt-01-dist-prd/gene_annotations/sample=Ga0455106/gene_annotations.parquet' (FORMAT parquet)
""")
# The copy command returns a DataFrame with the number of rows copied
df.to_polars()
# shape: (1, 1)
# ┌───────┐
# │ count │
# │ --- │
# │ u64 │
# ╞═══════╡
# │ 7998 │
# └───────┘
You can also query that table directly:
result = session.sql("""
SELECT * FROM gene_annotations WHERE score > 50
""")
df = result.to_polars()
df.head()
# shape: (5, 9)
# ┌──────────────┬─────────────────┬──────┬───────┬───┬────────────┬────────┬───────┬───────────────────────────────────┐
# │ seqname ┆ source ┆ type ┆ start ┆ … ┆ score ┆ strand ┆ phase ┆ attributes │
# │ --- ┆ --- ┆ --- ┆ --- ┆ ┆ --- ┆ --- ┆ --- ┆ --- │
# │ str ┆ str ┆ str ┆ i64 ┆ ┆ f32 ┆ str ┆ str ┆ list[struct[2]] │
# ╞══════════════╪═════════════════╪══════╪═══════╪═══╪════════════╪════════╪═══════╪═══════════════════════════════════╡
# │ Ga0451106_01 ┆ Prodigal v2.6.3 ┆ CDS ┆ 2 ┆ … ┆ 54.5 ┆ - ┆ 0 ┆ [{"ID",["Ga0451106_01_2_238"]}, … │
# │ Ga0451106_01 ┆ Prodigal v2.6.3 ┆ CDS ┆ 228 ┆ … ┆ 114.0 ┆ - ┆ 0 ┆ [{"ID",["Ga0451106_01_228_941"]}… │
# │ Ga0451106_01 ┆ Prodigal v2.6.3 ┆ CDS ┆ 1097 ┆ … ┆ 224.399994 ┆ + ┆ 0 ┆ [{"ID",["Ga0451106_01_1097_2257"… │
# │ Ga0451106_01 ┆ Prodigal v2.6.3 ┆ CDS ┆ 2261 ┆ … ┆ 237.699997 ┆ + ┆ 0 ┆ [{"ID",["Ga0451106_01_2261_3787"… │
# │ Ga0451106_01 ┆ Prodigal v2.6.3 ┆ CDS ┆ 3784 ┆ … ┆ 114.400002 ┆ + ┆ 0 ┆ [{"ID",["Ga0451106_01_3784_4548"… │
# └──────────────┴─────────────────┴──────┴───────┴───┴────────────┴────────┴───────┴───────────────────────────────────┘