Multi-Backend Guide

GIQL supports multiple database backends, allowing you to run the same genomic queries against different database systems. This guide covers backend selection, configuration, and backend-specific considerations.

Supported Backends

GIQL currently supports the following database backends:

Backend

Status

Best For

DuckDB

Full Support

Analytics, large datasets, in-memory processing

SQLite

Full Support

Lightweight, embedded, portable databases

PostgreSQL

Planned

Production deployments, shared databases

Selecting a Backend

SQLite

SQLite is a lightweight, embedded database suitable for smaller datasets or when portability is important.

# In-memory SQLite
with GIQLEngine(target_dialect="sqlite") as engine:
    pass

# Persistent SQLite database
with GIQLEngine(target_dialect="sqlite", db_path="my_data.db") as engine:
    pass

Advantages:

  • Zero configuration

  • Single-file database

  • Widely compatible

  • Small memory footprint

Best for:

  • Small to medium datasets

  • Portable analysis

  • Embedded applications

  • Simple workflows

Backend Configuration

In-Memory vs Persistent

Both DuckDB and SQLite support in-memory and persistent modes:

# In-memory (data lost when engine closes)
with GIQLEngine(target_dialect="duckdb") as engine:
    engine.load_csv("features", "features.bed")
    # Data exists only during this session

# Persistent (data saved to disk)
with GIQLEngine(target_dialect="duckdb", db_path="analysis.duckdb") as engine:
    engine.load_csv("features", "features.bed")
    # Data persists after engine closes

# Reopen persistent database
with GIQLEngine(target_dialect="duckdb", db_path="analysis.duckdb") as engine:
    # Previous data is available
    cursor = engine.execute("SELECT * FROM features LIMIT 5")

Connection Options

Pass additional connection options to the underlying database:

# DuckDB with custom settings
with GIQLEngine(
    target_dialect="duckdb",
    db_path="analysis.duckdb",
    read_only=False,
) as engine:
    pass

Writing Portable Queries

Query Compatibility

GIQL queries are portable across backends. The same query works on any supported database:

query = """
    SELECT a.*, b.name AS gene
    FROM variants a
    JOIN genes b ON a.interval INTERSECTS b.interval
    WHERE a.quality >= 30
"""

# Works on DuckDB
with GIQLEngine(target_dialect="duckdb") as engine:
    # ... setup ...
    cursor = engine.execute(query)

# Same query works on SQLite
with GIQLEngine(target_dialect="sqlite") as engine:
    # ... setup ...
    cursor = engine.execute(query)

SQL Dialect Differences

While GIQL queries are portable, the generated SQL differs between backends. Use transpile() to see the backend-specific SQL:

query = "SELECT * FROM features WHERE interval INTERSECTS 'chr1:1000-2000'"

# DuckDB SQL
with GIQLEngine(target_dialect="duckdb") as engine:
    engine.register_table_schema("features", {...}, genomic_column="interval")
    print(engine.transpile(query))

# SQLite SQL (may differ slightly)
with GIQLEngine(target_dialect="sqlite") as engine:
    engine.register_table_schema("features", {...}, genomic_column="interval")
    print(engine.transpile(query))

Backend-Specific Features

Some SQL features may only be available on certain backends:

Feature

DuckDB

SQLite

Notes

Window functions

Yes

Yes

Full support

CTEs (WITH clause)

Yes

Yes

Full support

LATERAL joins

Yes

Limited

Used by NEAREST

STRING_AGG

Yes

GROUP_CONCAT

Different function names

Migrating Between Backends

Exporting Data

Export data from one backend for import into another:

# Export from DuckDB
with GIQLEngine(target_dialect="duckdb", db_path="source.duckdb") as engine:
    cursor = engine.execute("SELECT * FROM features")
    import pandas as pd
    df = pd.DataFrame(cursor.fetchall(),
                     columns=[desc[0] for desc in cursor.description])
    df.to_csv("features_export.csv", index=False)

# Import to SQLite
with GIQLEngine(target_dialect="sqlite", db_path="target.db") as engine:
    engine.load_csv("features", "features_export.csv")
    engine.register_table_schema("features", {...}, genomic_column="interval")

Schema Compatibility

Ensure schema definitions work across backends:

# Use portable type names
schema = {
    "chromosome": "VARCHAR",   # Works on all backends
    "start_pos": "BIGINT",     # Maps to appropriate integer type
    "end_pos": "BIGINT",
    "name": "VARCHAR",
    "score": "FLOAT",          # Maps to appropriate float type
}

# Same schema works on both backends
for dialect in ["duckdb", "sqlite"]:
    with GIQLEngine(target_dialect=dialect) as engine:
        engine.load_csv("features", "features.csv")
        engine.register_table_schema("features", schema, genomic_column="interval")

Performance Comparison

Backend Performance Characteristics

Operation

DuckDB

SQLite

Large table scans

Excellent (columnar)

Good

Complex joins

Excellent

Good

Aggregations

Excellent

Good

Small queries

Good

Excellent

Memory usage

Higher

Lower

Startup time

Faster

Fast

Choosing the Right Backend

Choose DuckDB when:

  • Working with large datasets (millions of features)

  • Running complex analytical queries

  • Performing heavy aggregations

  • Memory is not constrained

Choose SQLite when:

  • Working with smaller datasets

  • Need maximum portability

  • Memory is constrained

  • Simple query patterns

Using External Connections

Connecting to Existing Databases

Connect to databases created outside of GIQL:

# Connect to existing DuckDB database
with GIQLEngine(target_dialect="duckdb", db_path="existing.duckdb") as engine:
    # Register schemas for existing tables
    engine.register_table_schema(
        "my_existing_table",
        {
            "chromosome": "VARCHAR",
            "start_pos": "BIGINT",
            "end_pos": "BIGINT",
            "name": "VARCHAR",
        },
        genomic_column="interval",
    )

    # Query existing data with GIQL operators
    cursor = engine.execute("""
        SELECT * FROM my_existing_table
        WHERE interval INTERSECTS 'chr1:1000-2000'
    """)

Using Transpiled SQL Externally

Generate SQL for use with external database connections:

import duckdb

# Get transpiled SQL from GIQL
with GIQLEngine(target_dialect="duckdb") as engine:
    engine.register_table_schema("features", {...}, genomic_column="interval")
    sql = engine.transpile("""
        SELECT * FROM features
        WHERE interval INTERSECTS 'chr1:1000-2000'
    """)

# Execute with external connection
conn = duckdb.connect("my_database.duckdb")
result = conn.execute(sql).fetchall()
conn.close()

This is useful when integrating GIQL with existing database workflows or when you need more control over the database connection.