Transpilation Guide

GIQL works by transpiling genomic queries into standard SQL. This guide explains how transpilation works, how to debug query generation, and how to use transpiled SQL with external tools.

How Transpilation Works

The Transpilation Process

When you write a GIQL query:

SELECT * FROM variants WHERE interval INTERSECTS 'chr1:1000-2000'

GIQL performs these steps:

  1. Parse: Parse the SQL to identify GIQL-specific operators

  2. Expand: Replace genomic operators with standard SQL predicates

  3. Generate: Produce SQL for the target database dialect

The result is standard SQL:

SELECT * FROM variants
WHERE chromosome = 'chr1' AND start_pos < 2000 AND end_pos > 1000

Operator Expansion

Each GIQL operator expands to specific SQL patterns:

INTERSECTS expands to range overlap checks:

-- GIQL
a.interval INTERSECTS b.interval

-- SQL (same chromosome, overlapping ranges)
a.chromosome = b.chromosome
AND a.start_pos < b.end_pos
AND a.end_pos > b.start_pos

CONTAINS expands to containment checks:

-- GIQL
a.interval CONTAINS b.interval

-- SQL
a.chromosome = b.chromosome
AND a.start_pos <= b.start_pos
AND a.end_pos >= b.end_pos

DISTANCE expands to gap calculations:

-- GIQL
DISTANCE(a.interval, b.interval)

-- SQL (simplified)
CASE
    WHEN a.chromosome != b.chromosome THEN NULL
    WHEN a.end_pos <= b.start_pos THEN b.start_pos - a.end_pos
    WHEN b.end_pos <= a.start_pos THEN a.start_pos - b.end_pos
    ELSE 0
END

Using the Transpile Method

Basic Transpilation

Use transpile() to see generated SQL without executing:

from giql import GIQLEngine

with GIQLEngine(target_dialect="duckdb") as engine:
    engine.register_table_schema(
        "variants",
        {
            "chromosome": "VARCHAR",
            "start_pos": "BIGINT",
            "end_pos": "BIGINT",
        },
        genomic_column="interval",
    )

    sql = engine.transpile("""
        SELECT * FROM variants
        WHERE interval INTERSECTS 'chr1:1000-2000'
    """)

    print(sql)
    # Output: SELECT * FROM variants
    #         WHERE chromosome = 'chr1' AND start_pos < 2000 AND end_pos > 1000

Transpiling Complex Queries

Transpilation works with all GIQL features:

# Join query
sql = engine.transpile("""
    SELECT v.*, g.name AS gene_name
    FROM variants v
    JOIN genes g ON v.interval INTERSECTS g.interval
    WHERE v.quality >= 30
""")
print(sql)

# NEAREST query
sql = engine.transpile("""
    SELECT peaks.name, nearest.name, nearest.distance
    FROM peaks
    CROSS JOIN LATERAL NEAREST(genes, reference=peaks.interval, k=5) AS nearest
""")
print(sql)

# Aggregation query
sql = engine.transpile("""
    SELECT MERGE(interval), COUNT(*) AS count
    FROM features
""")
print(sql)

Debugging with Transpilation

Understanding Query Expansion

Use transpilation to understand what GIQL does:

# See how ANY quantifier expands
sql = engine.transpile("""
    SELECT * FROM variants
    WHERE interval INTERSECTS ANY('chr1:1000-2000', 'chr2:5000-6000')
""")
print(sql)
# Shows the OR conditions for each range

# See how join conditions expand
sql = engine.transpile("""
    SELECT a.*, b.name
    FROM features_a a
    JOIN features_b b ON a.interval INTERSECTS b.interval
""")
print(sql)
# Shows the full range comparison predicates

Verbose Mode

Enable verbose mode for detailed transpilation information:

with GIQLEngine(target_dialect="duckdb", verbose=True) as engine:
    engine.register_table_schema("variants", {...}, genomic_column="interval")

    # Transpilation will print detailed information
    sql = engine.transpile("""
        SELECT * FROM variants
        WHERE interval INTERSECTS 'chr1:1000-2000'
    """)

    # Execution also shows transpilation details
    cursor = engine.execute("""
        SELECT * FROM variants
        WHERE interval INTERSECTS 'chr1:1000-2000'
    """)

Troubleshooting Transpilation

Query not expanding correctly:

# Check that schema is registered
sql = engine.transpile("SELECT * FROM variants WHERE interval INTERSECTS 'chr1:1000-2000'")
if "interval INTERSECTS" in sql:
    print("Schema not registered for 'variants' table")

Wrong column names in output:

# Verify column mapping
engine.register_table_schema(
    "variants",
    {...},
    genomic_column="interval",
    chromosome_column="chrom",      # Check these match your table
    start_column="start",
    end_column="end",
)

Comparing Dialects

Same Query, Different SQL

See how the same query translates for different backends:

query = """
    SELECT * FROM variants
    WHERE interval INTERSECTS 'chr1:1000-2000'
"""

schema = {
    "chromosome": "VARCHAR",
    "start_pos": "BIGINT",
    "end_pos": "BIGINT",
}

# DuckDB
with GIQLEngine(target_dialect="duckdb") as engine:
    engine.register_table_schema("variants", schema, genomic_column="interval")
    print("DuckDB SQL:")
    print(engine.transpile(query))
    print()

# SQLite
with GIQLEngine(target_dialect="sqlite") as engine:
    engine.register_table_schema("variants", schema, genomic_column="interval")
    print("SQLite SQL:")
    print(engine.transpile(query))

Dialect-Specific Differences

Some queries may generate different SQL for different dialects:

  • String functions may use different names

  • Type casting syntax may vary

  • Window function support may differ

GIQL handles these differences automatically, but understanding them helps when debugging or integrating with external tools.

Using Transpiled SQL Externally

With External Database Connections

Use transpiled SQL with your own database connections:

import duckdb

# Generate SQL using GIQL
with GIQLEngine(target_dialect="duckdb") as giql_engine:
    giql_engine.register_table_schema("variants", {...}, genomic_column="interval")
    sql = giql_engine.transpile("""
        SELECT * FROM variants
        WHERE interval INTERSECTS 'chr1:1000-2000'
    """)

# Execute with external connection
conn = duckdb.connect("my_database.duckdb")
result = conn.execute(sql).fetchall()
conn.close()

With ORMs and Query Builders

Integrate transpiled SQL with SQLAlchemy or other ORMs:

from sqlalchemy import create_engine, text

# Generate SQL
with GIQLEngine(target_dialect="duckdb") as giql_engine:
    giql_engine.register_table_schema("variants", {...}, genomic_column="interval")
    sql = giql_engine.transpile("""
        SELECT * FROM variants
        WHERE interval INTERSECTS 'chr1:1000-2000'
    """)

# Execute with SQLAlchemy
sa_engine = create_engine("duckdb:///my_database.duckdb")
with sa_engine.connect() as conn:
    result = conn.execute(text(sql))
    for row in result:
        print(row)

Building SQL Pipelines

Use transpilation in data pipelines:

def build_intersection_query(table_a, table_b, region):
    """Generate SQL for intersection query."""
    with GIQLEngine(target_dialect="duckdb") as engine:
        engine.register_table_schema(table_a, {...}, genomic_column="interval")
        engine.register_table_schema(table_b, {...}, genomic_column="interval")

        return engine.transpile(f"""
            SELECT a.*, b.name
            FROM {table_a} a
            JOIN {table_b} b ON a.interval INTERSECTS b.interval
            WHERE a.interval INTERSECTS '{region}'
        """)

# Use in pipeline
sql = build_intersection_query("variants", "genes", "chr1:1000000-2000000")
# Execute sql with your preferred method

Saving Queries

Save transpiled SQL for documentation or reuse:

# Generate and save SQL
with GIQLEngine(target_dialect="duckdb") as engine:
    engine.register_table_schema("variants", {...}, genomic_column="interval")

    sql = engine.transpile("""
        SELECT * FROM variants
        WHERE interval INTERSECTS 'chr1:1000-2000'
    """)

    with open("query.sql", "w") as f:
        f.write(sql)

# Later, execute saved SQL
with open("query.sql") as f:
    sql = f.read()

conn = duckdb.connect("database.duckdb")
result = conn.execute(sql).fetchall()

Advanced Transpilation

Parameterized Queries

Build queries with parameters:

def query_region(engine, chrom, start, end):
    """Query a parameterized region."""
    region = f"{chrom}:{start}-{end}"
    return engine.execute(f"""
        SELECT * FROM variants
        WHERE interval INTERSECTS '{region}'
    """)

# Use with different regions
cursor = query_region(engine, "chr1", 1000000, 2000000)
cursor = query_region(engine, "chr2", 5000000, 6000000)

Dynamic Query Building

Build queries programmatically:

def build_multi_table_query(tables, target_region):
    """Build a query that unions results from multiple tables."""
    union_parts = []
    for table in tables:
        union_parts.append(f"""
            SELECT *, '{table}' AS source FROM {table}
            WHERE interval INTERSECTS '{target_region}'
        """)

    query = " UNION ALL ".join(union_parts)
    return engine.transpile(query)

Inspecting the AST

For advanced debugging, you can inspect the parsed query:

# GIQL uses sqlglot internally
# The transpiled SQL shows the final result
sql = engine.transpile("SELECT * FROM variants WHERE interval INTERSECTS 'chr1:1000-2000'")

# For deep debugging, examine the generated SQL structure
print(sql)