GIQL - Genomic Interval Query Language

GIQL is a SQL dialect for genomic range queries with multi-database support.

Genomic analysis often requires repetitive, complex SQL to express simple operations such as finding overlapping intervals or merging features. GIQL extends SQL with dedicated operators for these tasks, so you can state what you want to compute without writing the boilerplate yourself. The resulting queries are easier to read, review, and share, even for readers with little SQL experience. GIQL queries run on DuckDB, SQLite, PostgreSQL, and other databases, so you can choose the engine that fits your use case rather than being locked into one. The operators follow conventions established by tools like bedtools, so their semantics are familiar and predictable.

Quick Start

Install GIQL:

pip install giql

Basic usage:

from giql import GIQLEngine

# Create engine with DuckDB backend
with GIQLEngine(target_dialect="duckdb") as engine:
    # Load genomic data
    engine.load_csv("variants", "variants.csv")
    engine.register_table_schema(
        "variants",
        {
            "id": "INTEGER",
            "chromosome": "VARCHAR",
            "start_pos": "BIGINT",
            "end_pos": "BIGINT",
        },
        genomic_column="interval",
    )

    # Query with genomic operators (returns cursor for streaming)
    cursor = engine.execute("""
        SELECT * FROM variants
        WHERE interval INTERSECTS 'chr1:1000-2000'
    """)

    # Process results
    for row in cursor:
        print(row)

    # Or just transpile to SQL without executing
    sql = engine.transpile("""
        SELECT * FROM variants
        WHERE interval INTERSECTS 'chr1:1000-2000'
    """)
    print(sql)  # See the generated SQL
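
For intuition about what the operator saves you from writing, here is a rough sketch of the same overlap test written by hand in plain SQL and run with Python's built-in sqlite3 module. This is not the SQL that GIQL generates. The sample rows are made up, and the half-open treatment of 'chr1:1000-2000' is an assumption; only the column names come from the schema above.

```python
import sqlite3

# Hypothetical sample rows; columns mirror the schema registered above.
rows = [
    (1, "chr1", 500, 900),
    (2, "chr1", 1500, 1600),
    (3, "chr1", 1900, 2500),
    (4, "chr2", 1500, 1600),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE variants ("
    "id INTEGER, chromosome VARCHAR, start_pos BIGINT, end_pos BIGINT)"
)
conn.executemany("INSERT INTO variants VALUES (?, ?, ?, ?)", rows)

# Hand-written overlap test for 'chr1:1000-2000', assuming half-open
# intervals: same chromosome, starts before the region ends, and ends
# after the region starts.
overlap_sql = """
    SELECT id FROM variants
    WHERE chromosome = ?
      AND start_pos < ?
      AND end_pos > ?
    ORDER BY id
"""
overlapping_ids = [r[0] for r in conn.execute(overlap_sql, ("chr1", 2000, 1000))]
print(overlapping_ids)
conn.close()
```

Comparing this with the output of engine.transpile() for your target dialect is a quick way to check which coordinate convention is actually in effect.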

Features

  • SQL-based: Familiar SQL syntax with genomic extensions

  • Multi-backend: Works with DuckDB, SQLite, PostgreSQL, and other databases

  • Spatial operators: INTERSECTS, CONTAINS, WITHIN, DISTANCE, NEAREST

  • Aggregation operators: CLUSTER, MERGE for combining intervals

  • Set quantifiers: ANY, ALL for multi-range queries

  • Column-to-column joins: Join tables on genomic position

  • Transpilation: Convert GIQL to standard SQL for debugging or external use

Operators at a Glance

Spatial Relationships:

-- Find overlapping features
WHERE interval INTERSECTS 'chr1:1000-2000'

-- Find genes that contain a variant (column-to-column)
WHERE gene.interval CONTAINS variant.interval
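
The difference between overlap and containment can be sketched in plain Python. The Interval type and the three helper functions are hypothetical names used only for illustration, and the 0-based, half-open coordinates are an assumption rather than a statement about GIQL's own convention.

```python
from typing import NamedTuple


class Interval(NamedTuple):
    chrom: str
    start: int  # assumed 0-based, inclusive
    end: int    # assumed exclusive


def intersects(a: Interval, b: Interval) -> bool:
    """True when a and b share at least one position."""
    return a.chrom == b.chrom and a.start < b.end and b.start < a.end


def contains(outer: Interval, inner: Interval) -> bool:
    """True when inner lies entirely inside outer."""
    return (
        outer.chrom == inner.chrom
        and outer.start <= inner.start
        and inner.end <= outer.end
    )


def within(inner: Interval, outer: Interval) -> bool:
    """Mirror image of contains: inner lies entirely inside outer."""
    return contains(outer, inner)


gene = Interval("chr1", 1000, 2000)
variant = Interval("chr1", 1500, 1501)
straddling = Interval("chr1", 1900, 2100)

print(contains(gene, variant))       # variant sits fully inside the gene
print(intersects(gene, straddling))  # partial overlap still intersects
print(contains(gene, straddling))    # but is not containment
```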

Distance and Proximity:

-- Calculate distance between intervals
SELECT DISTANCE(a.interval, b.interval) AS dist

-- Find the k nearest genes for each peak (here k=5)
FROM peaks CROSS JOIN LATERAL NEAREST(genes, reference=peaks.interval, k=5)
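
As a mental model for these two operators, the following sketch computes a gap-based distance and picks the k closest candidates in plain Python. The function names, the half-open coordinates, and the choices to return 0 for overlapping or adjacent intervals and None across chromosomes are all assumptions made for illustration; GIQL's exact semantics may differ.

```python
import heapq
from typing import NamedTuple, Optional


class Interval(NamedTuple):
    chrom: str
    start: int  # assumed 0-based, inclusive
    end: int    # assumed exclusive


def distance(a: Interval, b: Interval) -> Optional[int]:
    """Gap in bases between a and b; 0 if they overlap or touch.

    Returns None when the intervals are on different chromosomes
    (an assumption for this sketch).
    """
    if a.chrom != b.chrom:
        return None
    if a.start < b.end and b.start < a.end:
        return 0
    return b.start - a.end if a.end <= b.start else a.start - b.end


def k_nearest(reference: Interval, candidates: list, k: int) -> list:
    """Return up to k candidates on the same chromosome, closest first."""
    scored = [
        (d, c) for c in candidates if (d := distance(reference, c)) is not None
    ]
    return [c for _, c in heapq.nsmallest(k, scored, key=lambda pair: pair[0])]


peak = Interval("chr1", 1000, 1100)
genes = [
    Interval("chr1", 0, 500),
    Interval("chr1", 1050, 2000),
    Interval("chr1", 1300, 1400),
    Interval("chr2", 1000, 1100),
]
print(k_nearest(peak, genes, k=2))
```

A brute-force scan like this is quadratic over all peaks and genes, which is the kind of work a database backend can plan more efficiently.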

Aggregation:

-- Cluster overlapping intervals
SELECT *, CLUSTER(interval) AS cluster_id FROM features

-- Merge overlapping intervals
SELECT MERGE(interval) FROM features
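
To make the difference between the two aggregations concrete, here is a minimal Python sketch: clustering labels each input row with a group id, while merging collapses each group into one interval. The helper names are hypothetical, coordinates are assumed half-open, and intervals that merely touch are kept separate here, which may not match GIQL's or bedtools' defaults.

```python
from typing import NamedTuple


class Interval(NamedTuple):
    chrom: str
    start: int  # assumed 0-based, inclusive
    end: int    # assumed exclusive


def cluster(intervals: list) -> list:
    """Return (interval, cluster_id) pairs; overlapping intervals share an id."""
    result = []
    cluster_id = 0
    current_chrom, current_end = None, None
    for iv in sorted(intervals):
        if iv.chrom != current_chrom or iv.start >= current_end:
            # New chromosome or a gap: start a new cluster.
            cluster_id += 1
            current_chrom, current_end = iv.chrom, iv.end
        else:
            # Overlaps the running cluster: extend its right edge.
            current_end = max(current_end, iv.end)
        result.append((iv, cluster_id))
    return result


def merge(intervals: list) -> list:
    """Collapse each cluster into a single spanning interval."""
    merged = {}
    for iv, cid in cluster(intervals):
        if cid in merged:
            prev = merged[cid]
            merged[cid] = Interval(prev.chrom, prev.start, max(prev.end, iv.end))
        else:
            merged[cid] = iv
    return list(merged.values())


features = [
    Interval("chr1", 100, 200),
    Interval("chr1", 150, 300),
    Interval("chr1", 400, 500),
    Interval("chr2", 100, 200),
]
print([cid for _, cid in cluster(features)])
print(merge(features))
```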

Set Quantifiers:

-- Match any of multiple regions
WHERE interval INTERSECTS ANY('chr1:1000-2000', 'chr2:5000-6000')
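
The quantifiers can be read as Python's any() and all() applied to a per-region test, as in the sketch below. The parse_region helper and the other function names are hypothetical, and the region strings are taken as half-open coordinates without conversion, which is an assumption rather than documented GIQL behavior.

```python
from typing import NamedTuple


class Interval(NamedTuple):
    chrom: str
    start: int  # assumed 0-based, inclusive
    end: int    # assumed exclusive


def parse_region(text: str) -> Interval:
    """Parse a 'chrom:start-end' string, using the numbers as-is."""
    chrom, _, span = text.partition(":")
    start, _, end = span.partition("-")
    return Interval(chrom, int(start), int(end))


def intersects(a: Interval, b: Interval) -> bool:
    return a.chrom == b.chrom and a.start < b.end and b.start < a.end


def intersects_any(iv: Interval, regions: list) -> bool:
    """True when iv overlaps at least one of the regions."""
    return any(intersects(iv, parse_region(r)) for r in regions)


def intersects_all(iv: Interval, regions: list) -> bool:
    """True when iv overlaps every one of the regions."""
    return all(intersects(iv, parse_region(r)) for r in regions)


feature = Interval("chr1", 1500, 1600)
print(intersects_any(feature, ["chr1:1000-2000", "chr2:5000-6000"]))
print(intersects_all(feature, ["chr1:1000-2000", "chr2:5000-6000"]))
print(intersects_all(feature, ["chr1:1000-2000", "chr1:1550-3000"]))
```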

See GIQL Operators for complete operator documentation.
