GIQL - Genomic Interval Query Language
GIQL is a SQL dialect for genomic range queries with multi-database support.
Genomic analysis often requires repetitive, complex SQL to express conceptually simple operations such as finding overlapping intervals or merging features. GIQL extends SQL with dedicated operators for these tasks, so you can state what you want to compute declaratively instead of writing boilerplate. Queries read naturally even without deep SQL expertise, which makes analysis code easier to review and share. The same GIQL query works across DuckDB, SQLite, PostgreSQL, and other databases, so you are not locked into one engine and can choose the one that fits your use case. The operators follow conventions established by tools like bedtools, so their semantics are familiar and predictable.
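To make the boilerplate concrete, here is a minimal sketch of an overlap query written by hand in plain SQL and run through Python's built-in sqlite3 module. The column names mirror the Quick Start schema, while the sample rows and the half-open coordinate convention are assumptions for illustration. This is not necessarily the SQL that GIQL generates.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE variants "
    "(id INTEGER, chromosome TEXT, start_pos INTEGER, end_pos INTEGER)"
)
conn.executemany(
    "INSERT INTO variants VALUES (?, ?, ?, ?)",
    [
        (1, "chr1", 500, 900),    # entirely before the query region
        (2, "chr1", 1500, 1600),  # inside the query region
        (3, "chr1", 1900, 2500),  # overlaps the right edge
        (4, "chr2", 1500, 1600),  # same coordinates, different chromosome
    ],
)

# Hand-written overlap test against chr1:1000-2000.
# Assumption: intervals are half-open, [start_pos, end_pos).
overlap_sql = """
    SELECT id FROM variants
    WHERE chromosome = ?
      AND start_pos < ?
      AND end_pos > ?
    ORDER BY id
"""
overlapping_ids = [row[0] for row in conn.execute(overlap_sql, ("chr1", 2000, 1000))]
print(overlapping_ids)
```

Every overlap query needs the same three conditions, which is the kind of repetition a dedicated operator such as INTERSECTS is meant to replace.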
Quick Start
Install GIQL:
pip install giql
Basic usage:
from giql import GIQLEngine

# Create engine with DuckDB backend
with GIQLEngine(target_dialect="duckdb") as engine:
    # Load genomic data
    engine.load_csv("variants", "variants.csv")
    engine.register_table_schema(
        "variants",
        {
            "id": "INTEGER",
            "chromosome": "VARCHAR",
            "start_pos": "BIGINT",
            "end_pos": "BIGINT",
        },
        genomic_column="interval",
    )

    # Query with genomic operators (returns cursor for streaming)
    cursor = engine.execute("""
        SELECT * FROM variants
        WHERE interval INTERSECTS 'chr1:1000-2000'
    """)

    # Process results
    for row in cursor:
        print(row)

    # Or just transpile to SQL without executing
    sql = engine.transpile("""
        SELECT * FROM variants
        WHERE interval INTERSECTS 'chr1:1000-2000'
    """)
    print(sql)  # See the generated SQL
Features
SQL-based: Familiar SQL syntax with genomic extensions
Multi-backend: Works with DuckDB, SQLite, PostgreSQL, and more
Spatial operators: INTERSECTS, CONTAINS, WITHIN, DISTANCE, NEAREST
Aggregation operators: CLUSTER, MERGE for combining intervals
Set quantifiers: ANY, ALL for multi-range queries
Column-to-column joins: Join tables on genomic position
Transpilation: Convert GIQL to standard SQL for debugging or external use
Operators at a Glance
Spatial Relationships:
-- Find overlapping features
WHERE interval INTERSECTS 'chr1:1000-2000'
-- Find containing/contained features
WHERE gene.interval CONTAINS variant.interval
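As a rough mental model of these predicates, the following pure-Python sketch shows one plausible reading of INTERSECTS, CONTAINS, and WITHIN. The `Interval` type, the function names, and the half-open coordinate convention are assumptions made for illustration; the operator documentation is the authority on exact GIQL semantics.

```python
from typing import NamedTuple


class Interval(NamedTuple):
    """Hypothetical interval type; assumes half-open [start, end) coordinates."""
    chrom: str
    start: int
    end: int


def intersects(a: Interval, b: Interval) -> bool:
    # Two intervals overlap when each starts before the other ends.
    return a.chrom == b.chrom and a.start < b.end and b.start < a.end


def contains(a: Interval, b: Interval) -> bool:
    # a contains b when b lies entirely inside a.
    return a.chrom == b.chrom and a.start <= b.start and b.end <= a.end


def within(a: Interval, b: Interval) -> bool:
    # WITHIN is CONTAINS with the arguments swapped.
    return contains(b, a)


gene = Interval("chr1", 1000, 2000)
variant = Interval("chr1", 1500, 1501)
upstream = Interval("chr1", 900, 1100)

print(intersects(gene, variant), contains(gene, variant), within(variant, gene))
print(intersects(gene, upstream), contains(gene, upstream))
```

Partial overlap satisfies INTERSECTS but not CONTAINS, which is why `upstream` passes the first test and fails the second.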
Distance and Proximity:
-- Calculate distance between intervals
SELECT DISTANCE(a.interval, b.interval) AS dist
-- Find k-nearest neighbors
FROM peaks CROSS JOIN LATERAL NEAREST(genes, reference=peaks.interval, k=5)
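The idea behind DISTANCE and NEAREST can be sketched in plain Python as follows. The `Interval` type and both function names are hypothetical. The sketch assumes half-open coordinates, a distance of 0 for overlapping or touching intervals, and no defined distance across chromosomes; GIQL's actual conventions may differ.

```python
import heapq
from typing import NamedTuple, Optional


class Interval(NamedTuple):
    """Hypothetical interval type; assumes half-open [start, end) coordinates."""
    chrom: str
    start: int
    end: int


def distance(a: Interval, b: Interval) -> Optional[int]:
    """Gap between two intervals: 0 if they overlap, None across chromosomes."""
    if a.chrom != b.chrom:
        return None
    if a.start < b.end and b.start < a.end:
        return 0
    # Non-overlapping: measure from the end of the left one to the start of the right one.
    return b.start - a.end if a.end <= b.start else a.start - b.end


def nearest(reference: Interval, candidates: list, k: int = 1) -> list:
    """Return the k candidates on the same chromosome closest to the reference."""
    same_chrom = [c for c in candidates if c.chrom == reference.chrom]
    return heapq.nsmallest(k, same_chrom, key=lambda c: distance(reference, c))


peak = Interval("chr1", 1000, 1100)
genes = [
    Interval("chr1", 0, 500),      # 500 bases upstream
    Interval("chr1", 1050, 1200),  # overlaps the peak
    Interval("chr1", 1300, 1400),  # 200 bases downstream
    Interval("chr2", 1000, 1100),  # different chromosome, never a neighbor
]

print([distance(peak, g) for g in genes])
print(nearest(peak, genes, k=2))
```

This brute-force version scans every candidate for each reference interval, which is fine for illustration but not how a database engine would be expected to evaluate the query.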
Aggregation:
-- Cluster overlapping intervals
SELECT *, CLUSTER(interval) AS cluster_id FROM features
-- Merge overlapping intervals
SELECT MERGE(interval) FROM features
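To illustrate what clustering and merging compute, here is a small sweep over sorted intervals in plain Python. The function names and the tuple layout `(chrom, start, end)` are hypothetical, and the sketch assumes half-open coordinates where intervals that merely touch are not grouped together. Tools differ on that last point, so treat it as an assumption rather than GIQL's defined behavior.

```python
def cluster(intervals):
    """Label each (chrom, start, end) tuple with a cluster id.

    Intervals that overlap, directly or through a chain of overlaps,
    share the same id.
    """
    labelled = []
    cluster_id = 0
    current_chrom, current_end = None, None
    for chrom, start, end in sorted(intervals):
        if chrom != current_chrom or start >= current_end:
            # No overlap with the running cluster: start a new one.
            cluster_id += 1
            current_chrom, current_end = chrom, end
        else:
            # Overlap: extend the running cluster if this interval reaches further.
            current_end = max(current_end, end)
        labelled.append((chrom, start, end, cluster_id))
    return labelled


def merge(intervals):
    """Collapse each cluster into a single spanning interval."""
    merged = {}
    for chrom, start, end, cid in cluster(intervals):
        if cid in merged:
            _, s, e = merged[cid]
            merged[cid] = (chrom, min(s, start), max(e, end))
        else:
            merged[cid] = (chrom, start, end)
    return list(merged.values())


features = [
    ("chr1", 100, 200),
    ("chr1", 150, 300),
    ("chr1", 400, 500),
    ("chr2", 100, 200),
]

print(cluster(features))
print(merge(features))
```

Under this reading, CLUSTER keeps every input row and adds a group label, while MERGE returns one row per group.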
Set Quantifiers:
-- Match any of multiple regions
WHERE interval INTERSECTS ANY('chr1:1000-2000', 'chr2:5000-6000')
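The quantifiers can be read as applying the same predicate to each listed region and combining the results. The sketch below shows that reading in plain Python. The helper names are hypothetical, and the region strings are parsed as half-open coordinates purely for illustration, since the coordinate convention GIQL uses for region literals is not stated here.

```python
def parse_region(region):
    """Split a 'chrom:start-end' string into (chrom, start, end)."""
    chrom, _, span = region.partition(":")
    start, _, end = span.partition("-")
    return chrom, int(start), int(end)


def overlaps(interval, region):
    """Overlap test between a (chrom, start, end) tuple and a region string."""
    chrom, start, end = interval
    r_chrom, r_start, r_end = parse_region(region)
    return chrom == r_chrom and start < r_end and r_start < end


def intersects_any(interval, regions):
    # ANY: true if at least one region overlaps.
    return any(overlaps(interval, r) for r in regions)


def intersects_all(interval, regions):
    # ALL: true only if every region overlaps.
    return all(overlaps(interval, r) for r in regions)


regions = ["chr1:1000-2000", "chr2:5000-6000"]
feature = ("chr1", 1500, 1800)

print(intersects_any(feature, regions))
print(intersects_all(feature, regions))
```

Because the two regions here sit on different chromosomes, no single interval can satisfy the ALL form, while the ANY form behaves like a chain of OR conditions.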
See GIQL Operators for complete operator documentation.