Quick Start
Installation
Install GIQL using pip:
pip install giql
Or with optional dependencies:
pip install giql[duckdb] # For DuckDB support
Basic Usage
Expected Schema
GIQL works with genomic data stored in tables with separate columns for chromosome, start position, and end position. The typical schema includes:
chromosome: Chromosome identifier (e.g., ‘chr1’, ‘chr2’, ‘chrX’)
start_pos: Start position of the genomic interval (0-based, inclusive)
end_pos: End position of the genomic interval (0-based, exclusive, half-open)
strand (optional): Strand orientation (‘+’, ‘-’, or ‘.’)
You must register the table schema with GIQL, mapping the logical genomic column (used in queries) to the physical columns in your table:
engine.register_table_schema(
"table_name",
{
"chromosome": "VARCHAR",
"start_pos": "BIGINT",
"end_pos": "BIGINT",
"strand": "VARCHAR", # Optional
# ... other columns ...
},
genomic_column="interval", # Logical name used in queries
)
After registration, you can use interval in your GIQL queries, and the engine
will automatically map it to the chromosome, start_pos, and end_pos
columns.
Query with DuckDB
from giql import GIQLEngine
with GIQLEngine(target_dialect="duckdb") as engine:
# Load CSV file into database
engine.load_csv("variants", "variants.csv")
# Register schema mapping
engine.register_table_schema(
"variants",
{
"id": "INTEGER",
"chromosome": "VARCHAR",
"start_pos": "BIGINT",
"end_pos": "BIGINT",
},
genomic_column="interval",
)
# Query using the logical 'interval' column (returns cursor for streaming)
cursor = engine.execute("""
SELECT * FROM variants
WHERE interval INTERSECTS 'chr1:1000-2000'
""")
# Process results lazily
for row in cursor:
print(row)
# Or materialize to pandas DataFrame
import pandas as pd
cursor = engine.execute("SELECT ...")
df = pd.DataFrame(cursor.fetchall(), columns=[desc[0] for desc in cursor.description])
Query with SQLite
from giql import GIQLEngine
with GIQLEngine(target_dialect="sqlite", db_path="data.db") as engine:
cursor = engine.execute("""
SELECT * FROM variants
WHERE interval INTERSECTS 'chr1:1000-2000'
""")
# Iterate results
for row in cursor:
print(row)
Spatial Operators
INTERSECTS
Check if genomic ranges overlap:
SELECT * FROM variants
WHERE interval INTERSECTS 'chr1:1000-2000'
CONTAINS
Check if a range contains a point or another range:
SELECT * FROM variants
WHERE interval CONTAINS 'chr1:1500'
WITHIN
Check if a range is within another range:
SELECT * FROM variants
WHERE interval WITHIN 'chr1:1000-5000'
Set Quantifiers
ANY
Match any of the specified ranges:
SELECT * FROM variants
WHERE interval INTERSECTS ANY('chr1:1000-2000', 'chr1:5000-6000')
ALL
Match all of the specified ranges:
SELECT * FROM variants
WHERE interval CONTAINS ALL('chr1:1500', 'chr1:1600')
Column-to-Column Joins
Join tables on genomic position:
SELECT v.*, g.name
FROM variants v
INNER JOIN genes g ON v.interval INTERSECTS g.interval
Transpiling to SQL
The transpile() method converts GIQL queries to standard SQL without executing them.
This is useful for debugging, understanding the generated SQL, or integrating with external tools:
from giql import GIQLEngine
with GIQLEngine(target_dialect="duckdb") as engine:
# Register table schema
engine.register_table_schema(
"variants",
{
"chromosome": "VARCHAR",
"start_pos": "BIGINT",
"end_pos": "BIGINT",
},
genomic_column="interval",
)
# Transpile GIQL to SQL
sql = engine.transpile("""
SELECT * FROM variants
WHERE interval INTERSECTS 'chr1:1000-2000'
""")
print(sql)
# Output: SELECT * FROM variants WHERE chromosome = 'chr1' AND start_pos < 2000 AND end_pos > 1000
Different target dialects generate different SQL:
# DuckDB dialect
with GIQLEngine(target_dialect="duckdb") as engine:
sql = engine.transpile("SELECT * FROM variants WHERE interval INTERSECTS 'chr1:1000-2000'")
# Generates DuckDB-optimized SQL
# SQLite dialect
with GIQLEngine(target_dialect="sqlite") as engine:
sql = engine.transpile("SELECT * FROM variants WHERE interval INTERSECTS 'chr1:1000-2000'")
# Generates SQLite-compatible SQL
The transpiled SQL can be executed directly on your database or used with other tools.
Use verbose=True when creating the engine to see detailed transpilation information:
with GIQLEngine(target_dialect="duckdb", verbose=True) as engine:
sql = engine.transpile("SELECT * FROM variants WHERE interval INTERSECTS 'chr1:1000-2000'")
# Prints detailed information about the transpilation process