API Reference

This section documents the GIQL Python API.

GIQL - Genomic Interval Query Language.

A SQL dialect for genomic range queries with multi-database support.

This package provides:
  • GIQL dialect extending SQL with spatial operators

  • Query engine supporting multiple backends (DuckDB, SQLite)

  • Range parser for genomic coordinate strings

  • Schema management for genomic data
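To make the coordinate-string format concrete, here is a minimal sketch of parsing a string such as 'chr1:1000-2000' into its parts. The names `Region` and `parse_region` are hypothetical and are not part of the GIQL API; the package's own range parser may accept more forms than this sketch handles.

```python
import re
from typing import NamedTuple


class Region(NamedTuple):
    """Hypothetical container for a parsed range (not a GIQL type)."""

    chrom: str
    start: int
    end: int


# chromosome name, a colon, then two integers separated by a hyphen
_REGION_RE = re.compile(r"^(?P<chrom>[^:\s]+):(?P<start>\d+)-(?P<end>\d+)$")


def parse_region(text: str) -> Region:
    """Parse 'chrom:start-end'; raise ValueError on malformed input."""
    match = _REGION_RE.match(text.strip())
    if match is None:
        raise ValueError(f"not a genomic range: {text!r}")
    start, end = int(match["start"]), int(match["end"])
    if start > end:
        raise ValueError(f"start exceeds end in {text!r}")
    return Region(match["chrom"], start, end)


region = parse_region("chr1:1000-2000")
print(region.chrom, region.start, region.end)
```

The sketch only splits the string; it does not decide whether the numbers are 0-based or 1-based.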

class giql.GIQLEngine(target_dialect='duckdb', connection=None, db_path=':memory:', verbose=False, **dialect_options)[source]

Bases: object

Multi-backend GIQL query engine.

Supports multiple SQL databases by transpiling GIQL syntax to standard SQL. It works with DuckDB, SQLite, and other backends.

Examples

Query a pandas DataFrame with DuckDB:

import pandas as pd
from giql import GIQLEngine

df = pd.DataFrame(
    {
        "id": [1, 2, 3],
        "chromosome": ["chr1", "chr1", "chr2"],
        "start_pos": [1500, 10500, 500],
        "end_pos": [1600, 10600, 600],
    }
)
with GIQLEngine(target_dialect="duckdb") as engine:
    engine.conn.register("variants", df)
    cursor = engine.execute(
        "SELECT * FROM variants WHERE interval INTERSECTS 'chr1:1000-2000'"
    )
    for row in cursor:
        print(row)

Load from CSV:

with GIQLEngine(target_dialect="duckdb") as engine:
    engine.load_csv("variants", "variants.csv")
    cursor = engine.execute(
        "SELECT * FROM variants WHERE interval INTERSECTS 'chr1:1000-2000'"
    )
    # Process rows lazily
    while True:
        row = cursor.fetchone()
        if row is None:
            break
        print(row)

Using SQLite backend:

with GIQLEngine(target_dialect="sqlite", db_path="data.db") as engine:
    cursor = engine.execute(
        "SELECT * FROM variants WHERE interval INTERSECTS 'chr1:1000-2000'"
    )
    # Materialize all results at once
    results = cursor.fetchall()

__init__(target_dialect='duckdb', connection=None, db_path=':memory:', verbose=False, **dialect_options)[source]

Initialize engine.

Parameters:
  • target_dialect (Literal['duckdb', 'sqlite'] | str) – Target SQL dialect (‘duckdb’, ‘sqlite’, ‘standard’)

  • connection – Existing database connection (optional)

  • db_path (str) – Database path or connection string

  • verbose (bool) – Print transpiled SQL

  • dialect_options – Additional options for specific dialects

close()[source]

Close database connection.

Only closes connections created by the engine. If an external connection was provided during initialization, it is not closed.
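The ownership rule described above can be sketched with the standard-library sqlite3 module. `OwnedConnection` is a hypothetical stand-in written for illustration; it is not GIQLEngine's implementation, only one way such a rule might be realized.

```python
import sqlite3


class OwnedConnection:
    """Hypothetical wrapper that closes only connections it created."""

    def __init__(self, connection=None, db_path=":memory:"):
        # Remember whether the caller supplied the connection.
        self._owns_connection = connection is None
        self.conn = connection if connection is not None else sqlite3.connect(db_path)

    def close(self):
        # Leave externally supplied connections open for the caller.
        if self._owns_connection and self.conn is not None:
            self.conn.close()
        self.conn = None

    def __enter__(self):
        return self

    def __exit__(self, *exc_info):
        self.close()


shared = sqlite3.connect(":memory:")
with OwnedConnection(connection=shared):
    pass
# The caller's connection is still usable after the wrapper exits.
still_open = shared.execute("SELECT 1").fetchone()
shared.close()
```

With this pattern, the caller who opened a connection stays responsible for closing it.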

execute(giql)[source]

Execute a GIQL query and return a database cursor.

Parses the GIQL syntax, transpiles it to the target SQL dialect, and executes the query, returning a cursor for lazy iteration.

Parameters:

giql (str) – Query string with GIQL genomic extensions

Returns:

Database cursor (DB-API 2.0 compatible) that can be iterated

Raises:

ValueError – If the query cannot be parsed, transpiled, or executed

Return type:

CursorLike
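Because the returned cursor is described as DB-API 2.0 compatible, the usual consumption patterns should apply. The sketch below uses a plain sqlite3 cursor as a stand-in, on the assumption that a CursorLike object behaves the same way; the table and rows mirror the DataFrame example above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE variants "
    "(id INTEGER, chromosome TEXT, start_pos INTEGER, end_pos INTEGER)"
)
conn.executemany(
    "INSERT INTO variants VALUES (?, ?, ?, ?)",
    [(1, "chr1", 1500, 1600), (2, "chr1", 10500, 10600), (3, "chr2", 500, 600)],
)

cursor = conn.execute("SELECT id, chromosome FROM variants ORDER BY id")

# Column names come from the DB-API `description` attribute.
column_names = [column[0] for column in cursor.description]

first = cursor.fetchone()    # one row, or None when exhausted
batch = cursor.fetchmany(1)  # a list of up to N rows
rest = list(cursor)          # iterate over whatever remains

conn.close()
```

Each call continues from where the previous one stopped, so a cursor can be drained in batches without holding the full result set in memory.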

execute_raw(sql)[source]

Execute raw SQL directly, bypassing GIQL parsing.

Parameters:

sql (str) – Raw SQL query string

Returns:

Query results as a pandas DataFrame

Return type:

DataFrame

load_csv(table_name, file_path)[source]

Load CSV file into database.

Parameters:
  • table_name (str) – Name to assign to the table

  • file_path (str) – Path to the CSV file

load_parquet(table_name, file_path)[source]

Load Parquet file into database.

Parameters:
  • table_name (str) – Name to assign to the table

  • file_path (str) – Path to the Parquet file

register_table_schema(table_name, columns, genomic_column='interval', chrom_col='chromosome', start_col='start_pos', end_col='end_pos', strand_col='strand', coordinate_system='0based', interval_type='half_open')[source]

Register schema for a table.

This method tells the engine how genomic ranges are stored in the table, mapping logical genomic column names to physical column names.

Parameters:
  • table_name (str) – Table name

  • columns (dict[str, str]) – Mapping of column name to column type

  • genomic_column (str) – Logical name for genomic position

  • chrom_col (str) – Physical chromosome column

  • start_col (str) – Physical start position column

  • end_col (str) – Physical end position column

  • strand_col (str | None) – Physical strand column (optional)

  • coordinate_system (str) – Coordinate system: “0based” or “1based” (default: “0based”)

  • interval_type (str) – Interval endpoint handling: “half_open” or “closed” (default: “half_open”)
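The coordinate_system and interval_type options combine into four conventions. As an illustration of what they conventionally mean, the sketch below normalizes any of the four to 0-based half-open coordinates. `to_zero_based_half_open` is a hypothetical helper, and the source does not state how the engine itself applies these settings.

```python
def to_zero_based_half_open(
    start: int,
    end: int,
    coordinate_system: str = "0based",
    interval_type: str = "half_open",
) -> tuple[int, int]:
    """Hypothetical helper: express an interval as 0-based, half-open."""
    if coordinate_system not in ("0based", "1based"):
        raise ValueError(f"unknown coordinate_system: {coordinate_system!r}")
    if interval_type not in ("half_open", "closed"):
        raise ValueError(f"unknown interval_type: {interval_type!r}")
    if coordinate_system == "1based":
        # Shift both endpoints so the first base is numbered 0.
        start, end = start - 1, end - 1
    if interval_type == "closed":
        # A closed end includes its last base; half-open excludes it.
        end += 1
    return start, end


# The same 1000 bases written in the two most common conventions.
bed_style = to_zero_based_half_open(1000, 2000)
one_based_closed = to_zero_based_half_open(1001, 2000, "1based", "closed")
```

Both calls describe the same interval, which is why a schema needs to record the convention alongside the column names.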

transpile(giql)[source]

Transpile a GIQL query to the engine’s target SQL dialect.

Parses the GIQL syntax and transpiles it to the target SQL dialect without executing it. Useful for debugging or generating SQL for external use.

Parameters:

giql (str) – Query string with GIQL genomic extensions

Returns:

Transpiled SQL query string in the target dialect

Raises:

ValueError – If the query cannot be parsed or transpiled

Return type:

str
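The exact SQL that transpile() emits is not shown in this reference. As a rough illustration of the idea, the sketch below hand-writes a standard SQL predicate that an INTERSECTS test against 'chr1:1000-2000' could plausibly reduce to, assuming the default column names (chromosome, start_pos, end_pos) and half-open intervals on both sides. It runs on plain sqlite3 and does not call GIQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE variants "
    "(id INTEGER, chromosome TEXT, start_pos INTEGER, end_pos INTEGER)"
)
conn.executemany(
    "INSERT INTO variants VALUES (?, ?, ?, ?)",
    [(1, "chr1", 1500, 1600), (2, "chr1", 10500, 10600), (3, "chr2", 500, 600)],
)

# Assumed expansion (not GIQL's verified output): two half-open intervals
# overlap when each one starts before the other ends.
overlap_sql = """
    SELECT id FROM variants
    WHERE chromosome = ?
      AND start_pos < ?
      AND end_pos > ?
    ORDER BY id
"""
matches = conn.execute(overlap_sql, ("chr1", 2000, 1000)).fetchall()
conn.close()
```

Comparing a hand-written predicate like this with the string returned by transpile() is one way to check that a registered schema maps to the intended columns.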