API Reference

transpile(giql[, tables, dialect, ...])

Transpile a GIQL query to SQL.

Table(name[, genomic_col, chrom_col, ...])

Genomic table configuration for transpilation.

giql.transpile(giql: str, tables: list[str | Table] | None = None, *, dialect: None = None, intersects_bin_size: int | None = None) -> str
giql.transpile(giql: str, tables: list[str | Table] | None = None, *, dialect: Literal['duckdb'], intersects_bin_size: None = None) -> str

Transpile a GIQL query to SQL.

Parses the GIQL syntax and converts it to SQL that is mostly SQL-92 compatible. Operations such as NEAREST use LATERAL joins where needed; LATERAL is a later addition to the standard (SQL:1999), so the target database must support it.

Parameters

giql : str

The GIQL query string containing genomic extensions like INTERSECTS, CONTAINS, WITHIN, CLUSTER, MERGE, NEAREST, or DISJOIN.

tables : list[str | Table] | None

Table configurations. Strings use default column mappings (chrom, start, end, strand). Table objects provide custom column name mappings.

dialect : Literal["duckdb"] | None

Optional target dialect. When set to "duckdb", column-to-column INTERSECTS joins (INNER, SEMI, or ANTI) are transpiled into a per-chromosome dynamic-SQL pattern (SET VARIABLE + query(getvariable(...))) that DuckDB plans through its range-join family (IE_JOIN / PIECEWISE_MERGE_JOIN). Mutually exclusive with intersects_bin_size. Defaults to None, which selects the generic binned equi-join path. Projection shapes that the DuckDB path rejects raise ValueError at transpile time; see the performance guide for the full enumeration.

intersects_bin_size : int | None

Bin size for INTERSECTS equi-join optimization. When a query contains a full-table column-to-column INTERSECTS join, the transpiler rewrites it as a binned equi-join for performance. Defaults to 10,000 if not specified. Cannot be combined with dialect="duckdb".
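
The idea behind a binned equi-join can be sketched in plain Python. This is an illustration of the general technique only, not the SQL that transpile() emits: the helper names (bins_for, binned_overlap_join) are hypothetical, and intervals are assumed to be non-empty, 0-based, half-open (chrom, start, end) tuples.

```python
from collections import defaultdict


def bins_for(start, end, bin_size):
    """Bin indices touched by the half-open interval [start, end)."""
    return range(start // bin_size, (end - 1) // bin_size + 1)


def binned_overlap_join(left, right, bin_size=10_000):
    """Find overlapping pairs by matching on (chrom, bin), then filtering.

    Hypothetical helper: mirrors the binned equi-join idea, not giql's code.
    """
    # Index every right-hand interval under each bin it touches.
    index = defaultdict(list)
    for rrow in right:
        chrom, start, end = rrow
        for b in bins_for(start, end, bin_size):
            index[(chrom, b)].append(rrow)

    # Two intervals that share several bins are matched more than once,
    # so a set is used to de-duplicate the resulting pairs.
    pairs = set()
    for lrow in left:
        chrom, start, end = lrow
        for b in bins_for(start, end, bin_size):
            for rrow in index.get((chrom, b), ()):
                # Sharing a bin is necessary but not sufficient:
                # confirm the actual half-open overlap.
                if start < rrow[2] and rrow[1] < end:
                    pairs.add((lrow, rrow))
    return sorted(pairs)


peaks = [("chr1", 5_000, 25_000), ("chr1", 40_000, 41_000), ("chr2", 100, 200)]
genes = [
    ("chr1", 9_000, 21_000),
    ("chr1", 24_999, 30_000),
    ("chr1", 25_000, 26_000),
    ("chr1", 41_000, 42_000),
    ("chr2", 150, 160),
]
result = binned_overlap_join(peaks, genes)
```

The bin size changes only how much work is done, not the result: small bins spread long intervals over many bins, while large bins put more non-overlapping candidates into each bin.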

Returns

str

The transpiled SQL query.

Raises

ValueError

If the query cannot be parsed or transpiled, if dialect is unknown, or if dialect="duckdb" and intersects_bin_size are both set.
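
The option rules documented for dialect and intersects_bin_size can be restated as a small stand-alone check. The function below is a hypothetical stand-in written from the descriptions on this page (effective_bin_size is not part of giql), and the error messages are invented for illustration.

```python
def effective_bin_size(dialect=None, intersects_bin_size=None):
    """Return the bin size the generic path would use, or None for DuckDB.

    Hypothetical sketch of the documented rules; not giql's implementation.
    """
    if dialect not in (None, "duckdb"):
        raise ValueError(f"unknown dialect: {dialect!r}")
    if dialect == "duckdb":
        if intersects_bin_size is not None:
            raise ValueError(
                "intersects_bin_size cannot be combined with dialect='duckdb'"
            )
        return None  # the DuckDB path does not use binning
    # Generic path: fall back to the documented default of 10,000.
    return 10_000 if intersects_bin_size is None else intersects_bin_size
```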

Examples

Basic usage with default column mappings:

from giql import transpile

sql = transpile(
    "SELECT * FROM peaks WHERE interval INTERSECTS 'chr1:1000-2000'",
    tables=["peaks"],
)

Custom Table configuration:

from giql import Table, transpile

sql = transpile(
    "SELECT * FROM peaks WHERE interval INTERSECTS 'chr1:1000-2000'",
    tables=[
        Table(
            "peaks",
            genomic_col="interval",
            chrom_col="chrom",
            start_col="start",
            end_col="end",
        )
    ],
)

Binned equi-join with custom bin size:

from giql import transpile

sql = transpile(
    "SELECT a.*, b.* FROM peaks a JOIN genes b "
    "ON a.interval INTERSECTS b.interval",
    tables=["peaks", "genes"],
    intersects_bin_size=100000,
)

DuckDB IEJoin dialect (column-to-column INNER/SEMI/ANTI JOIN only; projections must be qualified):

from giql import transpile

sql = transpile(
    "SELECT a.chrom, a.start, b.start "
    "FROM peaks a JOIN genes b ON a.interval INTERSECTS b.interval",
    tables=["peaks", "genes"],
    dialect="duckdb",
)

class giql.Table(name, genomic_col='interval', chrom_col='chrom', start_col='start', end_col='end', strand_col='strand', coordinate_system='0based', interval_type='half_open')

Genomic table configuration for transpilation.

This class defines how genomic intervals are stored in a database table, mapping a pseudo-column name (genomic_col) to the physical columns that store chromosome, start, end, and optionally strand information.
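
To make the mapping concrete, here is a minimal sketch of how a predicate on the pseudo-column could expand into a predicate on the physical columns. It is hand-written for illustration and is not the SQL giql generates: intersects_predicate is a hypothetical helper, and reading 'chr1:1000-2000' as the half-open range [1000, 2000) is an assumption. SQLite from the standard library is used only to run the result.

```python
import sqlite3


def intersects_predicate(chrom_col="chrom", start_col="start", end_col="end"):
    """Return a parameterized overlap predicate over the physical columns.

    Parameters are bound in the order (chrom, range_end, range_start).
    Column names are interpolated, so they must come from trusted config.
    """
    return f'"{chrom_col}" = ? AND "{start_col}" < ? AND "{end_col}" > ?'


conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE peaks ("chrom" TEXT, "start" INTEGER, "end" INTEGER)')
conn.executemany(
    "INSERT INTO peaks VALUES (?, ?, ?)",
    [
        ("chr1", 500, 1000),   # ends where the range starts: no overlap
        ("chr1", 900, 1100),   # straddles the range start: overlaps
        ("chr1", 1500, 1600),  # inside the range: overlaps
        ("chr1", 2000, 2500),  # starts where the range ends: no overlap
        ("chr2", 1200, 1300),  # different chromosome
    ],
)

sql = (
    'SELECT "chrom", "start", "end" FROM peaks '
    f'WHERE {intersects_predicate()} ORDER BY "start"'
)
# Assumed reading of 'chr1:1000-2000': chrom chr1, half-open [1000, 2000).
rows = conn.execute(sql, ("chr1", 2000, 1000)).fetchall()
```

A table with different physical column names would pass them to the helper instead, for example intersects_predicate("chr", "pos_start", "pos_end").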

Parameters

name : str

The table name.

genomic_col : str

The pseudo-column name used in GIQL queries to reference the genomic interval (default: "interval").

chrom_col : str

The physical column name storing chromosome/contig (default: "chrom").

start_col : str

The physical column name storing interval start position (default: "start").

end_col : str

The physical column name storing interval end position (default: "end").

strand_col : str | None

The physical column name storing strand information, or None if the table has no strand column (default: "strand").

coordinate_system : Literal["0based", "1based"]

The coordinate system used for positions (default: "0based").

interval_type : Literal["half_open", "closed"]

The interval endpoint convention (default: "half_open").
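
The four combinations of coordinate_system and interval_type describe the same bases with different numbers. The sketch below normalizes any of them to 0-based, half-open coordinates; it illustrates the conventions themselves and is not taken from giql (to_zero_based_half_open is a hypothetical helper).

```python
def to_zero_based_half_open(
    start, end, coordinate_system="0based", interval_type="half_open"
):
    """Convert (start, end) to 0-based, half-open coordinates.

    Hypothetical helper illustrating the conventions; not giql's code.
    """
    if coordinate_system not in ("0based", "1based"):
        raise ValueError(f"unknown coordinate_system: {coordinate_system!r}")
    if interval_type not in ("half_open", "closed"):
        raise ValueError(f"unknown interval_type: {interval_type!r}")

    if coordinate_system == "1based":
        # Shift both endpoints so the first base is numbered 0.
        start -= 1
        end -= 1
    if interval_type == "closed":
        # A closed end includes its last base; half-open points one past it.
        end += 1
    return start, end


# The same 101 bases written in each convention:
same_interval = [
    to_zero_based_half_open(99, 200, "0based", "half_open"),
    to_zero_based_half_open(99, 199, "0based", "closed"),
    to_zero_based_half_open(100, 201, "1based", "half_open"),
    to_zero_based_half_open(100, 200, "1based", "closed"),
]
```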

Examples

Using default column names (via transpile):

sql = transpile(query, tables=["peaks"])

Mixing default and custom table configurations:

from giql import Table, transpile

# query is a GIQL string defined elsewhere
sql = transpile(
    query,
    tables=[
        "peaks",
        Table(
            "variants",
            genomic_col="position",
            chrom_col="chr",
            start_col="pos_start",
            end_col="pos_end",
            strand_col=None,  # No strand column
            coordinate_system="1based",
            interval_type="closed",
        ),
    ],
)

name: str
genomic_col: str = 'interval'
chrom_col: str = 'chrom'
start_col: str = 'start'
end_col: str = 'end'
strand_col: str | None = 'strand'
coordinate_system: Literal['0based', '1based'] = '0based'
interval_type: Literal['half_open', 'closed'] = 'half_open'

__post_init__()

Validate field values after initialization.

__init__(name, genomic_col='interval', chrom_col='chrom', start_col='start', end_col='end', strand_col='strand', coordinate_system='0based', interval_type='half_open')