API Reference
| transpile | Transpile a GIQL query to SQL. |
| Table | Genomic table configuration for transpilation. |
- giql.transpile(giql: str, tables: list[str | Table] | None = None, *, dialect: None = None, intersects_bin_size: int | None = None) -> str
- giql.transpile(giql: str, tables: list[str | Table] | None = None, *, dialect: Literal['duckdb'], intersects_bin_size: None = None) -> str
Transpile a GIQL query to SQL.
Parses the GIQL syntax and converts it to standard SQL that is largely SQL-92 compatible. Operations such as NEAREST use LATERAL joins (standardized in SQL:1999) where needed.
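The per-row work that a LATERAL join expresses for NEAREST can be pictured in plain Python: for each left-hand row, keep only right-hand rows on the same chromosome, rank them by distance, and take the top k. This is a minimal sketch under stated assumptions. The `Interval` type, the `gap` and `nearest` helpers, and the distance convention (0 for overlapping or book-ended intervals, 0-based half-open coordinates) are illustrative choices, not GIQL's generated SQL or its exact distance definition.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    chrom: str
    start: int  # assumed 0-based, inclusive
    end: int    # assumed exclusive (half-open)


def gap(a: Interval, b: Interval) -> int:
    """Bases separating two same-chromosome intervals; 0 if they overlap or touch."""
    return max(0, b.start - a.end, a.start - b.end)


def nearest(query: Interval, candidates: list[Interval], k: int = 1) -> list[Interval]:
    """Return the k candidates closest to one query row.

    This mirrors what a correlated LATERAL subquery does for each left-hand row:
    filter to the same chromosome, ORDER BY distance, LIMIT k.
    """
    same_chrom = [c for c in candidates if c.chrom == query.chrom]
    # Ties on distance are broken by start position so the result is deterministic.
    return sorted(same_chrom, key=lambda c: (gap(query, c), c.start))[:k]


peak = Interval("chr1", 100, 200)
genes = [
    Interval("chr1", 250, 300),
    Interval("chr1", 150, 160),
    Interval("chr2", 100, 200),
    Interval("chr1", 0, 90),
]
print(nearest(peak, genes, k=2))
# [Interval(chrom='chr1', start=150, end=160), Interval(chrom='chr1', start=0, end=90)]
```

In SQL, the same shape is a LATERAL subquery whose WHERE clause references the outer row, which is why plain SQL-92 joins are not enough for NEAREST.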
Parameters
- giql : str
The GIQL query string containing genomic extensions like INTERSECTS, CONTAINS, WITHIN, CLUSTER, MERGE, NEAREST, or DISJOIN.
- tables : list[str | Table] | None
Table configurations. Strings use the default column mappings (chrom, start, end, strand); Table objects provide custom column name mappings.
- dialect : Literal["duckdb"] | None
Optional target dialect. When set to "duckdb", column-to-column INTERSECTS joins (INNER, SEMI, or ANTI) are transpiled into a per-chromosome dynamic-SQL pattern (SET VARIABLE + query(getvariable(...))) that DuckDB plans through its range-join family (IE_JOIN/PIECEWISE_MERGE_JOIN). Mutually exclusive with intersects_bin_size. Defaults to None (the generic binned equi-join path). Projection shapes that this path cannot handle raise ValueError at transpile time; see the performance guide for the full list.
- intersects_bin_size : int | None
Bin size for the INTERSECTS equi-join optimization. When a query contains a full-table column-to-column INTERSECTS join, the transpiler rewrites it as a binned equi-join for performance. Defaults to 10,000 if not specified. Cannot be combined with dialect="duckdb".
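The idea behind intersects_bin_size can be modeled in plain Python: each interval is assigned to every fixed-width bin it touches, candidate pairs come from an equality match on (chrom, bin), and an exact overlap check plus deduplication removes false positives and repeats. This is a hypothetical sketch of the general binning technique, assuming 0-based half-open coordinates with start < end; `bins` and `binned_intersect_join` are illustrative names, and the code is not the SQL that GIQL actually emits.

```python
Interval = tuple[str, int, int]  # (chrom, start, end), assumed 0-based half-open, start < end


def bins(start: int, end: int, bin_size: int) -> range:
    """Ids of every fixed-width bin that [start, end) touches."""
    return range(start // bin_size, (end - 1) // bin_size + 1)


def binned_intersect_join(
    left: list[Interval], right: list[Interval], bin_size: int = 10_000
) -> list[tuple[int, int]]:
    """Return sorted (left_index, right_index) pairs of overlapping intervals."""
    # Build side: index right rows by the equi-join key (chrom, bin).
    index: dict[tuple[str, int], list[int]] = {}
    for j, (chrom, start, end) in enumerate(right):
        for b in bins(start, end, bin_size):
            index.setdefault((chrom, b), []).append(j)

    pairs: set[tuple[int, int]] = set()  # a set drops pairs that share several bins
    for i, (chrom, start, end) in enumerate(left):
        for b in bins(start, end, bin_size):
            for j in index.get((chrom, b), []):
                _, r_start, r_end = right[j]
                # Sharing a bin is necessary but not sufficient, so re-check overlap.
                if start < r_end and r_start < end:
                    pairs.add((i, j))
    return sorted(pairs)


peaks = [("chr1", 0, 15_000), ("chr1", 25_000, 26_000), ("chr2", 0, 100)]
genes = [("chr1", 14_000, 30_000), ("chr1", 9_000, 9_500), ("chr2", 200, 300)]
print(binned_intersect_join(peaks, genes))  # [(0, 0), (0, 1), (1, 0)]
```

The bin size is a trade-off: smaller bins produce fewer non-overlapping candidates per key, but long intervals are replicated into more bins.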
Returns
- str
The transpiled SQL query.
Raises
- ValueError
If the query cannot be parsed or transpiled, if dialect is unknown, or if dialect="duckdb" and intersects_bin_size are both set.
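The documented argument rules (the dialect check, the mutual exclusion with intersects_bin_size, and the 10,000 default) fit together as in this sketch. `resolve_intersects_strategy` is a hypothetical helper that only restates the behavior described above; it is not part of giql, and the library may perform these checks differently or in a different order.

```python
from typing import Optional

DEFAULT_INTERSECTS_BIN_SIZE = 10_000  # documented default for the binned path


def resolve_intersects_strategy(
    dialect: Optional[str] = None, intersects_bin_size: Optional[int] = None
) -> tuple[str, Optional[int]]:
    """Apply the documented transpile() argument rules; return (strategy, bin_size)."""
    if dialect not in (None, "duckdb"):
        raise ValueError(f"unknown dialect: {dialect!r}")
    if dialect == "duckdb":
        if intersects_bin_size is not None:
            raise ValueError(
                'dialect="duckdb" and intersects_bin_size are mutually exclusive'
            )
        return ("duckdb", None)
    # Generic path: fall back to the default bin size when none is given.
    if intersects_bin_size is None:
        intersects_bin_size = DEFAULT_INTERSECTS_BIN_SIZE
    return ("binned", intersects_bin_size)


print(resolve_intersects_strategy())  # ('binned', 10000)
print(resolve_intersects_strategy(dialect="duckdb"))  # ('duckdb', None)
```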
Examples
Basic usage with default column mappings:
```python
from giql import transpile

sql = transpile(
    "SELECT * FROM peaks WHERE interval INTERSECTS 'chr1:1000-2000'",
    tables=["peaks"],
)
```
Custom Table configuration:

```python
from giql import Table, transpile

sql = transpile(
    "SELECT * FROM peaks WHERE interval INTERSECTS 'chr1:1000-2000'",
    tables=[
        Table(
            "peaks",
            genomic_col="interval",
            chrom_col="chrom",
            start_col="start",
            end_col="end",
        )
    ],
)
```
Binned equi-join with custom bin size:
```python
from giql import transpile

sql = transpile(
    "SELECT a.*, b.* FROM peaks a JOIN genes b "
    "ON a.interval INTERSECTS b.interval",
    tables=["peaks", "genes"],
    intersects_bin_size=100000,
)
```
DuckDB IEJoin dialect (column-to-column INNER/SEMI/ANTI JOIN only; projections must be qualified):
```python
from giql import transpile

sql = transpile(
    "SELECT a.chrom, a.start, b.start "
    "FROM peaks a JOIN genes b ON a.interval INTERSECTS b.interval",
    tables=["peaks", "genes"],
    dialect="duckdb",
)
```
- class giql.Table(name, genomic_col='interval', chrom_col='chrom', start_col='start', end_col='end', strand_col='strand', coordinate_system='0based', interval_type='half_open')
Genomic table configuration for transpilation.
This class defines how genomic intervals are stored in a database table, mapping a pseudo-column name (genomic_col) to the physical columns that store chromosome, start, end, and optionally strand information.
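To illustrate the pseudo-column mapping, this sketch expands a column-to-column INTERSECTS into a naive overlap predicate over the physical columns. `ColumnMapping` and `expand_intersects` are hypothetical stand-ins, not giql's classes; the predicate assumes both tables store 0-based, half-open coordinates, and it is not necessarily the SQL GIQL emits, since transpile may use a binned equi-join or the DuckDB-specific pattern instead.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ColumnMapping:
    """Hypothetical stand-in for the mapping a giql.Table describes (not the real class)."""

    name: str
    genomic_col: str = "interval"  # pseudo-column name used in the query text
    chrom_col: str = "chrom"
    start_col: str = "start"
    end_col: str = "end"


def expand_intersects(a_alias: str, a: ColumnMapping, b_alias: str, b: ColumnMapping) -> str:
    """Rewrite `a.<genomic_col> INTERSECTS b.<genomic_col>` as a predicate over
    physical columns, assuming 0-based, half-open coordinates on both sides."""
    # Identifiers are double-quoted because END is a reserved word in SQL.
    return (
        f'{a_alias}."{a.chrom_col}" = {b_alias}."{b.chrom_col}" '
        f'AND {a_alias}."{a.start_col}" < {b_alias}."{b.end_col}" '
        f'AND {b_alias}."{b.start_col}" < {a_alias}."{a.end_col}"'
    )


peaks = ColumnMapping("peaks")
calls = ColumnMapping(
    "calls", genomic_col="locus", chrom_col="chr", start_col="pos_start", end_col="pos_end"
)
print(expand_intersects("a", peaks, "b", calls))
# a."chrom" = b."chr" AND a."start" < b."pos_end" AND b."pos_start" < a."end"
```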
Parameters
- name : str
The table name.
- genomic_col : str
The pseudo-column name used in GIQL queries to reference the genomic interval (default: "interval").
- chrom_col : str
The physical column name storing the chromosome/contig (default: "chrom").
- start_col : str
The physical column name storing the interval start position (default: "start").
- end_col : str
The physical column name storing the interval end position (default: "end").
- strand_col : str | None
The physical column name storing strand information, or None if the table has no strand column (default: "strand").
- coordinate_system : Literal["0based", "1based"]
The coordinate system used for positions (default: "0based").
- interval_type : Literal["half_open", "closed"]
The interval endpoint convention (default: "half_open").
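As a worked example of coordinate_system and interval_type, this hypothetical helper converts stored endpoints into the 0-based, half-open form that the Table defaults describe. It only illustrates what the two parameters mean; `to_zero_based_half_open` is not a giql function, and how GIQL itself reconciles tables with different conventions is not specified here.

```python
def to_zero_based_half_open(
    start: int, end: int, coordinate_system: str = "0based", interval_type: str = "half_open"
) -> tuple[int, int]:
    """Normalize stored (start, end) to 0-based, half-open [start, end)."""
    if coordinate_system not in ("0based", "1based"):
        raise ValueError(f"unknown coordinate_system: {coordinate_system!r}")
    if interval_type not in ("half_open", "closed"):
        raise ValueError(f"unknown interval_type: {interval_type!r}")
    if coordinate_system == "1based":
        # Shift both endpoints so the first base is numbered 0.
        start, end = start - 1, end - 1
    if interval_type == "closed":
        # A closed end includes its last base; a half-open end is one past it.
        end += 1
    return start, end


# The same 1,001 bases written in all four conventions:
for args in [
    (999, 2000, "0based", "half_open"),
    (999, 1999, "0based", "closed"),
    (1000, 2001, "1based", "half_open"),
    (1000, 2000, "1based", "closed"),
]:
    print(to_zero_based_half_open(*args))  # (999, 2000) every time
```

A single-base variant stored 1-based and closed as (5, 5) becomes (4, 5), a length-1 half-open interval.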
Examples
Using default column names (via transpile):
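```python
from giql import transpile

query = "SELECT * FROM peaks WHERE interval INTERSECTS 'chr1:1000-2000'"
sql = transpile(query, tables=["peaks"])
```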
Mixing default and custom table configurations:
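```python
from giql import Table, transpile

query = "SELECT * FROM variants WHERE position INTERSECTS 'chr1:1000-2000'"
sql = transpile(
    query,
    tables=[
        "peaks",  # default column names
        Table(
            "variants",
            genomic_col="position",
            chrom_col="chr",
            start_col="pos_start",
            end_col="pos_end",
            strand_col=None,  # No strand column
            coordinate_system="1based",
            interval_type="closed",
        ),
    ],
)
```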
- __init__(name, genomic_col='interval', chrom_col='chrom', start_col='start', end_col='end', strand_col='strand', coordinate_system='0based', interval_type='half_open')