API Reference
| transpile | Transpile a GIQL query to SQL. |
| Table | Genomic table configuration for transpilation. |
- giql.transpile(giql, tables=None, *, intersects_bin_size=None)
Transpile a GIQL query to SQL.
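Parses the GIQL syntax and converts it to standard SQL. The output is largely SQL-92 compatible, but operations such as NEAREST use LATERAL joins where needed. LATERAL was added to the SQL standard after SQL-92, so those queries require a database that supports it.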
Parameters
- giql : str
The GIQL query string containing genomic extensions like INTERSECTS, CONTAINS, WITHIN, CLUSTER, MERGE, or NEAREST.
- tables : list[str | Table] | None
Table configurations. Strings use default column mappings (chrom, start, end, strand). Table objects provide custom column name mappings.
- intersects_bin_size : int | None
Bin size for INTERSECTS equi-join optimization. When a query contains a full-table column-to-column INTERSECTS join, the transpiler rewrites it as a binned equi-join for performance. Defaults to 10,000 if not specified.
Returns
- str
The transpiled SQL query.
Raises
- ValueError
If the query cannot be parsed or transpiled.
Examples
Basic usage with default column mappings:
    from giql import transpile

    sql = transpile(
        "SELECT * FROM peaks WHERE interval INTERSECTS 'chr1:1000-2000'",
        tables=["peaks"],
    )
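To make the INTERSECTS predicate in the query above concrete, here is a minimal sketch of the usual overlap test for 0-based, half-open intervals, which matches the Table defaults. The `Interval` type and `intersects` function are hypothetical illustrations and not part of giql. How GIQL interprets the coordinates in a region literal such as 'chr1:1000-2000' is not stated here, so the query region below is an assumption.

```python
from typing import NamedTuple


class Interval(NamedTuple):
    """Hypothetical stand-in for one row of a genomic table."""

    chrom: str
    start: int  # 0-based, inclusive
    end: int    # exclusive (half-open)


def intersects(a: Interval, b: Interval) -> bool:
    """Return True when two half-open intervals share at least one base."""
    return a.chrom == b.chrom and a.start < b.end and b.start < a.end


# Assumed reading of the query region as the half-open interval [1000, 2000).
query_region = Interval("chr1", 1000, 2000)

print(intersects(Interval("chr1", 1500, 2500), query_region))  # True
print(intersects(Interval("chr1", 2000, 3000), query_region))  # False: only touches the end
print(intersects(Interval("chr2", 1500, 2500), query_region))  # False: different chromosome
```

The strict inequalities are what make adjacent half-open intervals non-overlapping; with closed intervals the comparisons would be `<=` instead.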
Custom table configuration:
    from giql import Table, transpile

    sql = transpile(
        "SELECT * FROM peaks WHERE interval INTERSECTS 'chr1:1000-2000'",
        tables=[
            Table(
                "peaks",
                genomic_col="interval",
                chrom_col="chrom",
                start_col="start",
                end_col="end",
            )
        ],
    )
Binned equi-join with custom bin size:
    sql = transpile(
        "SELECT a.*, b.* FROM peaks a JOIN genes b "
        "ON a.interval INTERSECTS b.interval",
        tables=["peaks", "genes"],
        intersects_bin_size=100000,
    )
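The idea behind a binned equi-join can be sketched in plain Python. This illustrates the general technique only, not the SQL that giql generates: each interval is assigned to every fixed-size bin it touches, rows are matched on equal (chromosome, bin) keys, and the exact overlap test then filters the candidates. The names `bins_for` and `binned_overlap_join` are hypothetical, and rows are assumed to be 0-based, half-open `(chrom, start, end)` tuples.

```python
from collections import defaultdict


def bins_for(start: int, end: int, bin_size: int = 10_000) -> range:
    """Bin indices touched by the half-open interval [start, end)."""
    if end <= start:
        raise ValueError("interval must have positive length")
    return range(start // bin_size, (end - 1) // bin_size + 1)


def binned_overlap_join(left, right, bin_size=10_000):
    """Join (chrom, start, end) rows that overlap, using bins as the equi-join key."""
    # Index the right-hand rows by every (chrom, bin) key they fall into.
    index = defaultdict(list)
    for row in right:
        chrom, start, end = row
        for b in bins_for(start, end, bin_size):
            index[(chrom, b)].append(row)

    seen = set()
    pairs = []
    for l_row in left:
        chrom, start, end = l_row
        for b in bins_for(start, end, bin_size):
            for r_row in index.get((chrom, b), ()):
                # Sharing a bin is only a candidate match: confirm the real
                # overlap, and skip pairs already found through another bin.
                overlaps = start < r_row[2] and r_row[1] < end
                if overlaps and (l_row, r_row) not in seen:
                    seen.add((l_row, r_row))
                    pairs.append((l_row, r_row))
    return pairs


peaks = [("chr1", 5_000, 25_000), ("chr1", 40_000, 41_000)]
genes = [("chr1", 9_000, 12_000), ("chr1", 24_999, 30_000), ("chr2", 5_000, 6_000)]

for peak, gene in binned_overlap_join(peaks, genes):
    print(peak, gene)
```

In this sketch the bin size is a trade-off: small bins replicate long intervals across many keys, while large bins put more non-overlapping rows into the same bucket for the final overlap check to reject.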
- class giql.Table(name, genomic_col='interval', chrom_col='chrom', start_col='start', end_col='end', strand_col='strand', coordinate_system='0based', interval_type='half_open')
Genomic table configuration for transpilation.
This class defines how genomic intervals are stored in a database table, mapping a pseudo-column name (genomic_col) to the physical columns that store chromosome, start, end, and optionally strand information.
Parameters
- name : str
The table name.
- genomic_col : str
The pseudo-column name used in GIQL queries to reference the genomic interval (default: "interval").
- chrom_col : str
The physical column name storing chromosome/contig (default: "chrom").
- start_col : str
The physical column name storing interval start position (default: "start").
- end_col : str
The physical column name storing interval end position (default: "end").
- strand_col : str | None
The physical column name storing strand information, or None if the table has no strand column (default: "strand").
- coordinate_system : Literal["0based", "1based"]
The coordinate system used for positions (default: "0based").
- interval_type : Literal["half_open", "closed"]
The interval endpoint convention (default: "half_open").
Examples
Using default column names (via transpile):
    sql = transpile(query, tables=["peaks"])
Mixing default and custom table configurations:
    sql = transpile(
        query,
        tables=[
            "peaks",
            Table(
                "variants",
                genomic_col="position",
                chrom_col="chr",
                start_col="pos_start",
                end_col="pos_end",
                strand_col=None,  # No strand column
                coordinate_system="1based",
                interval_type="closed",
            ),
        ],
    )
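The coordinate_system and interval_type settings matter when tables with different conventions are compared, as with the 0-based half-open "peaks" and the 1-based closed "variants" above. The following sketch shows the standard arithmetic for bringing any of the four combinations to 0-based, half-open form. The helper `to_zero_based_half_open` is hypothetical, and whether giql normalizes coordinates this way internally is an assumption, not something this reference states.

```python
def to_zero_based_half_open(
    start: int,
    end: int,
    coordinate_system: str = "0based",
    interval_type: str = "half_open",
) -> tuple[int, int]:
    """Convert an interval to 0-based, half-open coordinates."""
    if coordinate_system not in ("0based", "1based"):
        raise ValueError(f"unknown coordinate_system: {coordinate_system!r}")
    if interval_type not in ("half_open", "closed"):
        raise ValueError(f"unknown interval_type: {interval_type!r}")

    if coordinate_system == "1based":
        # Shift both endpoints so the first base is numbered 0.
        start -= 1
        end -= 1
    if interval_type == "closed":
        # A closed end is inclusive; the half-open end is one past it.
        end += 1
    return start, end


# A 1-based closed interval covering bases 1000 through 2000 inclusive.
print(to_zero_based_half_open(1000, 2000, "1based", "closed"))  # (999, 2000)
# Already 0-based and half-open, so the values are unchanged.
print(to_zero_based_half_open(999, 2000))  # (999, 2000)
```

Both calls describe the same 1001 bases, which is why a declared convention per table is needed before an overlap test across tables is meaningful.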