Syntax Reference

Quick reference for GIQL syntax and operators.

Genomic Range Literals

Format

Genomic ranges are specified as string literals:

'chromosome:start-end'

Examples

'chr1:1000-2000'        -- Range on chr1 from 1000 to 2000
'chr1:1000'             -- Point at position 1000
'chrX:50000-100000'     -- Range on chrX
'chr1:0-1000000'        -- First megabase of chr1

Coordinate System

  • 0-based start: First base is position 0

  • Half-open interval: [start, end) - start inclusive, end exclusive

  • Range chr1:100-200 covers bases 100 through 199

Spatial Operators

INTERSECTS

Test if ranges overlap.

-- Against literal
interval INTERSECTS 'chr1:1000-2000'

-- Column to column
a.interval INTERSECTS b.interval

-- In JOIN
JOIN table ON a.interval INTERSECTS b.interval

CONTAINS

Test if one range fully contains another.

-- Range contains point
interval CONTAINS 'chr1:1500'

-- Range contains range
interval CONTAINS 'chr1:1200-1800'

-- Column to column
gene.interval CONTAINS exon.interval

WITHIN

Test if one range is fully within another.

-- Range within literal
interval WITHIN 'chr1:1000-5000'

-- Column to column
exon.interval WITHIN gene.interval

Distance Operators

DISTANCE

Calculate distance between two positions.

DISTANCE(a.interval, b.interval)

Returns:

  • 0 for overlapping ranges

  • Positive integer (gap in bp) for non-overlapping

  • NULL for different chromosomes

NEAREST

Find k-nearest neighbors.

-- Basic syntax
CROSS JOIN LATERAL NEAREST(
    target_table,
    reference=source.interval,
    k=N
) AS alias

-- With parameters
NEAREST(
    target_table,
    reference=interval,
    k=5,
    max_distance=100000,
    stranded=true,
    signed=true
)

-- Standalone
SELECT * FROM NEAREST(table, reference='chr1:1000-2000', k=5)

Parameters:

  • k: Number of neighbors (default: 1)

  • max_distance: Maximum distance threshold

  • stranded: Same-strand only (default: false)

  • signed: Signed distances (default: false)

Aggregation Operators

CLUSTER

Assign cluster IDs to overlapping intervals.

-- Basic
CLUSTER(interval) AS cluster_id

-- With distance
CLUSTER(interval, 1000) AS cluster_id

-- Strand-specific
CLUSTER(interval, stranded=true) AS cluster_id

-- Combined
CLUSTER(interval, 1000, stranded=true) AS cluster_id

MERGE

Combine overlapping intervals.

-- Basic
SELECT MERGE(interval) FROM table

-- With distance
SELECT MERGE(interval, 1000) FROM table

-- Strand-specific
SELECT MERGE(interval, stranded=true) FROM table

-- With aggregations
SELECT MERGE(interval), COUNT(*), AVG(score) FROM table

Set Quantifiers

ANY

Match any of multiple ranges.

interval INTERSECTS ANY('chr1:1000-2000', 'chr2:5000-6000')
interval CONTAINS ANY('chr1:1500', 'chr1:2500')
interval WITHIN ANY('chr1:0-10000', 'chr2:0-10000')

ALL

Match all of multiple ranges.

interval CONTAINS ALL('chr1:1500', 'chr1:1600', 'chr1:1700')
interval INTERSECTS ALL('chr1:1000-1100', 'chr1:1050-1150')

Query Patterns

Basic Filter

SELECT * FROM table
WHERE interval INTERSECTS 'chr1:1000-2000'

Join

SELECT a.*, b.name
FROM table_a a
JOIN table_b b ON a.interval INTERSECTS b.interval

Left Outer Join

SELECT a.*, b.name
FROM table_a a
LEFT JOIN table_b b ON a.interval INTERSECTS b.interval

Exclusion (NOT IN)

SELECT a.*
FROM table_a a
LEFT JOIN table_b b ON a.interval INTERSECTS b.interval
WHERE b.chromosome IS NULL

Count Overlaps

SELECT a.*, COUNT(b.name) AS overlap_count
FROM table_a a
LEFT JOIN table_b b ON a.interval INTERSECTS b.interval
GROUP BY a.chromosome, a.start_pos, a.end_pos, ...

K-Nearest Neighbors

SELECT source.*, nearest.name, nearest.distance
FROM source
CROSS JOIN LATERAL NEAREST(target, reference=source.interval, k=5) AS nearest

Clustering

SELECT *, CLUSTER(interval) AS cluster_id
FROM table
ORDER BY chromosome, start_pos

Merging

SELECT MERGE(interval), COUNT(*) AS count
FROM table

Engine Methods

execute()

Execute a GIQL query and return a cursor.

cursor = engine.execute("SELECT * FROM table WHERE interval INTERSECTS 'chr1:1000-2000'")

transpile()

Convert GIQL to SQL without executing.

sql = engine.transpile("SELECT * FROM table WHERE interval INTERSECTS 'chr1:1000-2000'")

register_table_schema()

Register a table’s schema for genomic operations.

engine.register_table_schema(
    "table_name",
    {
        "chromosome": "VARCHAR",
        "start_pos": "BIGINT",
        "end_pos": "BIGINT",
        "name": "VARCHAR",
    },
    genomic_column="interval",
    chromosome_column="chromosome",  # optional, default: "chromosome"
    start_column="start_pos",        # optional, default: "start_pos"
    end_column="end_pos",            # optional, default: "end_pos"
)

load_csv()

Load a CSV file into a table.

engine.load_csv("table_name", "file.csv")
engine.load_csv("table_name", "file.tsv", delimiter="\t")