Syntax Reference
Quick reference for GIQL syntax and operators.
Genomic Range Literals
Format
Genomic ranges are specified as string literals:
'chromosome:start-end'
Examples
'chr1:1000-2000' -- Range on chr1 from 1000 to 2000
'chr1:1000' -- Point at position 1000
'chrX:50000-100000' -- Range on chrX
'chr1:0-1000000' -- First megabase of chr1
Coordinate System
0-based start: First base is position 0
Half-open interval: [start, end) - start inclusive, end exclusive
Range
chr1:100-200covers bases 100 through 199
Spatial Operators
INTERSECTS
Test if ranges overlap.
-- Against literal
interval INTERSECTS 'chr1:1000-2000'
-- Column to column
a.interval INTERSECTS b.interval
-- In JOIN
JOIN table ON a.interval INTERSECTS b.interval
CONTAINS
Test if one range fully contains another.
-- Range contains point
interval CONTAINS 'chr1:1500'
-- Range contains range
interval CONTAINS 'chr1:1200-1800'
-- Column to column
gene.interval CONTAINS exon.interval
WITHIN
Test if one range is fully within another.
-- Range within literal
interval WITHIN 'chr1:1000-5000'
-- Column to column
exon.interval WITHIN gene.interval
Distance Operators
DISTANCE
Calculate distance between two positions.
DISTANCE(a.interval, b.interval)
Returns:
0for overlapping rangesPositive integer (gap in bp) for non-overlapping
NULLfor different chromosomes
NEAREST
Find k-nearest neighbors.
-- Basic syntax
CROSS JOIN LATERAL NEAREST(
target_table,
reference=source.interval,
k=N
) AS alias
-- With parameters
NEAREST(
target_table,
reference=interval,
k=5,
max_distance=100000,
stranded=true,
signed=true
)
-- Standalone
SELECT * FROM NEAREST(table, reference='chr1:1000-2000', k=5)
Parameters:
k: Number of neighbors (default: 1)max_distance: Maximum distance thresholdstranded: Same-strand only (default: false)signed: Signed distances (default: false)
Aggregation Operators
CLUSTER
Assign cluster IDs to overlapping intervals.
-- Basic
CLUSTER(interval) AS cluster_id
-- With distance
CLUSTER(interval, 1000) AS cluster_id
-- Strand-specific
CLUSTER(interval, stranded=true) AS cluster_id
-- Combined
CLUSTER(interval, 1000, stranded=true) AS cluster_id
MERGE
Combine overlapping intervals.
-- Basic
SELECT MERGE(interval) FROM table
-- With distance
SELECT MERGE(interval, 1000) FROM table
-- Strand-specific
SELECT MERGE(interval, stranded=true) FROM table
-- With aggregations
SELECT MERGE(interval), COUNT(*), AVG(score) FROM table
Set Quantifiers
ANY
Match any of multiple ranges.
interval INTERSECTS ANY('chr1:1000-2000', 'chr2:5000-6000')
interval CONTAINS ANY('chr1:1500', 'chr1:2500')
interval WITHIN ANY('chr1:0-10000', 'chr2:0-10000')
ALL
Match all of multiple ranges.
interval CONTAINS ALL('chr1:1500', 'chr1:1600', 'chr1:1700')
interval INTERSECTS ALL('chr1:1000-1100', 'chr1:1050-1150')
Query Patterns
Basic Filter
SELECT * FROM table
WHERE interval INTERSECTS 'chr1:1000-2000'
Join
SELECT a.*, b.name
FROM table_a a
JOIN table_b b ON a.interval INTERSECTS b.interval
Left Outer Join
SELECT a.*, b.name
FROM table_a a
LEFT JOIN table_b b ON a.interval INTERSECTS b.interval
Exclusion (NOT IN)
SELECT a.*
FROM table_a a
LEFT JOIN table_b b ON a.interval INTERSECTS b.interval
WHERE b.chromosome IS NULL
Count Overlaps
SELECT a.*, COUNT(b.name) AS overlap_count
FROM table_a a
LEFT JOIN table_b b ON a.interval INTERSECTS b.interval
GROUP BY a.chromosome, a.start_pos, a.end_pos, ...
K-Nearest Neighbors
SELECT source.*, nearest.name, nearest.distance
FROM source
CROSS JOIN LATERAL NEAREST(target, reference=source.interval, k=5) AS nearest
Clustering
SELECT *, CLUSTER(interval) AS cluster_id
FROM table
ORDER BY chromosome, start_pos
Merging
SELECT MERGE(interval), COUNT(*) AS count
FROM table
Engine Methods
execute()
Execute a GIQL query and return a cursor.
cursor = engine.execute("SELECT * FROM table WHERE interval INTERSECTS 'chr1:1000-2000'")
transpile()
Convert GIQL to SQL without executing.
sql = engine.transpile("SELECT * FROM table WHERE interval INTERSECTS 'chr1:1000-2000'")
register_table_schema()
Register a table’s schema for genomic operations.
engine.register_table_schema(
"table_name",
{
"chromosome": "VARCHAR",
"start_pos": "BIGINT",
"end_pos": "BIGINT",
"name": "VARCHAR",
},
genomic_column="interval",
chromosome_column="chromosome", # optional, default: "chromosome"
start_column="start_pos", # optional, default: "start_pos"
end_column="end_pos", # optional, default: "end_pos"
)
load_csv()
Load a CSV file into a table.
engine.load_csv("table_name", "file.csv")
engine.load_csv("table_name", "file.tsv", delimiter="\t")