Distance and Neighbors#
This section covers patterns for calculating genomic distances and finding nearest features using GIQL’s distance operators.
Calculating Distances#
Distance Between Feature Pairs#
Calculate the distance between features in two tables:
SELECT
a.name AS feature_a,
b.name AS feature_b,
DISTANCE(a.interval, b.interval) AS distance
FROM features_a a
CROSS JOIN features_b b
WHERE a.chrom = b.chrom
ORDER BY a.name, distance
Use case: Generate a distance matrix between regulatory elements and genes.
Note
Always include WHERE a.chrom = b.chrom to avoid comparing
features on different chromosomes (which returns NULL for distance).
Identify Overlapping vs Proximal#
Classify relationships based on distance:
SELECT
p.name AS peak,
g.name AS gene,
DISTANCE(p.interval, g.interval) AS dist,
CASE
WHEN DISTANCE(p.interval, g.interval) = 0 THEN 'overlapping'
WHEN DISTANCE(p.interval, g.interval) <= 1000 THEN 'proximal (<1kb)'
WHEN DISTANCE(p.interval, g.interval) <= 10000 THEN 'nearby (<10kb)'
ELSE 'distant'
END AS relationship
FROM peaks p
CROSS JOIN genes g
WHERE p.chrom = g.chrom
Use case: Categorize peak-gene relationships for enhancer analysis.
Filter by Maximum Distance#
Find feature pairs within a distance threshold:
SELECT
a.name,
b.name,
DISTANCE(a.interval, b.interval) AS dist
FROM features_a a
CROSS JOIN features_b b
WHERE a.chrom = b.chrom
AND DISTANCE(a.interval, b.interval) <= 50000
ORDER BY dist
Use case: Find regulatory elements within 50kb of genes.
K-Nearest Neighbor Queries#
Find K Nearest Features#
For each peak, find the 3 nearest genes:
SELECT
peaks.name AS peak,
nearest.name AS gene,
nearest.distance
FROM peaks
CROSS JOIN LATERAL NEAREST(genes, reference := peaks.interval, k := 3) AS nearest
ORDER BY peaks.name, nearest.distance
Use case: Annotate ChIP-seq peaks with nearby genes.
Nearest Feature to a Specific Location#
Find the 5 nearest genes to a specific genomic coordinate:
SELECT name, distance
FROM NEAREST(genes, reference := 'chr1:1000000-1001000', k := 5)
ORDER BY distance
Use case: Explore the genomic neighborhood of a position of interest.
Nearest with Distance Constraint#
Find nearest features within a maximum distance:
SELECT
peaks.name AS peak,
nearest.name AS gene,
nearest.distance
FROM peaks
CROSS JOIN LATERAL NEAREST(
genes,
reference := peaks.interval,
k := 5,
max_distance := 100000
) AS nearest
ORDER BY peaks.name, nearest.distance
Use case: Find regulatory targets within 100kb, ignoring distant genes.
Strand-Specific Queries#
Same-Strand Nearest Neighbors#
Find nearest features on the same strand only:
SELECT
peaks.name AS peak,
nearest.name AS gene,
nearest.strand,
nearest.distance
FROM peaks
CROSS JOIN LATERAL NEAREST(
genes,
reference := peaks.interval,
k := 3,
stranded := true
) AS nearest
ORDER BY peaks.name, nearest.distance
Use case: Find same-strand genes for strand-specific regulatory analysis.
Directional Queries#
Upstream Features#
Find features upstream (5’) of reference positions using signed distances:
SELECT
peaks.name AS peak,
nearest.name AS gene,
nearest.distance
FROM peaks
CROSS JOIN LATERAL NEAREST(
genes,
reference := peaks.interval,
k := 10,
signed := true
) AS nearest
WHERE nearest.distance < 0
ORDER BY peaks.name, nearest.distance DESC
Use case: Find genes upstream of regulatory elements.
Note
With signed := true, negative distances indicate upstream features
and positive distances indicate downstream features.
Downstream Features#
Find features downstream (3’) of reference positions:
SELECT
peaks.name AS peak,
nearest.name AS gene,
nearest.distance
FROM peaks
CROSS JOIN LATERAL NEAREST(
genes,
reference := peaks.interval,
k := 10,
signed := true
) AS nearest
WHERE nearest.distance > 0
ORDER BY peaks.name, nearest.distance
Use case: Identify downstream targets of promoter elements.
Promoter-Proximal Analysis#
Find features within a specific distance window around the reference:
SELECT
peaks.name AS peak,
nearest.name AS gene,
nearest.distance
FROM peaks
CROSS JOIN LATERAL NEAREST(
genes,
reference := peaks.interval,
k := 10,
signed := true
) AS nearest
WHERE nearest.distance BETWEEN -2000 AND 500
ORDER BY peaks.name, ABS(nearest.distance)
Use case: Find genes with peaks in their promoter regions (-2kb to +500bp from TSS).
Combined Parameters#
Strand-Specific with Distance Constraint#
Find nearby same-strand features:
SELECT
peaks.name AS peak,
nearest.name AS gene,
nearest.distance
FROM peaks
CROSS JOIN LATERAL NEAREST(
genes,
reference := peaks.interval,
k := 5,
max_distance := 50000,
stranded := true,
signed := true
) AS nearest
WHERE nearest.distance BETWEEN -10000 AND 10000
ORDER BY peaks.name, ABS(nearest.distance)
Use case: Find same-strand genes within ±10kb for promoter-enhancer analysis.
Distance Statistics#
Average Distance to Nearest Gene#
Calculate the average distance from peaks to their nearest gene:
WITH nearest_genes AS (
SELECT
peaks.name AS peak,
nearest.distance
FROM peaks
CROSS JOIN LATERAL NEAREST(genes, reference := peaks.interval, k := 1) AS nearest
)
SELECT
COUNT(*) AS peak_count,
AVG(distance) AS avg_distance,
MIN(distance) AS min_distance,
MAX(distance) AS max_distance
FROM nearest_genes
Use case: Characterize the genomic distribution of peaks relative to genes.
Distance Distribution by Chromosome#
Analyze distance patterns per chromosome:
WITH nearest_genes AS (
SELECT
peaks.chrom,
peaks.name AS peak,
nearest.distance
FROM peaks
CROSS JOIN LATERAL NEAREST(genes, reference := peaks.interval, k := 1) AS nearest
)
SELECT
chrom,
COUNT(*) AS peak_count,
AVG(distance) AS avg_distance
FROM nearest_genes
GROUP BY chrom
ORDER BY chrom
Use case: Compare regulatory element distribution across chromosomes.
Window Expansion Patterns#
Expand Search Window#
Find features within an expanded window around each feature:
WITH expanded AS (
SELECT
name,
chrom,
start - 5000 AS search_start,
end + 5000 AS search_end
FROM peaks
)
SELECT
e.name AS peak,
b.*
FROM expanded e
JOIN features_b b
ON b.chrom = e.chrom
AND b.start < e.search_end
AND b.end > e.search_start
Use case: Find all features within 5kb flanking regions.
Note
This pattern uses raw coordinate manipulation rather than the NEAREST operator, which is useful when you need custom window shapes.