Distance and Proximity Queries
This section covers patterns for calculating genomic distances and finding nearest features using GIQL’s distance operators.
Calculating Distances
Distance Between Feature Pairs
Calculate the distance between features in two tables:
cursor = engine.execute("""
SELECT
a.name AS feature_a,
b.name AS feature_b,
DISTANCE(a.interval, b.interval) AS distance
FROM features_a a
CROSS JOIN features_b b
WHERE a.chromosome = b.chromosome
ORDER BY a.name, distance
""")
Use case: Generate a distance matrix between regulatory elements and genes.
Note
Always include WHERE a.chromosome = b.chromosome to avoid comparing
features on different chromosomes (which returns NULL for distance).
Identify Overlapping vs Proximal
Classify relationships based on distance:
cursor = engine.execute("""
SELECT
p.name AS peak,
g.name AS gene,
DISTANCE(p.interval, g.interval) AS dist,
CASE
WHEN DISTANCE(p.interval, g.interval) = 0 THEN 'overlapping'
WHEN DISTANCE(p.interval, g.interval) <= 1000 THEN 'proximal (<1kb)'
WHEN DISTANCE(p.interval, g.interval) <= 10000 THEN 'nearby (<10kb)'
ELSE 'distant'
END AS relationship
FROM peaks p
CROSS JOIN genes g
WHERE p.chromosome = g.chromosome
""")
Use case: Categorize peak-gene relationships for enhancer analysis.
Filter by Maximum Distance
Find feature pairs within a distance threshold:
cursor = engine.execute("""
SELECT
a.name,
b.name,
DISTANCE(a.interval, b.interval) AS dist
FROM features_a a
CROSS JOIN features_b b
WHERE a.chromosome = b.chromosome
AND DISTANCE(a.interval, b.interval) <= 50000
ORDER BY dist
""")
Use case: Find regulatory elements within 50kb of genes.
K-Nearest Neighbor Queries
Find K Nearest Features
For each peak, find the 3 nearest genes:
cursor = engine.execute("""
SELECT
peaks.name AS peak,
nearest.name AS gene,
nearest.distance
FROM peaks
CROSS JOIN LATERAL NEAREST(genes, reference=peaks.interval, k=3) AS nearest
ORDER BY peaks.name, nearest.distance
""")
Use case: Annotate ChIP-seq peaks with nearby genes.
Nearest Feature to a Specific Location
Find the 5 nearest genes to a specific genomic coordinate:
cursor = engine.execute("""
SELECT name, distance
FROM NEAREST(genes, reference='chr1:1000000-1001000', k=5)
ORDER BY distance
""")
Use case: Explore the genomic neighborhood of a position of interest.
Nearest with Distance Constraint
Find nearest features within a maximum distance:
cursor = engine.execute("""
SELECT
peaks.name AS peak,
nearest.name AS gene,
nearest.distance
FROM peaks
CROSS JOIN LATERAL NEAREST(
genes,
reference=peaks.interval,
k=5,
max_distance=100000
) AS nearest
ORDER BY peaks.name, nearest.distance
""")
Use case: Find regulatory targets within 100kb, ignoring distant genes.
Strand-Specific Queries
Same-Strand Nearest Neighbors
Find nearest features on the same strand only:
cursor = engine.execute("""
SELECT
peaks.name AS peak,
nearest.name AS gene,
nearest.strand,
nearest.distance
FROM peaks
CROSS JOIN LATERAL NEAREST(
genes,
reference=peaks.interval,
k=3,
stranded=true
) AS nearest
ORDER BY peaks.name, nearest.distance
""")
Use case: Find same-strand genes for strand-specific regulatory analysis.
Directional Queries
Upstream Features
Find features upstream (5’) of reference positions using signed distances:
cursor = engine.execute("""
SELECT
peaks.name AS peak,
nearest.name AS gene,
nearest.distance
FROM peaks
CROSS JOIN LATERAL NEAREST(
genes,
reference=peaks.interval,
k=10,
signed=true
) AS nearest
WHERE nearest.distance < 0
ORDER BY peaks.name, nearest.distance DESC
""")
Use case: Find genes upstream of regulatory elements.
Note
With signed=true, negative distances indicate upstream features
and positive distances indicate downstream features.
Downstream Features
Find features downstream (3’) of reference positions:
cursor = engine.execute("""
SELECT
peaks.name AS peak,
nearest.name AS gene,
nearest.distance
FROM peaks
CROSS JOIN LATERAL NEAREST(
genes,
reference=peaks.interval,
k=10,
signed=true
) AS nearest
WHERE nearest.distance > 0
ORDER BY peaks.name, nearest.distance
""")
Use case: Identify downstream targets of promoter elements.
Promoter-Proximal Analysis
Find features within a specific distance window around the reference:
cursor = engine.execute("""
SELECT
peaks.name AS peak,
nearest.name AS gene,
nearest.distance
FROM peaks
CROSS JOIN LATERAL NEAREST(
genes,
reference=peaks.interval,
k=10,
signed=true
) AS nearest
WHERE nearest.distance BETWEEN -2000 AND 500
ORDER BY peaks.name, ABS(nearest.distance)
""")
Use case: Find genes with peaks in their promoter regions (-2kb to +500bp from TSS).
Combined Parameters
Strand-Specific with Distance Constraint
Find nearby same-strand features:
cursor = engine.execute("""
SELECT
peaks.name AS peak,
nearest.name AS gene,
nearest.distance
FROM peaks
CROSS JOIN LATERAL NEAREST(
genes,
reference=peaks.interval,
k=5,
max_distance=50000,
stranded=true,
signed=true
) AS nearest
WHERE nearest.distance BETWEEN -10000 AND 10000
ORDER BY peaks.name, ABS(nearest.distance)
""")
Use case: Find same-strand genes within ±10kb for promoter-enhancer analysis.
Distance Statistics
Average Distance to Nearest Gene
Calculate the average distance from peaks to their nearest gene:
cursor = engine.execute("""
WITH nearest_genes AS (
SELECT
peaks.name AS peak,
nearest.distance
FROM peaks
CROSS JOIN LATERAL NEAREST(genes, reference=peaks.interval, k=1) AS nearest
)
SELECT
COUNT(*) AS peak_count,
AVG(distance) AS avg_distance,
MIN(distance) AS min_distance,
MAX(distance) AS max_distance
FROM nearest_genes
""")
Use case: Characterize the genomic distribution of peaks relative to genes.
Distance Distribution by Chromosome
Analyze distance patterns per chromosome:
cursor = engine.execute("""
WITH nearest_genes AS (
SELECT
peaks.chromosome,
peaks.name AS peak,
nearest.distance
FROM peaks
CROSS JOIN LATERAL NEAREST(genes, reference=peaks.interval, k=1) AS nearest
)
SELECT
chromosome,
COUNT(*) AS peak_count,
AVG(distance) AS avg_distance
FROM nearest_genes
GROUP BY chromosome
ORDER BY chromosome
""")
Use case: Compare regulatory element distribution across chromosomes.
Window Expansion Patterns
Expand Search Window
Find features within an expanded window around each feature:
cursor = engine.execute("""
WITH expanded AS (
SELECT
name,
chromosome,
start_pos - 5000 AS search_start,
end_pos + 5000 AS search_end
FROM peaks
)
SELECT
e.name AS peak,
b.*
FROM expanded e
JOIN features_b b
ON b.chromosome = e.chromosome
AND b.start_pos < e.search_end
AND b.end_pos > e.search_start
""")
Use case: Find all features within 5kb flanking regions.
Note
This pattern uses raw coordinate manipulation rather than the NEAREST operator, which is useful when you need custom window shapes.