Distance and Neighbors#

This section covers patterns for calculating genomic distances and finding nearest features using GIQL’s distance operators.

Calculating Distances#

Distance Between Feature Pairs#

Calculate the distance between features in two tables:

SELECT
    a.name AS feature_a,
    b.name AS feature_b,
    DISTANCE(a.interval, b.interval) AS distance
FROM features_a a
CROSS JOIN features_b b
WHERE a.chrom = b.chrom
ORDER BY a.name, distance

Use case: Generate a distance matrix between regulatory elements and genes.

Note

Always include WHERE a.chrom = b.chrom to avoid comparing features on different chromosomes (which returns NULL for distance).

Identify Overlapping vs Proximal#

Classify relationships based on distance:

SELECT
    p.name AS peak,
    g.name AS gene,
    DISTANCE(p.interval, g.interval) AS dist,
    CASE
        WHEN DISTANCE(p.interval, g.interval) = 0 THEN 'overlapping'
        WHEN DISTANCE(p.interval, g.interval) <= 1000 THEN 'proximal (<1kb)'
        WHEN DISTANCE(p.interval, g.interval) <= 10000 THEN 'nearby (<10kb)'
        ELSE 'distant'
    END AS relationship
FROM peaks p
CROSS JOIN genes g
WHERE p.chrom = g.chrom

Use case: Categorize peak-gene relationships for enhancer analysis.

Filter by Maximum Distance#

Find feature pairs within a distance threshold:

SELECT
    a.name,
    b.name,
    DISTANCE(a.interval, b.interval) AS dist
FROM features_a a
CROSS JOIN features_b b
WHERE a.chrom = b.chrom
  AND DISTANCE(a.interval, b.interval) <= 50000
ORDER BY dist

Use case: Find regulatory elements within 50kb of genes.

K-Nearest Neighbor Queries#

Find K Nearest Features#

For each peak, find the 3 nearest genes:

SELECT
    peaks.name AS peak,
    nearest.name AS gene,
    nearest.distance
FROM peaks
CROSS JOIN LATERAL NEAREST(genes, reference := peaks.interval, k := 3) AS nearest
ORDER BY peaks.name, nearest.distance

Use case: Annotate ChIP-seq peaks with nearby genes.

Nearest Feature to a Specific Location#

Find the 5 nearest genes to a specific genomic coordinate:

SELECT name, distance
FROM NEAREST(genes, reference := 'chr1:1000000-1001000', k := 5)
ORDER BY distance

Use case: Explore the genomic neighborhood of a position of interest.

Nearest with Distance Constraint#

Find nearest features within a maximum distance:

SELECT
    peaks.name AS peak,
    nearest.name AS gene,
    nearest.distance
FROM peaks
CROSS JOIN LATERAL NEAREST(
    genes,
    reference := peaks.interval,
    k := 5,
    max_distance := 100000
) AS nearest
ORDER BY peaks.name, nearest.distance

Use case: Find regulatory targets within 100kb, ignoring distant genes.

Strand-Specific Queries#

Same-Strand Nearest Neighbors#

Find nearest features on the same strand only:

SELECT
    peaks.name AS peak,
    nearest.name AS gene,
    nearest.strand,
    nearest.distance
FROM peaks
CROSS JOIN LATERAL NEAREST(
    genes,
    reference := peaks.interval,
    k := 3,
    stranded := true
) AS nearest
ORDER BY peaks.name, nearest.distance

Use case: Find same-strand genes for strand-specific regulatory analysis.

Directional Queries#

Upstream Features#

Find features upstream (5’) of reference positions using signed distances:

SELECT
    peaks.name AS peak,
    nearest.name AS gene,
    nearest.distance
FROM peaks
CROSS JOIN LATERAL NEAREST(
    genes,
    reference := peaks.interval,
    k := 10,
    signed := true
) AS nearest
WHERE nearest.distance < 0
ORDER BY peaks.name, nearest.distance DESC

Use case: Find genes upstream of regulatory elements.

Note

With signed := true, negative distances indicate upstream features and positive distances indicate downstream features.

Downstream Features#

Find features downstream (3’) of reference positions:

SELECT
    peaks.name AS peak,
    nearest.name AS gene,
    nearest.distance
FROM peaks
CROSS JOIN LATERAL NEAREST(
    genes,
    reference := peaks.interval,
    k := 10,
    signed := true
) AS nearest
WHERE nearest.distance > 0
ORDER BY peaks.name, nearest.distance

Use case: Identify downstream targets of promoter elements.

Promoter-Proximal Analysis#

Find features within a specific distance window around the reference:

SELECT
    peaks.name AS peak,
    nearest.name AS gene,
    nearest.distance
FROM peaks
CROSS JOIN LATERAL NEAREST(
    genes,
    reference := peaks.interval,
    k := 10,
    signed := true
) AS nearest
WHERE nearest.distance BETWEEN -2000 AND 500
ORDER BY peaks.name, ABS(nearest.distance)

Use case: Find genes with peaks in their promoter regions (-2kb to +500bp from TSS).

Combined Parameters#

Strand-Specific with Distance Constraint#

Find nearby same-strand features:

SELECT
    peaks.name AS peak,
    nearest.name AS gene,
    nearest.distance
FROM peaks
CROSS JOIN LATERAL NEAREST(
    genes,
    reference := peaks.interval,
    k := 5,
    max_distance := 50000,
    stranded := true,
    signed := true
) AS nearest
WHERE nearest.distance BETWEEN -10000 AND 10000
ORDER BY peaks.name, ABS(nearest.distance)

Use case: Find same-strand genes within ±10kb for promoter-enhancer analysis.

Distance Statistics#

Average Distance to Nearest Gene#

Calculate the average distance from peaks to their nearest gene:

WITH nearest_genes AS (
    SELECT
        peaks.name AS peak,
        nearest.distance
    FROM peaks
    CROSS JOIN LATERAL NEAREST(genes, reference := peaks.interval, k := 1) AS nearest
)
SELECT
    COUNT(*) AS peak_count,
    AVG(distance) AS avg_distance,
    MIN(distance) AS min_distance,
    MAX(distance) AS max_distance
FROM nearest_genes

Use case: Characterize the genomic distribution of peaks relative to genes.

Distance Distribution by Chromosome#

Analyze distance patterns per chromosome:

WITH nearest_genes AS (
    SELECT
        peaks.chrom,
        peaks.name AS peak,
        nearest.distance
    FROM peaks
    CROSS JOIN LATERAL NEAREST(genes, reference := peaks.interval, k := 1) AS nearest
)
SELECT
    chrom,
    COUNT(*) AS peak_count,
    AVG(distance) AS avg_distance
FROM nearest_genes
GROUP BY chrom
ORDER BY chrom

Use case: Compare regulatory element distribution across chromosomes.

Window Expansion Patterns#

Expand Search Window#

Find features within an expanded window around each feature:

WITH expanded AS (
    SELECT
        name,
        chrom,
        start - 5000 AS search_start,
        end + 5000 AS search_end
    FROM peaks
)
SELECT
    e.name AS peak,
    b.*
FROM expanded e
JOIN features_b b
    ON b.chrom = e.chrom
    AND b.start < e.search_end
    AND b.end > e.search_start

Use case: Find all features within 5kb flanking regions.

Note

This pattern uses raw coordinate manipulation rather than the NEAREST operator, which is useful when you need custom window shapes.