Distance and Neighbors

Distance and Neighbors#

This section covers patterns for calculating genomic distances and finding nearest features using GIQL’s distance operators.

Calculating Distances#

Distance Between Feature Pairs#

Calculate the distance between features in two tables:

SELECT
    a.name AS feature_a,
    b.name AS feature_b,
    DISTANCE(a.interval, b.interval) AS distance
FROM features_a a
CROSS JOIN features_b b
WHERE a.chrom = b.chrom
ORDER BY a.name, distance

Use case: Generate a distance matrix between regulatory elements and genes.

Note

Always include WHERE a.chrom = b.chrom to avoid comparing features on different chromosomes (which returns NULL for distance).

Identify Overlapping vs Proximal#

Classify relationships based on distance:

SELECT
    p.name AS peak,
    g.name AS gene,
    DISTANCE(p.interval, g.interval) AS dist,
    CASE
        WHEN DISTANCE(p.interval, g.interval) = 0 THEN 'overlapping'
        WHEN DISTANCE(p.interval, g.interval) <= 1000 THEN 'proximal (<1kb)'
        WHEN DISTANCE(p.interval, g.interval) <= 10000 THEN 'nearby (<10kb)'
        ELSE 'distant'
    END AS relationship
FROM peaks p
CROSS JOIN genes g
WHERE p.chrom = g.chrom

Use case: Categorize peak-gene relationships for enhancer analysis.

Filter by Maximum Distance#

Find feature pairs within a distance threshold:

SELECT
    a.name,
    b.name,
    DISTANCE(a.interval, b.interval) AS dist
FROM features_a a
CROSS JOIN features_b b
WHERE a.chrom = b.chrom
  AND DISTANCE(a.interval, b.interval) <= 50000
ORDER BY dist

Use case: Find regulatory elements within 50kb of genes.

K-Nearest Neighbor Queries#

Find K Nearest Features#

For each peak, find the 3 nearest genes:

SELECT
    peaks.name AS peak,
    nearest.name AS gene,
    nearest.distance
FROM peaks
CROSS JOIN LATERAL NEAREST(genes, reference := peaks.interval, k := 3) AS nearest
ORDER BY peaks.name, nearest.distance

Use case: Annotate ChIP-seq peaks with nearby genes.

Nearest Feature to a Specific Location#

Find the 5 nearest genes to a specific genomic coordinate:

SELECT name, distance
FROM NEAREST(genes, reference := 'chr1:1000000-1001000', k := 5)
ORDER BY distance

Use case: Explore the genomic neighborhood of a position of interest.

Nearest with Distance Constraint#

Find nearest features within a maximum distance:

SELECT
    peaks.name AS peak,
    nearest.name AS gene,
    nearest.distance
FROM peaks
CROSS JOIN LATERAL NEAREST(
    genes,
    reference := peaks.interval,
    k := 5,
    max_distance := 100000
) AS nearest
ORDER BY peaks.name, nearest.distance

Use case: Find regulatory targets within 100kb, ignoring distant genes.

Strand-Specific Queries#

Same-Strand Nearest Neighbors#

Find nearest features on the same strand only:

SELECT
    peaks.name AS peak,
    nearest.name AS gene,
    nearest.strand,
    nearest.distance
FROM peaks
CROSS JOIN LATERAL NEAREST(
    genes,
    reference := peaks.interval,
    k := 3,
    stranded := true
) AS nearest
ORDER BY peaks.name, nearest.distance

Use case: Find same-strand genes for strand-specific regulatory analysis.

Directional Queries#

Upstream Features#

Find features upstream (5’) of reference positions using signed distances:

SELECT
    peaks.name AS peak,
    nearest.name AS gene,
    nearest.distance
FROM peaks
CROSS JOIN LATERAL NEAREST(
    genes,
    reference := peaks.interval,
    k := 10,
    signed := true
) AS nearest
WHERE nearest.distance < 0
ORDER BY peaks.name, nearest.distance DESC

Use case: Find genes upstream of regulatory elements.

Note

With signed := true, negative distances indicate upstream features and positive distances indicate downstream features.

Downstream Features#

Find features downstream (3’) of reference positions:

SELECT
    peaks.name AS peak,
    nearest.name AS gene,
    nearest.distance
FROM peaks
CROSS JOIN LATERAL NEAREST(
    genes,
    reference := peaks.interval,
    k := 10,
    signed := true
) AS nearest
WHERE nearest.distance > 0
ORDER BY peaks.name, nearest.distance

Use case: Identify downstream targets of promoter elements.

Promoter-Proximal Analysis#

Find features within a specific distance window around the reference:

SELECT
    peaks.name AS peak,
    nearest.name AS gene,
    nearest.distance
FROM peaks
CROSS JOIN LATERAL NEAREST(
    genes,
    reference := peaks.interval,
    k := 10,
    signed := true
) AS nearest
WHERE nearest.distance BETWEEN -2000 AND 500
ORDER BY peaks.name, ABS(nearest.distance)

Use case: Find genes with peaks in their promoter regions (-2kb to +500bp from TSS).

Combined Parameters#

Strand-Specific with Distance Constraint#

Find nearby same-strand features:

SELECT
    peaks.name AS peak,
    nearest.name AS gene,
    nearest.distance
FROM peaks
CROSS JOIN LATERAL NEAREST(
    genes,
    reference := peaks.interval,
    k := 5,
    max_distance := 50000,
    stranded := true,
    signed := true
) AS nearest
WHERE nearest.distance BETWEEN -10000 AND 10000
ORDER BY peaks.name, ABS(nearest.distance)

Use case: Find same-strand genes within ±10kb for promoter-enhancer analysis.

Distance Statistics#

Average Distance to Nearest Gene#

Calculate the average distance from peaks to their nearest gene:

WITH nearest_genes AS (
    SELECT
        peaks.name AS peak,
        nearest.distance
    FROM peaks
    CROSS JOIN LATERAL NEAREST(genes, reference := peaks.interval, k := 1) AS nearest
)
SELECT
    COUNT(*) AS peak_count,
    AVG(distance) AS avg_distance,
    MIN(distance) AS min_distance,
    MAX(distance) AS max_distance
FROM nearest_genes

Use case: Characterize the genomic distribution of peaks relative to genes.

Distance Distribution by Chromosome#

Analyze distance patterns per chromosome:

WITH nearest_genes AS (
    SELECT
        peaks.chrom,
        peaks.name AS peak,
        nearest.distance
    FROM peaks
    CROSS JOIN LATERAL NEAREST(genes, reference := peaks.interval, k := 1) AS nearest
)
SELECT
    chrom,
    COUNT(*) AS peak_count,
    AVG(distance) AS avg_distance
FROM nearest_genes
GROUP BY chrom
ORDER BY chrom

Use case: Compare regulatory element distribution across chromosomes.

Window Expansion Patterns#

Expand Search Window#

Find features within an expanded window around each feature:

WITH expanded AS (
    SELECT
        name,
        chrom,
        start - 5000 AS search_start,
        end + 5000 AS search_end
    FROM peaks
)
SELECT
    e.name AS peak,
    b.*
FROM expanded e
JOIN features_b b
    ON b.chrom = e.chrom
    AND b.start < e.search_end
    AND b.end > e.search_start

Use case: Find all features within 5kb flanking regions.

Note

This pattern uses raw coordinate manipulation rather than the NEAREST operator, which is useful when you need custom window shapes.

Distance and Neighbors

Contents

Distance and Neighbors#

Calculating Distances#

Distance Between Feature Pairs#

Identify Overlapping vs Proximal#

Filter by Maximum Distance#

K-Nearest Neighbor Queries#

Find K Nearest Features#

Nearest Feature to a Specific Location#

Nearest with Distance Constraint#

Strand-Specific Queries#

Same-Strand Nearest Neighbors#

Directional Queries#

Upstream Features#

Downstream Features#

Promoter-Proximal Analysis#

Combined Parameters#

Strand-Specific with Distance Constraint#

Distance Statistics#

Average Distance to Nearest Gene#

Distance Distribution by Chromosome#

Window Expansion Patterns#

Expand Search Window#