Distance and Proximity Queries

This section covers patterns for calculating genomic distances and finding nearest features using GIQL’s distance operators.

Calculating Distances

Distance Between Feature Pairs

Calculate the distance between features in two tables:

cursor = engine.execute("""
    SELECT
        a.name AS feature_a,
        b.name AS feature_b,
        DISTANCE(a.interval, b.interval) AS distance
    FROM features_a a
    CROSS JOIN features_b b
    WHERE a.chromosome = b.chromosome
    ORDER BY a.name, distance
""")

Use case: Generate a distance matrix between regulatory elements and genes.

Note

Always include WHERE a.chromosome = b.chromosome to avoid comparing features on different chromosomes (which returns NULL for distance).

Identify Overlapping vs Proximal

Classify relationships based on distance:

cursor = engine.execute("""
    SELECT
        p.name AS peak,
        g.name AS gene,
        DISTANCE(p.interval, g.interval) AS dist,
        CASE
            WHEN DISTANCE(p.interval, g.interval) = 0 THEN 'overlapping'
            WHEN DISTANCE(p.interval, g.interval) <= 1000 THEN 'proximal (<1kb)'
            WHEN DISTANCE(p.interval, g.interval) <= 10000 THEN 'nearby (<10kb)'
            ELSE 'distant'
        END AS relationship
    FROM peaks p
    CROSS JOIN genes g
    WHERE p.chromosome = g.chromosome
""")

Use case: Categorize peak-gene relationships for enhancer analysis.

Filter by Maximum Distance

Find feature pairs within a distance threshold:

cursor = engine.execute("""
    SELECT
        a.name,
        b.name,
        DISTANCE(a.interval, b.interval) AS dist
    FROM features_a a
    CROSS JOIN features_b b
    WHERE a.chromosome = b.chromosome
      AND DISTANCE(a.interval, b.interval) <= 50000
    ORDER BY dist
""")

Use case: Find regulatory elements within 50kb of genes.

K-Nearest Neighbor Queries

Find K Nearest Features

For each peak, find the 3 nearest genes:

cursor = engine.execute("""
    SELECT
        peaks.name AS peak,
        nearest.name AS gene,
        nearest.distance
    FROM peaks
    CROSS JOIN LATERAL NEAREST(genes, reference=peaks.interval, k=3) AS nearest
    ORDER BY peaks.name, nearest.distance
""")

Use case: Annotate ChIP-seq peaks with nearby genes.

Nearest Feature to a Specific Location

Find the 5 nearest genes to a specific genomic coordinate:

cursor = engine.execute("""
    SELECT name, distance
    FROM NEAREST(genes, reference='chr1:1000000-1001000', k=5)
    ORDER BY distance
""")

Use case: Explore the genomic neighborhood of a position of interest.

Nearest with Distance Constraint

Find nearest features within a maximum distance:

cursor = engine.execute("""
    SELECT
        peaks.name AS peak,
        nearest.name AS gene,
        nearest.distance
    FROM peaks
    CROSS JOIN LATERAL NEAREST(
        genes,
        reference=peaks.interval,
        k=5,
        max_distance=100000
    ) AS nearest
    ORDER BY peaks.name, nearest.distance
""")

Use case: Find regulatory targets within 100kb, ignoring distant genes.

Strand-Specific Queries

Same-Strand Nearest Neighbors

Find nearest features on the same strand only:

cursor = engine.execute("""
    SELECT
        peaks.name AS peak,
        nearest.name AS gene,
        nearest.strand,
        nearest.distance
    FROM peaks
    CROSS JOIN LATERAL NEAREST(
        genes,
        reference=peaks.interval,
        k=3,
        stranded=true
    ) AS nearest
    ORDER BY peaks.name, nearest.distance
""")

Use case: Find same-strand genes for strand-specific regulatory analysis.

Directional Queries

Upstream Features

Find features upstream (5’) of reference positions using signed distances:

cursor = engine.execute("""
    SELECT
        peaks.name AS peak,
        nearest.name AS gene,
        nearest.distance
    FROM peaks
    CROSS JOIN LATERAL NEAREST(
        genes,
        reference=peaks.interval,
        k=10,
        signed=true
    ) AS nearest
    WHERE nearest.distance < 0
    ORDER BY peaks.name, nearest.distance DESC
""")

Use case: Find genes upstream of regulatory elements.

Note

With signed=true, negative distances indicate upstream features and positive distances indicate downstream features.

Downstream Features

Find features downstream (3’) of reference positions:

cursor = engine.execute("""
    SELECT
        peaks.name AS peak,
        nearest.name AS gene,
        nearest.distance
    FROM peaks
    CROSS JOIN LATERAL NEAREST(
        genes,
        reference=peaks.interval,
        k=10,
        signed=true
    ) AS nearest
    WHERE nearest.distance > 0
    ORDER BY peaks.name, nearest.distance
""")

Use case: Identify downstream targets of promoter elements.

Promoter-Proximal Analysis

Find features within a specific distance window around the reference:

cursor = engine.execute("""
    SELECT
        peaks.name AS peak,
        nearest.name AS gene,
        nearest.distance
    FROM peaks
    CROSS JOIN LATERAL NEAREST(
        genes,
        reference=peaks.interval,
        k=10,
        signed=true
    ) AS nearest
    WHERE nearest.distance BETWEEN -2000 AND 500
    ORDER BY peaks.name, ABS(nearest.distance)
""")

Use case: Find genes with peaks in their promoter regions (-2kb to +500bp from TSS).

Combined Parameters

Strand-Specific with Distance Constraint

Find nearby same-strand features:

cursor = engine.execute("""
    SELECT
        peaks.name AS peak,
        nearest.name AS gene,
        nearest.distance
    FROM peaks
    CROSS JOIN LATERAL NEAREST(
        genes,
        reference=peaks.interval,
        k=5,
        max_distance=50000,
        stranded=true,
        signed=true
    ) AS nearest
    WHERE nearest.distance BETWEEN -10000 AND 10000
    ORDER BY peaks.name, ABS(nearest.distance)
""")

Use case: Find same-strand genes within ±10kb for promoter-enhancer analysis.

Distance Statistics

Average Distance to Nearest Gene

Calculate the average distance from peaks to their nearest gene:

cursor = engine.execute("""
    WITH nearest_genes AS (
        SELECT
            peaks.name AS peak,
            nearest.distance
        FROM peaks
        CROSS JOIN LATERAL NEAREST(genes, reference=peaks.interval, k=1) AS nearest
    )
    SELECT
        COUNT(*) AS peak_count,
        AVG(distance) AS avg_distance,
        MIN(distance) AS min_distance,
        MAX(distance) AS max_distance
    FROM nearest_genes
""")

Use case: Characterize the genomic distribution of peaks relative to genes.

Distance Distribution by Chromosome

Analyze distance patterns per chromosome:

cursor = engine.execute("""
    WITH nearest_genes AS (
        SELECT
            peaks.chromosome,
            peaks.name AS peak,
            nearest.distance
        FROM peaks
        CROSS JOIN LATERAL NEAREST(genes, reference=peaks.interval, k=1) AS nearest
    )
    SELECT
        chromosome,
        COUNT(*) AS peak_count,
        AVG(distance) AS avg_distance
    FROM nearest_genes
    GROUP BY chromosome
    ORDER BY chromosome
""")

Use case: Compare regulatory element distribution across chromosomes.

Window Expansion Patterns

Expand Search Window

Find features within an expanded window around each feature:

cursor = engine.execute("""
    WITH expanded AS (
        SELECT
            name,
            chromosome,
            start_pos - 5000 AS search_start,
            end_pos + 5000 AS search_end
        FROM peaks
    )
    SELECT
        e.name AS peak,
        b.*
    FROM expanded e
    JOIN features_b b
        ON b.chromosome = e.chromosome
        AND b.start_pos < e.search_end
        AND b.end_pos > e.search_start
""")

Use case: Find all features within 5kb flanking regions.

Note

This pattern uses raw coordinate manipulation rather than the NEAREST operator, which is useful when you need custom window shapes.