Distance and Proximity Queries

This section covers patterns for calculating genomic distances and finding nearest features using GIQL’s distance operators.

Calculating Distances 

Distance Between Feature Pairs 

Calculate the distance between features in two tables:

cursor = engine.execute("""
    SELECT
        a.name AS feature_a,
        b.name AS feature_b,
        DISTANCE(a.interval, b.interval) AS distance
    FROM features_a a
    CROSS JOIN features_b b
    WHERE a.chromosome = b.chromosome
    ORDER BY a.name, distance
""")

Use case: Generate a distance matrix between regulatory elements and genes.

Note

Always include WHERE a.chromosome = b.chromosome to avoid comparing features on different chromosomes (which returns NULL for distance).

Identify Overlapping vs Proximal 

Classify relationships based on distance:

cursor = engine.execute("""
    SELECT
        p.name AS peak,
        g.name AS gene,
        DISTANCE(p.interval, g.interval) AS dist,
        CASE
            WHEN DISTANCE(p.interval, g.interval) = 0 THEN 'overlapping'
            WHEN DISTANCE(p.interval, g.interval) <= 1000 THEN 'proximal (<1kb)'
            WHEN DISTANCE(p.interval, g.interval) <= 10000 THEN 'nearby (<10kb)'
            ELSE 'distant'
        END AS relationship
    FROM peaks p
    CROSS JOIN genes g
    WHERE p.chromosome = g.chromosome
""")

Use case: Categorize peak-gene relationships for enhancer analysis.

Filter by Maximum Distance 

Find feature pairs within a distance threshold:

cursor = engine.execute("""
    SELECT
        a.name,
        b.name,
        DISTANCE(a.interval, b.interval) AS dist
    FROM features_a a
    CROSS JOIN features_b b
    WHERE a.chromosome = b.chromosome
      AND DISTANCE(a.interval, b.interval) <= 50000
    ORDER BY dist
""")

Use case: Find regulatory elements within 50kb of genes.

K-Nearest Neighbor Queries 

Find K Nearest Features 

For each peak, find the 3 nearest genes:

cursor = engine.execute("""
    SELECT
        peaks.name AS peak,
        nearest.name AS gene,
        nearest.distance
    FROM peaks
    CROSS JOIN LATERAL NEAREST(genes, reference=peaks.interval, k=3) AS nearest
    ORDER BY peaks.name, nearest.distance
""")

Use case: Annotate ChIP-seq peaks with nearby genes.

Nearest Feature to a Specific Location 

Find the 5 nearest genes to a specific genomic coordinate:

cursor = engine.execute("""
    SELECT name, distance
    FROM NEAREST(genes, reference='chr1:1000000-1001000', k=5)
    ORDER BY distance
""")

Use case: Explore the genomic neighborhood of a position of interest.

Nearest with Distance Constraint 

Find nearest features within a maximum distance:

cursor = engine.execute("""
    SELECT
        peaks.name AS peak,
        nearest.name AS gene,
        nearest.distance
    FROM peaks
    CROSS JOIN LATERAL NEAREST(
        genes,
        reference=peaks.interval,
        k=5,
        max_distance=100000
    ) AS nearest
    ORDER BY peaks.name, nearest.distance
""")

Use case: Find regulatory targets within 100kb, ignoring distant genes.

Strand-Specific Queries 

Same-Strand Nearest Neighbors 

Find nearest features on the same strand only:

cursor = engine.execute("""
    SELECT
        peaks.name AS peak,
        nearest.name AS gene,
        nearest.strand,
        nearest.distance
    FROM peaks
    CROSS JOIN LATERAL NEAREST(
        genes,
        reference=peaks.interval,
        k=3,
        stranded=true
    ) AS nearest
    ORDER BY peaks.name, nearest.distance
""")

Use case: Find same-strand genes for strand-specific regulatory analysis.

Directional Queries 

Upstream Features 

Find features upstream (5’) of reference positions using signed distances:

cursor = engine.execute("""
    SELECT
        peaks.name AS peak,
        nearest.name AS gene,
        nearest.distance
    FROM peaks
    CROSS JOIN LATERAL NEAREST(
        genes,
        reference=peaks.interval,
        k=10,
        signed=true
    ) AS nearest
    WHERE nearest.distance < 0
    ORDER BY peaks.name, nearest.distance DESC
""")

Use case: Find genes upstream of regulatory elements.

Note

With signed=true, negative distances indicate upstream features and positive distances indicate downstream features.

Downstream Features 

Find features downstream (3’) of reference positions:

cursor = engine.execute("""
    SELECT
        peaks.name AS peak,
        nearest.name AS gene,
        nearest.distance
    FROM peaks
    CROSS JOIN LATERAL NEAREST(
        genes,
        reference=peaks.interval,
        k=10,
        signed=true
    ) AS nearest
    WHERE nearest.distance > 0
    ORDER BY peaks.name, nearest.distance
""")

Use case: Identify downstream targets of promoter elements.

Promoter-Proximal Analysis 

Find features within a specific distance window around the reference:

cursor = engine.execute("""
    SELECT
        peaks.name AS peak,
        nearest.name AS gene,
        nearest.distance
    FROM peaks
    CROSS JOIN LATERAL NEAREST(
        genes,
        reference=peaks.interval,
        k=10,
        signed=true
    ) AS nearest
    WHERE nearest.distance BETWEEN -2000 AND 500
    ORDER BY peaks.name, ABS(nearest.distance)
""")

Use case: Find genes with peaks in their promoter regions (-2kb to +500bp from TSS).

Combined Parameters 

Strand-Specific with Distance Constraint 

Find nearby same-strand features:

cursor = engine.execute("""
    SELECT
        peaks.name AS peak,
        nearest.name AS gene,
        nearest.distance
    FROM peaks
    CROSS JOIN LATERAL NEAREST(
        genes,
        reference=peaks.interval,
        k=5,
        max_distance=50000,
        stranded=true,
        signed=true
    ) AS nearest
    WHERE nearest.distance BETWEEN -10000 AND 10000
    ORDER BY peaks.name, ABS(nearest.distance)
""")

Use case: Find same-strand genes within ±10kb for promoter-enhancer analysis.

Distance Statistics 

Average Distance to Nearest Gene 

Calculate the average distance from peaks to their nearest gene:

cursor = engine.execute("""
    WITH nearest_genes AS (
        SELECT
            peaks.name AS peak,
            nearest.distance
        FROM peaks
        CROSS JOIN LATERAL NEAREST(genes, reference=peaks.interval, k=1) AS nearest
    )
    SELECT
        COUNT(*) AS peak_count,
        AVG(distance) AS avg_distance,
        MIN(distance) AS min_distance,
        MAX(distance) AS max_distance
    FROM nearest_genes
""")

Use case: Characterize the genomic distribution of peaks relative to genes.

Distance Distribution by Chromosome 

Analyze distance patterns per chromosome:

cursor = engine.execute("""
    WITH nearest_genes AS (
        SELECT
            peaks.chromosome,
            peaks.name AS peak,
            nearest.distance
        FROM peaks
        CROSS JOIN LATERAL NEAREST(genes, reference=peaks.interval, k=1) AS nearest
    )
    SELECT
        chromosome,
        COUNT(*) AS peak_count,
        AVG(distance) AS avg_distance
    FROM nearest_genes
    GROUP BY chromosome
    ORDER BY chromosome
""")

Use case: Compare regulatory element distribution across chromosomes.

Window Expansion Patterns 

Expand Search Window 

Find features within an expanded window around each feature:

cursor = engine.execute("""
    WITH expanded AS (
        SELECT
            name,
            chromosome,
            start_pos - 5000 AS search_start,
            end_pos + 5000 AS search_end
        FROM peaks
    )
    SELECT
        e.name AS peak,
        b.*
    FROM expanded e
    JOIN features_b b
        ON b.chromosome = e.chromosome
        AND b.start_pos < e.search_end
        AND b.end_pos > e.search_start
""")

Use case: Find all features within 5kb flanking regions.

Note

This pattern uses raw coordinate manipulation rather than the NEAREST operator, which is useful when you need custom window shapes.