Intersection Queries

This section covers common patterns for finding overlapping genomic features using GIQL’s spatial operators.

Finding Overlapping Features

Basic Overlap Query

Find all features in table A that overlap with any feature in table B:

cursor = engine.execute("""
    SELECT DISTINCT a.*
    FROM features_a a, features_b b
    WHERE a.interval INTERSECTS b.interval
""")

Use case: Identify variants that fall within gene regions.

Get All Overlap Pairs

Return every pair of overlapping features (may produce duplicates if one feature overlaps multiple others):

cursor = engine.execute("""
    SELECT a.*, b.*
    FROM features_a a, features_b b
    WHERE a.interval INTERSECTS b.interval
""")

Use case: Generate a full overlap matrix for downstream analysis.

Query Against a Specific Region

Find features overlapping a literal genomic range:

cursor = engine.execute("""
    SELECT * FROM variants
    WHERE interval INTERSECTS 'chr1:1000000-2000000'
""")

Use case: Extract all data for a specific chromosomal region.

Filtering by Overlap

Excluding Overlaps

Find features in A that do NOT overlap with any feature in B:

cursor = engine.execute("""
    SELECT a.*
    FROM features_a a
    LEFT JOIN features_b b ON a.interval INTERSECTS b.interval
    WHERE b.chromosome IS NULL
""")

Use case: Find regulatory regions that don’t overlap with known genes, or identify variants outside of exonic regions.

Features with Any Overlap (Unique)

Return each feature from A only once, regardless of how many B features it overlaps:

cursor = engine.execute("""
    SELECT DISTINCT a.*
    FROM features_a a
    INNER JOIN features_b b ON a.interval INTERSECTS b.interval
""")

Use case: Get a deduplicated list of features that have at least one overlap.

Counting Overlaps

Count Overlapping Features

Count how many B features each A feature overlaps:

cursor = engine.execute("""
    SELECT a.*, COUNT(b.name) AS overlap_count
    FROM features_a a
    LEFT JOIN features_b b ON a.interval INTERSECTS b.interval
    GROUP BY a.chromosome, a.start_pos, a.end_pos, a.name, a.score, a.strand
""")

Use case: Calculate how many enhancers each gene overlaps with, or count variants per feature.

Filter by Overlap Count

Find features that overlap at least N other features:

cursor = engine.execute("""
    SELECT a.*
    FROM features_a a
    INNER JOIN features_b b ON a.interval INTERSECTS b.interval
    GROUP BY a.chromosome, a.start_pos, a.end_pos, a.name, a.score, a.strand
    HAVING COUNT(*) >= 3
""")

Use case: Identify hotspot regions with high feature density.

Strand-Specific Operations

Same-Strand Overlaps

Find overlapping features on the same strand:

cursor = engine.execute("""
    SELECT a.*, b.name AS b_name
    FROM features_a a, features_b b
    WHERE a.interval INTERSECTS b.interval
      AND a.strand = b.strand
""")

Use case: Find sense-strand overlaps for transcript analysis.

Opposite-Strand Overlaps

Find overlapping features on opposite strands:

cursor = engine.execute("""
    SELECT a.*, b.name AS b_name
    FROM features_a a, features_b b
    WHERE a.interval INTERSECTS b.interval
      AND a.strand != b.strand
      AND a.strand IN ('+', '-')
      AND b.strand IN ('+', '-')
""")

Use case: Identify antisense overlaps or convergent transcription.

Overlap Fraction Requirements

Minimum Overlap Fraction of A

Find overlaps where at least 50% of feature A is covered:

cursor = engine.execute("""
    SELECT a.*
    FROM features_a a, features_b b
    WHERE a.interval INTERSECTS b.interval
      AND (
          LEAST(a.end_pos, b.end_pos) - GREATEST(a.start_pos, b.start_pos)
      ) >= 0.5 * (a.end_pos - a.start_pos)
""")

Use case: Ensure substantial overlap rather than just touching edges.

Minimum Overlap Fraction of B

Find overlaps where at least 50% of feature B is covered:

cursor = engine.execute("""
    SELECT a.*
    FROM features_a a, features_b b
    WHERE a.interval INTERSECTS b.interval
      AND (
          LEAST(a.end_pos, b.end_pos) - GREATEST(a.start_pos, b.start_pos)
      ) >= 0.5 * (b.end_pos - b.start_pos)
""")

Use case: Find features that substantially cover smaller annotations.

Reciprocal Overlap

Require both features to have at least 50% mutual overlap:

cursor = engine.execute("""
    WITH overlap_calcs AS (
        SELECT
            a.*,
            b.name AS b_name,
            (LEAST(a.end_pos, b.end_pos) - GREATEST(a.start_pos, b.start_pos)) AS overlap_bp,
            (a.end_pos - a.start_pos) AS a_length,
            (b.end_pos - b.start_pos) AS b_length
        FROM features_a a, features_b b
        WHERE a.interval INTERSECTS b.interval
    )
    SELECT *
    FROM overlap_calcs
    WHERE overlap_bp >= 0.5 * a_length
      AND overlap_bp >= 0.5 * b_length
""")

Use case: Find high-confidence overlaps where features mutually cover each other.

Join Patterns

Left Outer Join

Report all features from A, with B information where available:

cursor = engine.execute("""
    SELECT a.*, b.name AS overlapping_feature
    FROM features_a a
    LEFT JOIN features_b b ON a.interval INTERSECTS b.interval
""")

Use case: Annotate features with overlap information while keeping all records.

Calculate Overlap Amount

Return the overlap size in base pairs:

cursor = engine.execute("""
    SELECT
        a.*,
        b.name AS b_name,
        (LEAST(a.end_pos, b.end_pos) - GREATEST(a.start_pos, b.start_pos)) AS overlap_bp
    FROM features_a a, features_b b
    WHERE a.interval INTERSECTS b.interval
""")

Use case: Quantify the extent of each overlap.

Overlap with NULL Handling

Report overlap amount for all A features, with 0 for non-overlapping:

cursor = engine.execute("""
    SELECT
        a.*,
        b.name AS b_name,
        CASE
            WHEN b.chromosome IS NULL THEN 0
            ELSE LEAST(a.end_pos, b.end_pos) - GREATEST(a.start_pos, b.start_pos)
        END AS overlap_bp
    FROM features_a a
    LEFT JOIN features_b b ON a.interval INTERSECTS b.interval
""")

Use case: Create a complete overlap report including non-overlapping features.

Multi-Table Operations

Union Multiple Sources

Intersect A with features from multiple B tables:

# Load and register multiple tables first
engine.load_csv("features_b1", "file1.bed")
engine.load_csv("features_b2", "file2.bed")
engine.load_csv("features_b3", "file3.bed")
# Register schemas for each...

cursor = engine.execute("""
    WITH all_b_features AS (
        SELECT * FROM features_b1
        UNION ALL
        SELECT * FROM features_b2
        UNION ALL
        SELECT * FROM features_b3
    )
    SELECT DISTINCT a.*
    FROM features_a a
    INNER JOIN all_b_features b ON a.interval INTERSECTS b.interval
""")

Use case: Find features overlapping any region from multiple annotation sources.

Track Overlap Source

Know which source table each overlap came from:

cursor = engine.execute("""
    WITH all_b_features AS (
        SELECT *, 'source1' AS source FROM features_b1
        UNION ALL
        SELECT *, 'source2' AS source FROM features_b2
        UNION ALL
        SELECT *, 'source3' AS source FROM features_b3
    )
    SELECT a.*, b.name AS overlap_name, b.source
    FROM features_a a
    INNER JOIN all_b_features b ON a.interval INTERSECTS b.interval
""")

Use case: Track which annotation database each overlap originated from.

Complex Filtering

Overlap with Quality Filters

Combine spatial and attribute filters:

cursor = engine.execute("""
    SELECT v.*, g.name AS gene_name
    FROM variants v
    INNER JOIN genes g ON v.interval INTERSECTS g.interval
    WHERE v.quality >= 30
      AND g.biotype = 'protein_coding'
    ORDER BY v.chromosome, v.start_pos
""")

Use case: Find high-quality variants in protein-coding genes.

Specific Target Genes

Find overlaps with a specific set of genes:

cursor = engine.execute("""
    SELECT v.*, g.name AS gene_name
    FROM variants v
    INNER JOIN genes g ON v.interval INTERSECTS g.interval
    WHERE g.name IN ('BRCA1', 'BRCA2', 'TP53', 'EGFR')
    ORDER BY g.name, v.start_pos
""")

Use case: Extract variants in clinically relevant genes.