Intersection Queries
This section covers common patterns for finding overlapping genomic features using GIQL’s spatial operators.
Finding Overlapping Features
Basic Overlap Query
Find all features in table A that overlap with any feature in table B:
cursor = engine.execute("""
SELECT DISTINCT a.*
FROM features_a a, features_b b
WHERE a.interval INTERSECTS b.interval
""")
Use case: Identify variants that fall within gene regions.
Get All Overlap Pairs
Return every pair of overlapping features (may produce duplicates if one feature overlaps multiple others):
cursor = engine.execute("""
SELECT a.*, b.*
FROM features_a a, features_b b
WHERE a.interval INTERSECTS b.interval
""")
Use case: Generate a full overlap matrix for downstream analysis.
Query Against a Specific Region
Find features overlapping a literal genomic range:
cursor = engine.execute("""
SELECT * FROM variants
WHERE interval INTERSECTS 'chr1:1000000-2000000'
""")
Use case: Extract all data for a specific chromosomal region.
Filtering by Overlap
Excluding Overlaps
Find features in A that do NOT overlap with any feature in B:
cursor = engine.execute("""
SELECT a.*
FROM features_a a
LEFT JOIN features_b b ON a.interval INTERSECTS b.interval
WHERE b.chromosome IS NULL
""")
Use case: Find regulatory regions that don’t overlap with known genes, or identify variants outside of exonic regions.
Features with Any Overlap (Unique)
Return each feature from A only once, regardless of how many B features it overlaps:
cursor = engine.execute("""
SELECT DISTINCT a.*
FROM features_a a
INNER JOIN features_b b ON a.interval INTERSECTS b.interval
""")
Use case: Get a deduplicated list of features that have at least one overlap.
Counting Overlaps
Count Overlapping Features
Count how many B features each A feature overlaps:
cursor = engine.execute("""
SELECT a.*, COUNT(b.name) AS overlap_count
FROM features_a a
LEFT JOIN features_b b ON a.interval INTERSECTS b.interval
GROUP BY a.chromosome, a.start_pos, a.end_pos, a.name, a.score, a.strand
""")
Use case: Calculate how many enhancers each gene overlaps with, or count variants per feature.
Filter by Overlap Count
Find features that overlap at least N other features:
cursor = engine.execute("""
SELECT a.*
FROM features_a a
INNER JOIN features_b b ON a.interval INTERSECTS b.interval
GROUP BY a.chromosome, a.start_pos, a.end_pos, a.name, a.score, a.strand
HAVING COUNT(*) >= 3
""")
Use case: Identify hotspot regions with high feature density.
Strand-Specific Operations
Same-Strand Overlaps
Find overlapping features on the same strand:
cursor = engine.execute("""
SELECT a.*, b.name AS b_name
FROM features_a a, features_b b
WHERE a.interval INTERSECTS b.interval
AND a.strand = b.strand
""")
Use case: Find sense-strand overlaps for transcript analysis.
Opposite-Strand Overlaps
Find overlapping features on opposite strands:
cursor = engine.execute("""
SELECT a.*, b.name AS b_name
FROM features_a a, features_b b
WHERE a.interval INTERSECTS b.interval
AND a.strand != b.strand
AND a.strand IN ('+', '-')
AND b.strand IN ('+', '-')
""")
Use case: Identify antisense overlaps or convergent transcription.
Overlap Fraction Requirements
Minimum Overlap Fraction of A
Find overlaps where at least 50% of feature A is covered:
cursor = engine.execute("""
SELECT a.*
FROM features_a a, features_b b
WHERE a.interval INTERSECTS b.interval
AND (
LEAST(a.end_pos, b.end_pos) - GREATEST(a.start_pos, b.start_pos)
) >= 0.5 * (a.end_pos - a.start_pos)
""")
Use case: Ensure substantial overlap rather than just touching edges.
Minimum Overlap Fraction of B
Find overlaps where at least 50% of feature B is covered:
cursor = engine.execute("""
SELECT a.*
FROM features_a a, features_b b
WHERE a.interval INTERSECTS b.interval
AND (
LEAST(a.end_pos, b.end_pos) - GREATEST(a.start_pos, b.start_pos)
) >= 0.5 * (b.end_pos - b.start_pos)
""")
Use case: Find features that substantially cover smaller annotations.
Reciprocal Overlap
Require both features to have at least 50% mutual overlap:
cursor = engine.execute("""
WITH overlap_calcs AS (
SELECT
a.*,
b.name AS b_name,
(LEAST(a.end_pos, b.end_pos) - GREATEST(a.start_pos, b.start_pos)) AS overlap_bp,
(a.end_pos - a.start_pos) AS a_length,
(b.end_pos - b.start_pos) AS b_length
FROM features_a a, features_b b
WHERE a.interval INTERSECTS b.interval
)
SELECT *
FROM overlap_calcs
WHERE overlap_bp >= 0.5 * a_length
AND overlap_bp >= 0.5 * b_length
""")
Use case: Find high-confidence overlaps where features mutually cover each other.
Join Patterns
Left Outer Join
Report all features from A, with B information where available:
cursor = engine.execute("""
SELECT a.*, b.name AS overlapping_feature
FROM features_a a
LEFT JOIN features_b b ON a.interval INTERSECTS b.interval
""")
Use case: Annotate features with overlap information while keeping all records.
Calculate Overlap Amount
Return the overlap size in base pairs:
cursor = engine.execute("""
SELECT
a.*,
b.name AS b_name,
(LEAST(a.end_pos, b.end_pos) - GREATEST(a.start_pos, b.start_pos)) AS overlap_bp
FROM features_a a, features_b b
WHERE a.interval INTERSECTS b.interval
""")
Use case: Quantify the extent of each overlap.
Overlap with NULL Handling
Report overlap amount for all A features, with 0 for non-overlapping:
cursor = engine.execute("""
SELECT
a.*,
b.name AS b_name,
CASE
WHEN b.chromosome IS NULL THEN 0
ELSE LEAST(a.end_pos, b.end_pos) - GREATEST(a.start_pos, b.start_pos)
END AS overlap_bp
FROM features_a a
LEFT JOIN features_b b ON a.interval INTERSECTS b.interval
""")
Use case: Create a complete overlap report including non-overlapping features.
Multi-Table Operations
Union Multiple Sources
Intersect A with features from multiple B tables:
# Load and register multiple tables first
engine.load_csv("features_b1", "file1.bed")
engine.load_csv("features_b2", "file2.bed")
engine.load_csv("features_b3", "file3.bed")
# Register schemas for each...
cursor = engine.execute("""
WITH all_b_features AS (
SELECT * FROM features_b1
UNION ALL
SELECT * FROM features_b2
UNION ALL
SELECT * FROM features_b3
)
SELECT DISTINCT a.*
FROM features_a a
INNER JOIN all_b_features b ON a.interval INTERSECTS b.interval
""")
Use case: Find features overlapping any region from multiple annotation sources.
Track Overlap Source
Know which source table each overlap came from:
cursor = engine.execute("""
WITH all_b_features AS (
SELECT *, 'source1' AS source FROM features_b1
UNION ALL
SELECT *, 'source2' AS source FROM features_b2
UNION ALL
SELECT *, 'source3' AS source FROM features_b3
)
SELECT a.*, b.name AS overlap_name, b.source
FROM features_a a
INNER JOIN all_b_features b ON a.interval INTERSECTS b.interval
""")
Use case: Track which annotation database each overlap originated from.
Complex Filtering
Overlap with Quality Filters
Combine spatial and attribute filters:
cursor = engine.execute("""
SELECT v.*, g.name AS gene_name
FROM variants v
INNER JOIN genes g ON v.interval INTERSECTS g.interval
WHERE v.quality >= 30
AND g.biotype = 'protein_coding'
ORDER BY v.chromosome, v.start_pos
""")
Use case: Find high-quality variants in protein-coding genes.
Specific Target Genes
Find overlaps with a specific set of genes:
cursor = engine.execute("""
SELECT v.*, g.name AS gene_name
FROM variants v
INNER JOIN genes g ON v.interval INTERSECTS g.interval
WHERE g.name IN ('BRCA1', 'BRCA2', 'TP53', 'EGFR')
ORDER BY g.name, v.start_pos
""")
Use case: Extract variants in clinically relevant genes.