Intersection#
This section covers common patterns for finding overlapping genomic features using GIQL’s spatial operators.
Finding Overlapping Features#
Basic Overlap Query#
Find all features in table A that overlap with any feature in table B:
SELECT DISTINCT a.*
FROM features_a a, features_b b
WHERE a.interval INTERSECTS b.interval
Use case: Identify variants that fall within gene regions.
Get All Overlap Pairs#
Return every pair of overlapping features (may produce duplicates if one feature overlaps multiple others):
SELECT a.*, b.*
FROM features_a a, features_b b
WHERE a.interval INTERSECTS b.interval
Use case: Generate a full overlap matrix for downstream analysis.
Query Against a Specific Region#
Find features overlapping a literal genomic range:
SELECT * FROM variants
WHERE interval INTERSECTS 'chr1:1000000-2000000'
Use case: Extract all data for a specific chromosomal region.
Filtering by Overlap#
Excluding Overlaps#
Find features in A that do NOT overlap with any feature in B:
SELECT a.*
FROM features_a a
LEFT JOIN features_b b ON a.interval INTERSECTS b.interval
WHERE b.chrom IS NULL
Use case: Find regulatory regions that don’t overlap with known genes, or identify variants outside of exonic regions.
Features with Any Overlap (Unique)#
Return each feature from A only once, regardless of how many B features it overlaps:
SELECT DISTINCT a.*
FROM features_a a
INNER JOIN features_b b ON a.interval INTERSECTS b.interval
Use case: Get a deduplicated list of features that have at least one overlap.
Counting Overlaps#
Count Overlapping Features#
Count how many B features each A feature overlaps:
SELECT a.*, COUNT(b.name) AS overlap_count
FROM features_a a
LEFT JOIN features_b b ON a.interval INTERSECTS b.interval
GROUP BY a.chrom, a.start, a.end, a.name, a.score, a.strand
Use case: Calculate how many enhancers each gene overlaps with, or count variants per feature.
Filter by Overlap Count#
Find features that overlap at least N other features:
SELECT a.*
FROM features_a a
INNER JOIN features_b b ON a.interval INTERSECTS b.interval
GROUP BY a.chrom, a.start, a.end, a.name, a.score, a.strand
HAVING COUNT(*) >= 3
Use case: Identify hotspot regions with high feature density.
Strand-Specific Operations#
Same-Strand Overlaps#
Find overlapping features on the same strand:
SELECT a.*, b.name AS b_name
FROM features_a a, features_b b
WHERE a.interval INTERSECTS b.interval
AND a.strand = b.strand
Use case: Find sense-strand overlaps for transcript analysis.
Opposite-Strand Overlaps#
Find overlapping features on opposite strands:
SELECT a.*, b.name AS b_name
FROM features_a a, features_b b
WHERE a.interval INTERSECTS b.interval
AND a.strand != b.strand
AND a.strand IN ('+', '-')
AND b.strand IN ('+', '-')
Use case: Identify antisense overlaps or convergent transcription.
Overlap Fraction Requirements#
Minimum Overlap Fraction of A#
Find overlaps where at least 50% of feature A is covered:
SELECT a.*
FROM features_a a, features_b b
WHERE a.interval INTERSECTS b.interval
AND (
LEAST(a.end, b.end) - GREATEST(a.start, b.start)
) >= 0.5 * (a.end - a.start)
Use case: Ensure substantial overlap rather than just touching edges.
Minimum Overlap Fraction of B#
Find overlaps where at least 50% of feature B is covered:
SELECT a.*
FROM features_a a, features_b b
WHERE a.interval INTERSECTS b.interval
AND (
LEAST(a.end, b.end) - GREATEST(a.start, b.start)
) >= 0.5 * (b.end - b.start)
Use case: Find features that substantially cover smaller annotations.
Reciprocal Overlap#
Require both features to have at least 50% mutual overlap:
WITH overlap_calcs AS (
SELECT
a.*,
b.name AS b_name,
(LEAST(a.end, b.end) - GREATEST(a.start, b.start)) AS overlap_bp,
(a.end - a.start) AS a_length,
(b.end - b.start) AS b_length
FROM features_a a, features_b b
WHERE a.interval INTERSECTS b.interval
)
SELECT *
FROM overlap_calcs
WHERE overlap_bp >= 0.5 * a_length
AND overlap_bp >= 0.5 * b_length
Use case: Find high-confidence overlaps where features mutually cover each other.
Join Patterns#
Left Outer Join#
Report all features from A, with B information where available:
SELECT a.*, b.name AS overlapping_feature
FROM features_a a
LEFT JOIN features_b b ON a.interval INTERSECTS b.interval
Use case: Annotate features with overlap information while keeping all records.
Calculate Overlap Amount#
Return the overlap size in base pairs:
SELECT
a.*,
b.name AS b_name,
(LEAST(a.end, b.end) - GREATEST(a.start, b.start)) AS overlap_bp
FROM features_a a, features_b b
WHERE a.interval INTERSECTS b.interval
Use case: Quantify the extent of each overlap.
Overlap with NULL Handling#
Report overlap amount for all A features, with 0 for non-overlapping:
SELECT
a.*,
b.name AS b_name,
CASE
WHEN b.chrom IS NULL THEN 0
ELSE LEAST(a.end, b.end) - GREATEST(a.start, b.start)
END AS overlap_bp
FROM features_a a
LEFT JOIN features_b b ON a.interval INTERSECTS b.interval
Use case: Create a complete overlap report including non-overlapping features.
Multi-Table Operations#
Union Multiple Sources#
Intersect A with features from multiple B tables:
WITH all_b_features AS (
SELECT * FROM features_b1
UNION ALL
SELECT * FROM features_b2
UNION ALL
SELECT * FROM features_b3
)
SELECT DISTINCT a.*
FROM features_a a
INNER JOIN all_b_features b ON a.interval INTERSECTS b.interval
Use case: Find features overlapping any region from multiple annotation sources.
Track Overlap Source#
Know which source table each overlap came from:
WITH all_b_features AS (
SELECT *, 'source1' AS source FROM features_b1
UNION ALL
SELECT *, 'source2' AS source FROM features_b2
UNION ALL
SELECT *, 'source3' AS source FROM features_b3
)
SELECT a.*, b.name AS overlap_name, b.source
FROM features_a a
INNER JOIN all_b_features b ON a.interval INTERSECTS b.interval
Use case: Track which annotation database each overlap originated from.
Complex Filtering#
Overlap with Quality Filters#
Combine spatial and attribute filters:
SELECT v.*, g.name AS gene_name
FROM variants v
INNER JOIN genes g ON v.interval INTERSECTS g.interval
WHERE v.quality >= 30
AND g.biotype = 'protein_coding'
ORDER BY v.chrom, v.start
Use case: Find high-quality variants in protein-coding genes.
Specific Target Genes#
Find overlaps with a specific set of genes:
SELECT v.*, g.name AS gene_name
FROM variants v
INNER JOIN genes g ON v.interval INTERSECTS g.interval
WHERE g.name IN ('BRCA1', 'BRCA2', 'TP53', 'EGFR')
ORDER BY g.name, v.start
Use case: Extract variants in clinically relevant genes.