Disjoining Intervals#
This section covers patterns for splitting intervals at breakpoints using
GIQL’s DISJOIN operator – partitioning a set into non-overlapping
segments and re-tiling features against a reference grid.
Partition a Set of Intervals#
Build a Disjoint Partition#
Split a set of intervals into the maximal set of non-overlapping sub-intervals defined by its own breakpoints:
SELECT DISTINCT disjoin_chrom, disjoin_start, disjoin_end
FROM DISJOIN(features)
ORDER BY disjoin_chrom, disjoin_start
Use case: Given ChIP-seq peak calls pooled from several samples, produce a
non-overlapping segment track – the equivalent of Bioconductor’s disjoin()
– so downstream signal or count aggregates never double-count a base covered
by two overlapping sample peaks.
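To make the breakpoint idea concrete, here is a minimal plain-Python sketch of a disjoint partition on a single chromosome. It is an illustration only, not GIQL's implementation: the `disjoin` function name, the `(start, end)` tuples, and the half-open coordinate convention are all assumptions made for the example.

```python
def disjoin(intervals):
    """Split half-open (start, end) intervals at every breakpoint they define."""
    # Every start and every end is a potential cut point.
    points = sorted({p for s, e in intervals for p in (s, e)})
    segments = []
    for left, right in zip(points, points[1:]):
        # Keep a candidate segment only if some input interval covers it.
        if any(s <= left and right <= e for s, e in intervals):
            segments.append((left, right))
    return segments


# Hypothetical pooled peaks: the first two overlap on [200, 300).
peaks = [(100, 300), (200, 400), (600, 700)]
segments = disjoin(peaks)
print(segments)  # [(100, 200), (200, 300), (300, 400), (600, 700)]
```

The gap between 400 and 600 produces no segment because no input interval covers it, so the result contains only covered, non-overlapping pieces.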
Track Each Segment’s Parent#
Keep the parent feature alongside each sub-interval:
SELECT name, disjoin_start, disjoin_end
FROM DISJOIN(features)
ORDER BY name, disjoin_start
Use case: See how each original feature was fragmented and which segment came from which parent.
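The parent-tracking view can be sketched in plain Python as one row per (parent, segment) pair. This is a hedged illustration under assumed semantics, not GIQL's implementation: the `disjoin_with_parents` name, the `(name, start, end)` tuples, and half-open coordinates are hypothetical choices for the example.

```python
def disjoin_with_parents(features):
    """Return (name, seg_start, seg_end) rows, one per parent and segment."""
    # Breakpoints come from every feature in the set, not just the parent.
    points = sorted({p for _, s, e in features for p in (s, e)})
    rows = []
    for name, s, e in features:
        # Cut points that fall within this parent, including its own ends.
        cuts = [p for p in points if s <= p <= e]
        rows.extend((name, a, b) for a, b in zip(cuts, cuts[1:]))
    return rows


# Hypothetical overlapping features.
features = [("peak_a", 100, 300), ("peak_b", 200, 400)]
rows = disjoin_with_parents(features)
for row in rows:
    print(row)
# ('peak_a', 100, 200)
# ('peak_a', 200, 300)
# ('peak_b', 200, 300)
# ('peak_b', 300, 400)
```

In this sketch the shared segment (200, 300) appears once per parent, which is presumably why the partition query that omits the parent column uses SELECT DISTINCT.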
Split Against a Reference#
Split Features Against a Mask#
Cut target features at the boundaries of a reference (mask) set, keeping only the pieces the mask covers:
SELECT name, disjoin_start, disjoin_end
FROM DISJOIN(features, reference := mask)
Use case: Restrict ATAC-seq or gene features to a set of callable (or otherwise interesting) mask regions, splitting them at the mask boundaries so no reported piece straddles a mask edge.
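As a rough illustration of the masking behaviour described above, the following plain-Python sketch cuts features at mask boundaries and keeps only the covered pieces. It assumes half-open coordinates, a single chromosome, and non-overlapping mask intervals; the `split_by_mask` name is hypothetical. It cuts only at mask edges, and whether GIQL additionally cuts at other features' boundaries in this mode is not something this sketch tries to reproduce.

```python
def split_by_mask(features, mask):
    """Return the pieces of each (name, start, end) feature covered by the mask."""
    pieces = []
    for name, f_start, f_end in features:
        for m_start, m_end in mask:
            # Clip the feature to this mask interval.
            start, end = max(f_start, m_start), min(f_end, m_end)
            if start < end:  # keep only non-empty overlaps
                pieces.append((name, start, end))
    return pieces


# Hypothetical feature straddling two mask regions.
features = [("gene1", 50, 250)]
mask = [(0, 100), (200, 300)]
pieces = split_by_mask(features, mask)
print(pieces)  # [('gene1', 50, 100), ('gene1', 200, 250)]
```

The uncovered stretch from 100 to 200 is discarded, and no reported piece crosses a mask edge.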
Re-tile Against a Uniform Grid#
Pass a generated set of fixed-width bins as the reference:
WITH bins AS (
    SELECT 'chr1' AS chrom, x AS start, x + 1000 AS "end"
    FROM range(0, 250000000, 1000) AS t(x)
)
SELECT * FROM DISJOIN(features, reference := bins)
Use case: Break features onto a uniform coordinate grid – for example to build a fixed-width binned coverage matrix – so each piece falls within a single bin.
Note
range() is DuckDB-specific syntax for generating the bin grid; other
engines need their own generator. The grid must also span every chromosome
present in features: the example above bins only chr1, so features on any
other chromosome are dropped.
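The chromosome-coverage caveat can be seen in a small plain-Python sketch that builds a grid from a table of chromosome sizes and tiles features onto it. Everything here is assumed for illustration: the `make_bins` and `tile` names, the toy chromosome sizes, and half-open coordinates; it is not how GIQL evaluates the query.

```python
def make_bins(chrom_sizes, width):
    """Build (chrom, start, end) bins per chromosome; the last bin is clipped."""
    return [
        (chrom, start, min(start + width, size))
        for chrom, size in chrom_sizes.items()
        for start in range(0, size, width)
    ]


def tile(features, bins):
    """Split (chrom, start, end) features at bin edges on the same chromosome."""
    pieces = []
    for f_chrom, f_start, f_end in features:
        for b_chrom, b_start, b_end in bins:
            if f_chrom != b_chrom:
                continue
            start, end = max(f_start, b_start), min(f_end, b_end)
            if start < end:
                pieces.append((f_chrom, start, end))
    return pieces


# Toy sizes, not real genome lengths; chr3 is deliberately left out of the grid.
chrom_sizes = {"chr1": 2500, "chr2": 1800}
bins = make_bins(chrom_sizes, 1000)
features = [("chr1", 900, 2100), ("chr3", 0, 500)]
pieces = tile(features, bins)
print(pieces)
# [('chr1', 900, 1000), ('chr1', 1000, 2000), ('chr1', 2000, 2100)]
```

The chr3 feature yields no pieces because the grid has no bins on that chromosome, which mirrors the dropped-feature behaviour the note warns about.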
Coming from Bedtools?#
DISJOIN has no direct bedtools equivalent – no single bedtools command
splits intervals at breakpoints. See the Bedtools Migration Guide for
the GIQL operators that do map onto bedtools commands.