Distance and Proximity Queries
==============================

This section covers patterns for calculating genomic distances and finding
nearest features using GIQL's distance operators.

.. contents::
   :local:
   :depth: 2

Calculating Distances
---------------------

Distance Between Feature Pairs
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Calculate the distance between features in two tables:

.. code-block:: python

   cursor = engine.execute("""
       SELECT
           a.name AS feature_a,
           b.name AS feature_b,
           DISTANCE(a.interval, b.interval) AS distance
       FROM features_a a
       CROSS JOIN features_b b
       WHERE a.chromosome = b.chromosome
       ORDER BY a.name, distance
   """)

**Use case:** Generate a distance matrix between regulatory elements and genes.

.. note::

   Always include ``WHERE a.chromosome = b.chromosome`` to avoid comparing
   features on different chromosomes (which returns NULL for distance).

Identify Overlapping vs Proximal
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Classify relationships based on distance:

.. code-block:: python

   cursor = engine.execute("""
       SELECT
           p.name AS peak,
           g.name AS gene,
           DISTANCE(p.interval, g.interval) AS dist,
           CASE
               WHEN DISTANCE(p.interval, g.interval) = 0 THEN 'overlapping'
               WHEN DISTANCE(p.interval, g.interval) <= 1000 THEN 'proximal (<1kb)'
               WHEN DISTANCE(p.interval, g.interval) <= 10000 THEN 'nearby (<10kb)'
               ELSE 'distant'
           END AS relationship
       FROM peaks p
       CROSS JOIN genes g
       WHERE p.chromosome = g.chromosome
   """)

**Use case:** Categorize peak-gene relationships for enhancer analysis.

Filter by Maximum Distance
~~~~~~~~~~~~~~~~~~~~~~~~~~

Find feature pairs within a distance threshold:

.. code-block:: python

   cursor = engine.execute("""
       SELECT
           a.name,
           b.name,
           DISTANCE(a.interval, b.interval) AS dist
       FROM features_a a
       CROSS JOIN features_b b
       WHERE a.chromosome = b.chromosome
         AND DISTANCE(a.interval, b.interval) <= 50000
       ORDER BY dist
   """)

**Use case:** Find regulatory elements within 50kb of genes.

K-Nearest Neighbor Queries
--------------------------

Find K Nearest Features
~~~~~~~~~~~~~~~~~~~~~~~

For each peak, find the 3 nearest genes:

.. code-block:: python

   cursor = engine.execute("""
       SELECT
           peaks.name AS peak,
           nearest.name AS gene,
           nearest.distance
       FROM peaks
       CROSS JOIN LATERAL NEAREST(genes, reference=peaks.interval, k=3) AS nearest
       ORDER BY peaks.name, nearest.distance
   """)

**Use case:** Annotate ChIP-seq peaks with nearby genes.

Nearest Feature to a Specific Location
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Find the 5 nearest genes to a specific genomic coordinate:

.. code-block:: python

   cursor = engine.execute("""
       SELECT name, distance
       FROM NEAREST(genes, reference='chr1:1000000-1001000', k=5)
       ORDER BY distance
   """)

**Use case:** Explore the genomic neighborhood of a position of interest.

Nearest with Distance Constraint
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Find nearest features within a maximum distance:

.. code-block:: python

   cursor = engine.execute("""
       SELECT
           peaks.name AS peak,
           nearest.name AS gene,
           nearest.distance
       FROM peaks
       CROSS JOIN LATERAL NEAREST(
           genes,
           reference=peaks.interval,
           k=5,
           max_distance=100000
       ) AS nearest
       ORDER BY peaks.name, nearest.distance
   """)

**Use case:** Find regulatory targets within 100kb, ignoring distant genes.

Strand-Specific Queries
-----------------------

Same-Strand Nearest Neighbors
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Find nearest features on the same strand only:

.. code-block:: python

   cursor = engine.execute("""
       SELECT
           peaks.name AS peak,
           nearest.name AS gene,
           nearest.strand,
           nearest.distance
       FROM peaks
       CROSS JOIN LATERAL NEAREST(
           genes,
           reference=peaks.interval,
           k=3,
           stranded=true
       ) AS nearest
       ORDER BY peaks.name, nearest.distance
   """)

**Use case:** Find same-strand genes for strand-specific regulatory analysis.

Directional Queries
-------------------

Upstream Features
~~~~~~~~~~~~~~~~~

Find features upstream (5') of reference positions using signed distances:

.. code-block:: python

   cursor = engine.execute("""
       SELECT
           peaks.name AS peak,
           nearest.name AS gene,
           nearest.distance
       FROM peaks
       CROSS JOIN LATERAL NEAREST(
           genes,
           reference=peaks.interval,
           k=10,
           signed=true
       ) AS nearest
       WHERE nearest.distance < 0
       ORDER BY peaks.name, nearest.distance DESC
   """)

**Use case:** Find genes upstream of regulatory elements.

.. note::

   With ``signed=true``, negative distances indicate upstream features
   and positive distances indicate downstream features.

Downstream Features
~~~~~~~~~~~~~~~~~~~

Find features downstream (3') of reference positions:

.. code-block:: python

   cursor = engine.execute("""
       SELECT
           peaks.name AS peak,
           nearest.name AS gene,
           nearest.distance
       FROM peaks
       CROSS JOIN LATERAL NEAREST(
           genes,
           reference=peaks.interval,
           k=10,
           signed=true
       ) AS nearest
       WHERE nearest.distance > 0
       ORDER BY peaks.name, nearest.distance
   """)

**Use case:** Identify downstream targets of promoter elements.

Promoter-Proximal Analysis
~~~~~~~~~~~~~~~~~~~~~~~~~~

Find features within a specific distance window around the reference:

.. code-block:: python

   cursor = engine.execute("""
       SELECT
           peaks.name AS peak,
           nearest.name AS gene,
           nearest.distance
       FROM peaks
       CROSS JOIN LATERAL NEAREST(
           genes,
           reference=peaks.interval,
           k=10,
           signed=true
       ) AS nearest
       WHERE nearest.distance BETWEEN -2000 AND 500
       ORDER BY peaks.name, ABS(nearest.distance)
   """)

**Use case:** Find genes with peaks in their promoter regions (-2kb to +500bp from TSS).

Combined Parameters
-------------------

Strand-Specific with Distance Constraint
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Find nearby same-strand features:

.. code-block:: python

   cursor = engine.execute("""
       SELECT
           peaks.name AS peak,
           nearest.name AS gene,
           nearest.distance
       FROM peaks
       CROSS JOIN LATERAL NEAREST(
           genes,
           reference=peaks.interval,
           k=5,
           max_distance=50000,
           stranded=true,
           signed=true
       ) AS nearest
       WHERE nearest.distance BETWEEN -10000 AND 10000
       ORDER BY peaks.name, ABS(nearest.distance)
   """)

**Use case:** Find same-strand genes within ±10kb for promoter-enhancer analysis.

Distance Statistics
-------------------

Average Distance to Nearest Gene
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Calculate the average distance from peaks to their nearest gene:

.. code-block:: python

   cursor = engine.execute("""
       WITH nearest_genes AS (
           SELECT
               peaks.name AS peak,
               nearest.distance
           FROM peaks
           CROSS JOIN LATERAL NEAREST(genes, reference=peaks.interval, k=1) AS nearest
       )
       SELECT
           COUNT(*) AS peak_count,
           AVG(distance) AS avg_distance,
           MIN(distance) AS min_distance,
           MAX(distance) AS max_distance
       FROM nearest_genes
   """)

**Use case:** Characterize the genomic distribution of peaks relative to genes.

Distance Distribution by Chromosome
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Analyze distance patterns per chromosome:

.. code-block:: python

   cursor = engine.execute("""
       WITH nearest_genes AS (
           SELECT
               peaks.chromosome,
               peaks.name AS peak,
               nearest.distance
           FROM peaks
           CROSS JOIN LATERAL NEAREST(genes, reference=peaks.interval, k=1) AS nearest
       )
       SELECT
           chromosome,
           COUNT(*) AS peak_count,
           AVG(distance) AS avg_distance
       FROM nearest_genes
       GROUP BY chromosome
       ORDER BY chromosome
   """)

**Use case:** Compare regulatory element distribution across chromosomes.

Window Expansion Patterns
-------------------------

Expand Search Window
~~~~~~~~~~~~~~~~~~~~

Find features within an expanded window around each feature:

.. code-block:: python

   cursor = engine.execute("""
       WITH expanded AS (
           SELECT
               name,
               chromosome,
               start_pos - 5000 AS search_start,
               end_pos + 5000 AS search_end
           FROM peaks
       )
       SELECT
           e.name AS peak,
           b.*
       FROM expanded e
       JOIN features_b b
           ON b.chromosome = e.chromosome
           AND b.start_pos < e.search_end
           AND b.end_pos > e.search_start
   """)

**Use case:** Find all features within 5kb flanking regions.

.. note::

   This pattern uses raw coordinate manipulation rather than the NEAREST
   operator, which is useful when you need custom window shapes.