---
# SPDX-FileCopyrightText: 2023 Laurent Modolo
# 
# SPDX-License-Identifier: CC-BY-SA-4.0
title: "Calibrated Chip-Seq with `bamcalib`"
author: "laurent.modolo@ens-lyon.fr"
format: 
    revealjs:
        transition: none
        format:
        highlight-style: monokai
        theme: white
---

## Calibration data

::: columns
::: {.column width="50%"}
![](./img/IP_1.png)
:::

::: {.column width="50%"}
![](./img/IP_2.png)
:::
:::

Are the two reference coverage different ?

. . .

The coverage differences can be due to:

-   Differences in experiment efficiency

-   Differences in amount of biological material

## Experiment efficiency

![](./img/IP_1.png){width="50%" height="50%"}

Scaling the data by the calibration coverage:

-   Differences in experiment efficiency

. . .

How do we account for differences in the amount of biological material ?

------------------------------------------------------------------------

## Amount of biological material

::: columns
::: {.column width="50%"}
![](./img/IP_1.png)
:::

::: {.column width="50%"}
![](./img/WCE.png)
:::
:::

Scaling the data by the calibration coverage:

-   Differences in experiment efficiency

Scaling the data by the scaled WCE coverage:

-   Differences in amount of biological material

## How do we scale ?

::: columns
::: {.column width="50%"}
![](./img/coverage_total.png)
:::

::: {.column width="50%"}
We scale by the average coverage $\frac{1}{T}\sum_{t=1}^{T}IP(t)$
:::
:::

## Normalization

With

-   $IP_{x}\left(t\right)$ the coverage at position $t$ in the IP sample on the reference genome
-   $IP_{c}\left(t\right)$ the coverage at position $t$ in the IP sample on the calibration genome
-   $WCE_{x}\left(t\right)$ the coverage at position $t$ in the WCE sample on the reference genome
-   $WCE_{c}\left(t\right)$ the coverage at position $t$ in the WCE sample on the calibration genome

---

## Occupancy Ratio (OR)

$$
\text{norm}IP\left(t\right) = IP_{x}\left(t\right) \frac{ \frac{1}{T_c}\sum_{t^{'}=1}^{T_c}WCE_{c}\left(t^{'}\right)}{\frac{1}{T_c}\sum_{t^{'}=1}^{T_c}IP_{c}\left(t^{'}\right)\frac{1}{T_x}\sum_{t^{'}=1}^{T_x}WCE_{x}\left(t^{'}\right)}
$$
[Hu et al.](https://doi.org/10.1093/nar/gkv670)

Can we normalize the $normIP(t)$ data by the WCE data ?

---

## Normalization

In classical Chip-Seq experiement, we cannot normalize the IP data with the WCE data:

-   Differences in experiment efficiency
-   Differences in amount of biological material

Can we compute the $\text{norm}IP(t) / \text{norm}WCE(t)$ ratio ?

::: columns
::: {.column width="50%"}
![](./img/normIP.png)
:::

::: {.column width="50%"}
![](./img/normWCE.png)
:::
:::

------------------------------------------------------------------------

## Scaled WCE

We can remove the :

-   Differences in experiment efficiency

$$
\text{norm}WCE\left(t\right) = WCE_{x}\left(t\right) \frac{1}{\frac{1}{T_c}\sum_{t^{'}=1}^{T_c}WCE_{c}\left(t^{'}\right)}  \alpha
$$

How do we compute $\alpha$ to remove the :

-   Differences in amount of biological material

. . .

We can't we need a WCE of the WCE

## Proposition

We can find $\alpha$ such that:

$$
E\left(\text{norm}IP\left(t\right)\right)= E\left(\frac{\text{norm}IP\left(t\right)}{\text{norm}WCE\left(t\right)}\right)
$$

::: columns
::: {.column width="50%"}
![](./img/normIP.png)
:::

::: {.column width="50%"}
![](./img/normWCE.png)
:::
:::

## Coverage computation

### Pileup:
number of reads overlapping a given position

### Problem:

::: columns
::: {.column width="50%"}
![](./img/pileup_paired.png)
We want 4,4 instead of 4,0
:::

::: {.column width="50%"}
![](./img/pileup_overlapping.png)
We want 2 instead of 4
:::
:::

---
## The `bamcalib` way

We keep track of the pairs informations to make a pileup at the fragment level

### Problem

With repeated regions the read 1 can map at the start of the chromosome and the read 2 at the end


![](./img/fragment_length.png){width="50%" height="50%"}

## The `bamcalib` way

Compute the following robust mean

![](./img/fragment_mean_length.png){width="40%" height="40%"}

Filter-out huge fragments

![](./img/fragment_filter.png){width="40%" height="40%"}

## The `bamcalib` way

With cargo:

```sh
cargo install bamcalib --version 0.1.1
```

With Docker:
```sh
docker run -it lbmc/bamcalib:0.1.1 bamcalib # 6.08 MB
```

Run it 

```sh
bamcalib \
  -i IP_11_sorted.bam \
  -w T_11_sorted.bam \
  -bigwig-ip IP_11.bigwig \
  -bigwig-WCE T_11.bigwig \
  -t 8
```

## The `bamcalib` way

```sh
docker run -it lbmc/bamcalib:0.1.1 bamcalib -h                                                                                      1 ✘  7m 42s   laurent@LBMC-5820-LM  15:23:37 
Compute calibrated density from calibration mapping data

Usage: bamcalib [OPTIONS] --ip-bam <IP_BAM> --wce-bam <WCE_BAM> --bigwig-ip <BIGWIG_IP> --bigwig-wce <BIGWIG_WCE>

Options:
  -i, --ip-bam <IP_BAM>          sorted bam file for the IP data
  -w, --wce-bam <WCE_BAM>        sorted bam file for the WCE data
  -b, --bigwig-ip <BIGWIG_IP>    normalized bigwig file name for the output
  -b, --bigwig-wce <BIGWIG_WCE>  normalized bigwig file name for the output
  -c, --cal-prefix <CAL_PREFIX>  calibration prefix added to the chromosome name of the calibration genome [default: calib_]
  -s, --scaling <SCALING>        multiply the OR by this value (must be the same across sample) [default: 1000]
  -n, --no-bits <NO_BITS>        remove reads which bit code match this code [default: 1540]
  -t, --thread <THREAD>          number of threads to parse bam files [default: 1]
  -h, --help                     Print help
  -V, --version                  Print version
```