--- # SPDX-FileCopyrightText: 2023 Laurent Modolo # # SPDX-License-Identifier: CC-BY-SA-4.0 title: "Calibrated Chip-Seq with `bamcalib`" author: "laurent.modolo@ens-lyon.fr" format: revealjs: transition: none format: highlight-style: monokai theme: white --- ## Calibration data ::: columns ::: {.column width="50%"} ![](./img/IP_1.png) ::: ::: {.column width="50%"} ![](./img/IP_2.png) ::: ::: Are the two reference coverage different ? . . . The coverage differences can be due to: - Differences in experiment efficiency - Differences in amount of biological material ## Experiment efficiency ![](./img/IP_1.png){width="50%" height="50%"} Scaling the data by the calibration coverage: - Differences in experiment efficiency . . . How do we account for differences in the amount of biological material ? ------------------------------------------------------------------------ ## Amount of biological material ::: columns ::: {.column width="50%"} ![](./img/IP_1.png) ::: ::: {.column width="50%"} ![](./img/WCE.png) ::: ::: Scaling the data by the calibration coverage: - Differences in experiment efficiency Scaling the data by the scaled WCE coverage: - Differences in amount of biological material ## How do we scale ? ::: columns ::: {.column width="50%"} ![](./img/coverage_total.png) ::: ::: {.column width="50%"} We scale by the average coverage $\frac{1}{T}\sum_{t=1}^{T}IP(t)$ ::: ::: ## Normalization With - $IP_{x}\left(t\right)$ the coverage at position $t$ in the IP sample on the reference genome - $IP_{c}\left(t\right)$ the coverage at position $t$ in the IP sample on the calibration genome - $WCE_{x}\left(t\right)$ the coverage at position $t$ in the WCE sample on the reference genome - $WCE_{c}\left(t\right)$ the coverage at position $t$ in the WCE sample on the calibration genome --- ## Occupancy Ratio (OR) $$ \text{norm}IP\left(t\right) = IP_{x}\left(t\right) \frac{ \frac{1}{T_c}\sum_{t^{'}=1}^{T_c}WCE_{c}\left(t^{'}\right)}{\frac{1}{T_c}\sum_{t^{'}=1}^{T_c}IP_{c}\left(t^{'}\right)\frac{1}{T_x}\sum_{t^{'}=1}^{T_x}WCE_{x}\left(t^{'}\right)} $$ [Hu et al.](https://doi.org/10.1093/nar/gkv670) Can we normalize the $normIP(t)$ data by the WCE data ? --- ## Normalization In classical Chip-Seq experiement, we cannot normalize the IP data with the WCE data: - Differences in experiment efficiency - Differences in amount of biological material Can we compute the $\text{norm}IP(t) / \text{norm}WCE(t)$ ratio ? ::: columns ::: {.column width="50%"} ![](./img/normIP.png) ::: ::: {.column width="50%"} ![](./img/normWCE.png) ::: ::: ------------------------------------------------------------------------ ## Scaled WCE We can remove the : - Differences in experiment efficiency $$ \text{norm}WCE\left(t\right) = WCE_{x}\left(t\right) \frac{1}{\frac{1}{T_c}\sum_{t^{'}=1}^{T_c}WCE_{c}\left(t^{'}\right)} \alpha $$ How do we compute $\alpha$ to remove the : - Differences in amount of biological material . . . We can't we need a WCE of the WCE ## Proposition We can find $\alpha$ such that: $$ E\left(\text{norm}IP\left(t\right)\right)= E\left(\frac{\text{norm}IP\left(t\right)}{\text{norm}WCE\left(t\right)}\right) $$ ::: columns ::: {.column width="50%"} ![](./img/normIP.png) ::: ::: {.column width="50%"} ![](./img/normWCE.png) ::: ::: ## Coverage computation ### Pileup: number of reads overlapping a given position ### Problem: ::: columns ::: {.column width="50%"} ![](./img/pileup_paired.png) We want 4,4 instead of 4,0 ::: ::: {.column width="50%"} ![](./img/pileup_overlapping.png) We want 2 instead of 4 ::: ::: --- ## The `bamcalib` way We keep track of the pairs informations to make a pileup at the fragment level ### Problem With repeated regions the read 1 can map at the start of the chromosome and the read 2 at the end ![](./img/fragment_length.png){width="50%" height="50%"} ## The `bamcalib` way Compute the following robust mean ![](./img/fragment_mean_length.png){width="40%" height="40%"} Filter-out huge fragments ![](./img/fragment_filter.png){width="40%" height="40%"} ## The `bamcalib` way With cargo: ```sh cargo install bamcalib --version 0.1.1 ``` With Docker: ```sh docker run -it lbmc/bamcalib:0.1.1 bamcalib # 6.08 MB ``` Run it ```sh bamcalib \ -i IP_11_sorted.bam \ -w T_11_sorted.bam \ -bigwig-ip IP_11.bigwig \ -bigwig-WCE T_11.bigwig \ -t 8 ``` ## The `bamcalib` way ```sh docker run -it lbmc/bamcalib:0.1.1 bamcalib -h  1 ✘  7m 42s   laurent@LBMC-5820-LM  15:23:37  Compute calibrated density from calibration mapping data Usage: bamcalib [OPTIONS] --ip-bam --wce-bam --bigwig-ip --bigwig-wce Options: -i, --ip-bam sorted bam file for the IP data -w, --wce-bam sorted bam file for the WCE data -b, --bigwig-ip normalized bigwig file name for the output -b, --bigwig-wce normalized bigwig file name for the output -c, --cal-prefix calibration prefix added to the chromosome name of the calibration genome [default: calib_] -s, --scaling multiply the OR by this value (must be the same across sample) [default: 1000] -n, --no-bits remove reads which bit code match this code [default: 1540] -t, --thread number of threads to parse bam files [default: 1] -h, --help Print help -V, --version Print version ```