---
title: "Quantifying protein with qp"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{quantifying-protein-with-qp}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```

# Introduction

```{r setup}
library(qp)
```

The [Micro BCA](https://en.wikipedia.org/wiki/Bicinchoninic_acid_assay#Micro_BCA_assay_(for_dilute_solutions)) assay and its analysis is easy and straightforward. With most lab work, though, the analysis is typically repetitive. We can analyze these data programmatically to speed up and automate analysis, remove human error, and get reproducibility for free along the way.

`qp` intends to be no-fuss enough where it analyze routine runs and flexible enough to make analyzing hairier ones simple enough. In this vignette, I'll start with a straightforward data set and move to a more challenging one to show you the two different approaches to analyzing protein quantification data using `qp`.

# Routine analysis

The standard `qp` workflow is based off [this](https://kai.quarto.pub/bok/western-blot.html#protein-quantification) protocol. Provided the protocol is followed, relatively few parameters need to be changed from their default.

The easiest - and most common - way to read in your data is by providing a path to your SPECTRAmax file output. These `.txt` files are some of the most...creatively...formatted files I've come across.

```{r}
f <- system.file("extdata", "absorbances.txt", package = "qp")

readLines(f)
```

Fortunately, I've developed a little package `mop` that reads in lab data like these. This is used in the `qp` function when you provide it a character string. 

Typically when you perform these experiments, *you* will know how it was laid out. In these data I'm providing, though, I'll need to show you how it's laid out so that you can follow along why I set some parameters the way I do.

This is how the plate looks:

```{r echo=FALSE, message=FALSE, fig.width=10, fig.height=7, out.width="100%"}
qp_plot_plate(qp_tidy(f))
```

The samples are laid out in triplicate, horizontally arranged replicates in vertical bands that wrap to the next three columns once it hits the edge of the plate. 

```{r echo=FALSE, fig.width=10, fig.height=7, out.width="100%"}
library(gplate)
library(ggplot2)
gp(8, 12) |>
  gp_sec("band", nrow = 8, ncol = 3) |>
  gp_serve() |>
  ggplot(aes(.col, .row, color = band)) +
  geom_point(size = 15) +
  scale_y_reverse()
```

Furthermore, I have my seven standards in the top left corner of the plate, in ascending concentrations as they flow down the first column. The samples follow. The rest of the wells are empty:

```{r echo=FALSE, fig.width=10, fig.height=7, out.width="100%"}
gp(8, 12) |>
  gp_sec(
    "type", nrow = c(7, 5, 23), ncol = 3,
    labels = c("standard", "sample", "blank"),
    wrap = TRUE, flow = "col"
  ) |>
  gp_serve() |>
  ggplot(aes(.col, .row, color = type)) +
  geom_point(size = 15) +
  scale_y_reverse()
```

Since we followed the standard protocol, and since I told you replicates were arranged horizontally, analysis is a breeze:

```{r}
out <- qp(f, replicate_orientation = "h")
```

We can plot our standards and samples simply, using `qp_plot_standards`:

```{r fig.width=10, fig.height=7, out.width="100%"}
qp_plot_standards(out)
```

And we can get a summary of the concentrations using `qp_summarize`:

```{r}
summary <- qp_summarize(out)
summary
```

We can also calculate dilutions from this summary. By default, it will use the lowest concentration of the samples and a final volume of 15uL:

```{r}
summary |> qp_dilute()
```

# When things go wrong
Occasionally, your experiment won't go perfectly - maybe you mixed up the order of the standards, or could can't run samples with technical replicates. In this case, you can run the functions that `qp` runs individually, maybe doing a little data munging between steps.

Most functions depend on a previous function being run. See the following hierarchy to see what functions depend on what:

```{r echo=FALSE, fig.width=10, out.width="100%"}
library(DiagrammeR)
grViz("
digraph qp_deps {
  rankdir=BT
  graph [overlap = true, fontsize = 10]

  node [shape = box,
        fontname = Helvetica]
  tidy; calc_abs_mean; add_std_conc; add_names; plot_plate; fit;
  calc_conc; summarize; dilute; remove_empty; plot_standards

  calc_abs_mean->tidy
  add_std_conc->tidy
  add_names->tidy
  plot_plate->tidy
  fit->calc_abs_mean
  fit->add_std_conc
  calc_conc->fit
  summarize->calc_conc
  dilute->calc_conc
  remove_empty->calc_conc
  plot_standards->calc_conc
}
")
```


## Reading in the data
There are a couple ways of going about doing this. Depending on how borked your plate is, the most straightforward way may be to create a `data.frame` with columns `sample_type` (which should contain `standard` or `unknown`), `index` (which denotes the standard or sample number), and `.abs` (the absorbances at - hopefully but not necessarily - 562nm):

```{r}
standards <- data.frame(
  sample_type = "standard",
  index = rep(1:7, each = 2),
  abs = c(0.071, 0.079, 0.08, 0.082,
          0.1, 0.099, 0.15, 0.147,
          0.22, 0.215, 0.50, 0.48, 0.78, 0.79))
unknowns <- data.frame(sample_type = "unknown", index = 1, abs = 0.25)
data <- rbind(standards, unknowns)
data
```

Admittedly, this does destroy the whole 'reproducibility' aspect of it. Another way of going doing this is via [`gplate`](https://kaiaragaki.github.io/gplate/articles/gp_for_data.html)

```{r}
library(mop)
f <- system.file("extdata", "absorbances.txt", package = "qp")
spectramax <- read_spectramax(f)
spectramax
```

We can extract the `gp` from the `spectramax` object via:

```{r}
# I know it's ugly syntax - I'll work on it. Later.
gp <- spectramax$data[[1]]$data
gp
```
And then we can carve out rectangular sections using `gplate::gp_excise`:

```{r}
library(dplyr)
library(gplate)
standards <- gp |>
  gp_excise(top = 1, left = 1, bottom = 7, right = 2) |>
  gp_serve() |>
  mutate(
    sample_type = "standard",
    index = rep(1:7, each = 2)
  )

samples <- gp |>
  gp_excise(top = 1, left = 4, bottom = 1, right = 4) |>
  gp_serve() |>
  mutate(
    sample_type = "unknown",
    index = 1
  )
  
data <- rbind(standards, samples) |>
  rename(.abs = nm562)

data
```

## Adding standard concentrations
Suppose you accidentally made your highest standard twice as concentrated. That's a simple fix. Instead of using the standard standards:

```{r}
c(0, 2^((2:7) - 5))
```

You can set your own:

```{r}
c(0, 2^((2:6) - 5), 8)
```

and use them as an argument in `qp_add_std_conc`:

```{r}
qp_add_std_conc(data, c(0, 2^((2:6) - 5), 8))
```

You can specify them as strangely as you want:

```{r}
qp_add_std_conc(data, c(1, 7, 26, 0.4, 2, 1, 1))
```

## Calculating absorbance means
At some point - either before or after calculating your standard concentrations - you'll need to calculate the means of your absorbances. In an Excel based format, you might manually remove outliers - but what counts as an 'outlier' is somewhat arbitrary in that setting. You can optionally remove outliers in a systematic way with `qp`:

```{r}
some_abs <- absorbances[c(1:5, 21:26),]
some_abs |> 
  qp_calc_abs_mean(ignore_outliers = "none")
```

```{r}
some_abs |> 
  qp_calc_abs_mean(ignore_outliers = "standards")
```

```{r}
some_abs |> 
  qp_calc_abs_mean(ignore_outliers = "samples")
```

```{r}
some_abs |> 
  qp_calc_abs_mean(ignore_outliers = "all")
```

Something is an outlier if it is over 3 standard deviations away from the mean of the others. While this might mark things as outliers that you might not (perhaps because the other two samples were just ultra close to each other), it only has a downstream effect if these outliers are truly much different from the others.

## Fitting a model
  Once the above steps have been completed, you can fit a simple linear model:
  
```{r}
out <- absorbances |>
  qp_add_std_conc() |> # all values are defaults
  qp_calc_abs_mean() |>
  qp_fit()

out
```

## Predicting concentrations

```{r}
conc <- qp_calc_conc(out)
conc
```

## Removing empty wells

Samples that have a predicted concentration lower than 0 can be removed easily using `qp_remove_empty`:

```{r}
conc$qp |> tail()
```

```{r}
no_zero <- qp_remove_empty(conc)
no_zero$qp |> tail()
```

Note that this typically won't remove wells that have the BCA solution in it but have no sample. 

## Adding sample names

At any point, you can add sample names to the data - each index gets its own sample name.

```{r}
with_names <- no_zero |> qp_add_names(paste0("my_sample_", letters[1:8]))
with_names$qp |> tail()
```