This workflow takes count matrices produced by BD Rhapsody targeted single-cell transcriptome assays. A typical assay of this kind contains around 10,000 cells and a panel of about 500 genes. The workflow performs filtering, normalization, visualization of gene expression, differential expression analysis, and dimensionality reduction (UMAP).
- Build an Apptainer image (requires root privileges):

      make build

- Edit `config.yaml` to set the path to your data. By default, the workflow reads from the `data` folder in this directory:

      # config.yaml
      data_dir: "data"

- Make sure your input files are named `[SAMPLE]_DBEC_MolsPerCell_correct_gene_names.csv`, where `[SAMPLE]` is your sample name. For example, the input data could look like this:

      ├── data
      │ ├── [SAMPLE1]_DBEC_MolsPerCell_correct_gene_names.csv
      │ ├── [SAMPLE2]_DBEC_MolsPerCell_correct_gene_names.csv

- (Optional) Check that the paths are correct with a Snakemake dry run:

      snakemake -n

- Run the workflow:

      make pipeline
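
Because samples are identified purely by file name, a quick check of which sample names the naming convention yields can save a failed run. The sketch below (Python standard library only) extracts `[SAMPLE]` from each file name; the helper names `sample_from_filename` and `find_samples` are hypothetical and not part of this workflow, which may discover samples differently.

```python
import re
from pathlib import Path

# File-name suffix required by the workflow; the sample name is everything before it.
SUFFIX = "_DBEC_MolsPerCell_correct_gene_names.csv"
PATTERN = re.compile(r"^(?P<sample>.+)" + re.escape(SUFFIX) + r"$")


def sample_from_filename(filename):
    """Return the sample name encoded in filename, or None if it does not match."""
    match = PATTERN.match(filename)
    return match.group("sample") if match else None


def find_samples(data_dir="data"):
    """List the sample names of all correctly named input files in data_dir."""
    samples = []
    for path in Path(data_dir).iterdir():
        sample = sample_from_filename(path.name)
        if path.is_file() and sample is not None:
            samples.append(sample)
    return sorted(samples)
```

Calling `find_samples("data")` lists the samples that would be picked up; a file whose name lacks the exact suffix is silently ignored, which is a common reason for a sample to be missing from the results. Inside a Snakefile, Snakemake's built-in `glob_wildcards` performs the same kind of pattern match.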
We start with the count matrix produced by BD's internal mapping pipeline, which takes care of alignment to the reference, quality filtering, and error correction. The matrix has one row per cell (cell index) and one column per gene:
Cell_Index | Gene1 | Gene2 | Gene3 | Guide1 | Guide2 | Guide3 |
---|---|---|---|---|---|---|
7836734 | 0 | 68 | 0 | 0 | 0 | 0 |
4277806 | 0 | 25 | 0 | 0 | 10 | 0 |
In our case, the panel also includes guide RNA "genes" (Guide1, Guide2, ... in the example above), which let us identify the guide(s) expressed in each cell.
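
The workflow reads and splits this matrix in R with Seurat. To make the layout concrete, here is a Python (standard library) sketch that parses a table shaped like the example above and separates gene columns from guide columns. It assumes, hypothetically, that guide columns can be recognized by a `Guide` name prefix, as in the example; real guide names depend on your panel. Skipping `#`-prefixed lines before the header is a defensive assumption, not something this README specifies.

```python
import csv
import io

# Hypothetical convention taken from the example table: guide columns start with "Guide".
GUIDE_PREFIX = "Guide"


def read_counts(handle):
    """Read a cells x features count CSV into {cell_index: {feature: count}}."""
    # Drop any '#'-prefixed comment lines that might precede the header (assumption).
    lines = (line for line in handle if not line.startswith("#"))
    counts = {}
    for row in csv.DictReader(lines):
        cell = row.pop("Cell_Index")
        counts[cell] = {feature: int(value) for feature, value in row.items()}
    return counts


def split_genes_guides(counts, guide_prefix=GUIDE_PREFIX):
    """Split each cell's counts into separate gene and guide dictionaries."""
    genes, guides = {}, {}
    for cell, features in counts.items():
        genes[cell] = {f: n for f, n in features.items() if not f.startswith(guide_prefix)}
        guides[cell] = {f: n for f, n in features.items() if f.startswith(guide_prefix)}
    return genes, guides


# The two example rows from the table above, as CSV text.
example = """Cell_Index,Gene1,Gene2,Gene3,Guide1,Guide2,Guide3
7836734,0,68,0,0,0,0
4277806,0,25,0,0,10,0
"""
genes, guides = split_genes_guides(read_counts(io.StringIO(example)))
```

For a real input file, pass an open file handle (`open(path, newline="")`) instead of `io.StringIO`.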
All steps are currently performed in R using the Seurat package:
- First, the count matrix is split in two: one matrix for gene expression and one for guide expression.
- Both matrices are filtered to contain the same cells, and a cell passes the filter only if it has at least 10 guide reads and at least 1,000 gene reads.
- We assign each cell an identity based on its most highly expressed guide RNA.
- We plot the expression levels of the guide target genes to visually assess guide effectiveness.
- We perform differential gene expression analysis, comparing cells that carry guides against your target gene of interest with cells carrying negative-control (non-targeting) guides.
- We perform dimensionality reduction and use a UMAP plot to check whether cells in which a specific gene is targeted form a separate cluster.
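
The filtering and guide-assignment steps can be sketched in a few lines of Python. This is an illustration of the rules stated above, not the workflow's R code: it assumes the thresholds apply to each cell's total guide count and total gene count, and takes gene and guide counts as `{cell: {feature: count}}` dictionaries. Ties between equally expressed guides are resolved arbitrarily (the first guide encountered wins), which the actual workflow may handle differently.

```python
# Thresholds stated in this README; applying them to per-cell totals is an assumption.
MIN_GUIDE_READS = 10
MIN_GENE_READS = 1000


def filter_cells(genes, guides, min_guide=MIN_GUIDE_READS, min_gene=MIN_GENE_READS):
    """Keep cells present in both matrices that pass both total-count thresholds."""
    shared = genes.keys() & guides.keys()
    keep = {
        cell
        for cell in shared
        if sum(guides[cell].values()) >= min_guide
        and sum(genes[cell].values()) >= min_gene
    }
    return {c: genes[c] for c in keep}, {c: guides[c] for c in keep}


def assign_guides(guides):
    """Label each cell with its most highly expressed guide (first one wins on ties)."""
    return {cell: max(counts, key=counts.get) for cell, counts in guides.items()}
```

For example, a cell with guide counts `{"Guide1": 3, "Guide2": 8}` and 1,100 gene reads passes both thresholds (11 guide reads in total) and is assigned `Guide2`, while a cell with only 999 gene reads is dropped regardless of its guides.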
A `results` folder will be created with the following structure:
├── results
│ ├── [SAMPLE]_differential_expression.csv
│ ├── [SAMPLE]_genes.h5Seurat
│ ├── [SAMPLE]_guides.h5Seurat
│ ├── plots
│ │ ├── [SAMPLE]_expression_plot.png
│ │ ├── [SAMPLE]_umap.png
- `[SAMPLE]_differential_expression.csv`: genes that are differentially expressed in cells carrying guides against a specific gene, compared with cells carrying non-targeting guides. Only genes passing a p-value cutoff of 0.1 are included.
- `[SAMPLE]_genes.h5Seurat` and `[SAMPLE]_guides.h5Seurat`: raw Seurat objects for gene and guide expression, respectively.
- Plots:
  - `[SAMPLE]_expression_plot.png`: a dot plot showing the expression of each target gene in cells grouped by target gene.
  - `[SAMPLE]_umap.png`: a dimensionality reduction (UMAP) visualization of all cells, colored by their targeted gene.
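
The 0.1 cutoff is permissive, and this README does not say whether it is applied to raw or adjusted p-values, so you may want a stricter list. The sketch below assumes, without confirmation from this README, that the CSV is Seurat's `FindMarkers()` table written from R: gene names in the first (row-name) column and columns `p_val`, `avg_log2FC`, `pct.1`, `pct.2`, `p_val_adj`. The function name `significant_genes` is hypothetical, and the example rows are toy values, not real results.

```python
import csv
import io


def significant_genes(handle, cutoff, column="p_val_adj"):
    """Return (gene, p-value) pairs with column < cutoff, smallest p-value first."""
    reader = csv.DictReader(handle)
    # Assumed: the first column holds gene names (R writes row names there with an empty header).
    gene_col = reader.fieldnames[0]
    hits = []
    for row in reader:
        p = float(row[column])
        if p < cutoff:
            hits.append((row[gene_col], p))
    return sorted(hits, key=lambda hit: hit[1])


# Toy values for illustration only, laid out like a FindMarkers() table saved with write.csv.
example = '''"","p_val","avg_log2FC","pct.1","pct.2","p_val_adj"
"GeneA",1e-05,1.2,0.8,0.3,0.001
"GeneB",0.004,-0.5,0.5,0.6,0.08
"GeneC",0.09,0.3,0.4,0.35,1
'''
strict = significant_genes(io.StringIO(example), cutoff=0.05)
```

Here `strict` keeps only `GeneA`; passing `column="p_val"` with `cutoff=0.1` would keep all three toy genes, which shows how much the choice between raw and adjusted p-values matters.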
Additionally, all log files are written to a `logs` folder.
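
After a run, one quick way to confirm that every sample produced all the files in the results tree is to list the expected paths and report the missing ones. This Python sketch hard-codes the file names from the tree above; the helper names are hypothetical, and a Snakemake dry run (`snakemake -n`) reports outstanding outputs as well.

```python
from pathlib import Path


def expected_outputs(sample, results_dir="results"):
    """Paths listed in the results tree for one sample."""
    base = Path(results_dir)
    return [
        base / f"{sample}_differential_expression.csv",
        base / f"{sample}_genes.h5Seurat",
        base / f"{sample}_guides.h5Seurat",
        base / "plots" / f"{sample}_expression_plot.png",
        base / "plots" / f"{sample}_umap.png",
    ]


def missing_outputs(samples, results_dir="results"):
    """Return the expected output files that do not exist yet."""
    return [
        path
        for sample in samples
        for path in expected_outputs(sample, results_dir)
        if not path.exists()
    ]
```

For instance, `missing_outputs(["SAMPLE1", "SAMPLE2"])` returns an empty list once both samples have finished.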
To use Seurat or scanpy interactively, start JupyterLab inside the container, with R and Python kernels available:

    make run