5 QC Filtering

Tutorial: https://satijalab.org/seurat/articles/pbmc3k_tutorial#qc-and-selecting-cells-for-further-analysis

5.1 QC and selecting cells for further analysis

Why do we need to do this?

Low-quality cells add noise to your results and can lead you to the wrong biological conclusions. Filtering them out before further analysis reduces that noise. The usual targets are dying or stressed cells, which show a high proportion of mitochondrial expression, and barcodes with very few detected features, which can reflect empty droplets.
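As a minimal sketch of how these QC metrics can be computed and inspected, the code below uses PercentageFeatureSet() and VlnPlot() from Seurat. The small simulated count matrix is only a stand-in so the snippet runs on its own; in the workshop you would use your own pbmc object. The "^MT-" pattern assumes human gene symbols.

```r
library(Seurat)

# Toy stand-in data (an assumption for illustration, not the PBMC3k dataset):
# 50 genes x 30 cells, with three genes named like human mitochondrial genes.
set.seed(42)
genes <- c("MT-CO1", "MT-ND1", "MT-CYB", paste0("GENE", 1:47))
counts <- matrix(rpois(50 * 30, lambda = 3), nrow = 50,
                 dimnames = list(genes, paste0("cell", 1:30)))
pbmc <- CreateSeuratObject(counts = counts, project = "toy")

# Percentage of each cell's counts that come from genes starting with "MT-"
pbmc[["percent.mt"]] <- PercentageFeatureSet(pbmc, pattern = "^MT-")

# One violin per metric: genes detected, total counts, mitochondrial percentage
VlnPlot(pbmc, features = c("nFeature_RNA", "nCount_RNA", "percent.mt"), ncol = 3)
```

Cells in the upper tail of percent.mt or the lower tail of nFeature_RNA are the candidates for removal.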

Challenge: The meta.data slot in the Seurat object

Where are QC metrics stored in Seurat?

  • The number of unique genes and total molecules are automatically calculated during CreateSeuratObject()
    • They are stored in the object's metadata (the meta.data slot)
  1. What do you notice has changed within the meta.data table now that we have calculated mitochondrial gene proportion?

  2. Imagine that this is the first of several samples in our experiment. Add a samplename column to the meta.data table.
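One possible way to explore both questions is sketched below. It again builds a small simulated object so that it runs on its own (use your pbmc object in the workshop), and the sample label "sample1" is a hypothetical name chosen for illustration.

```r
library(Seurat)

# Toy stand-in data (an assumption for illustration only)
set.seed(1)
genes <- c("MT-CO1", "MT-ND1", "MT-CYB", paste0("GENE", 1:47))
counts <- matrix(rpois(50 * 30, lambda = 3), nrow = 50,
                 dimnames = list(genes, paste0("cell", 1:30)))
pbmc <- CreateSeuratObject(counts = counts, project = "toy")

# Columns created by CreateSeuratObject(): orig.ident, nCount_RNA, nFeature_RNA
head(pbmc@meta.data)

# Adding a QC metric appends a new column to the same table
pbmc[["percent.mt"]] <- PercentageFeatureSet(pbmc, pattern = "^MT-")
colnames(pbmc@meta.data)

# A single value assigned with $ is recycled across every cell
pbmc$samplename <- "sample1"  # hypothetical sample label
head(pbmc@meta.data)
```

Storing the sample name per cell keeps the cells traceable once several samples are combined into one object.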

Challenge: Filter the cells

Apply the filtering thresholds defined above.

  • How many cells survived filtering?
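A sketch of the filtering step follows, assuming the thresholds are the ones used in the linked Seurat tutorial (more than 200 and fewer than 2,500 detected genes, and less than 5% mitochondrial counts). If your thresholds differ, change the numbers. The simulated data is a stand-in built so that some cells fail each criterion.

```r
library(Seurat)

# Toy stand-in data (an assumption for illustration): 300 genes x 40 cells
set.seed(123)
genes <- c("MT-CO1", "MT-ND1", "MT-CYB", paste0("GENE", 1:297))
counts <- matrix(rpois(300 * 40, lambda = 2), nrow = 300,
                 dimnames = list(genes, paste0("cell", 1:40)))
counts[1:3, 1:5] <- counts[1:3, 1:5] + 50L  # cells 1-5: inflated mitochondrial counts
counts[101:300, 6:10] <- 0L                 # cells 6-10: very few detected genes
pbmc <- CreateSeuratObject(counts = counts, project = "toy")
pbmc[["percent.mt"]] <- PercentageFeatureSet(pbmc, pattern = "^MT-")

n_before <- ncol(pbmc)  # columns of a Seurat object are cells

# Keep cells that pass all three criteria (thresholds assumed from the Seurat tutorial)
pbmc_filtered <- subset(pbmc,
                        subset = nFeature_RNA > 200 & nFeature_RNA < 2500 & percent.mt < 5)

n_after <- ncol(pbmc_filtered)
cat(n_after, "of", n_before, "cells survived filtering\n")
```

Writing the result to a new object (pbmc_filtered here, a name chosen for this sketch) keeps the unfiltered object available for comparison.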

The PBMC3k dataset we’re working with in this tutorial is quite old. There are a number of other example datasets available from the 10X website, including this one - published in 2022, sequencing 10k PBMCs with a newer chemistry and counting method.

  • What thresholds would you choose to apply to this modern dataset?
pbmc10k_unfiltered <- readRDS("data/10k_PBMC_v3.1ChromiumX_Intronic.rds")
VlnPlot(pbmc10k_unfiltered, features = c("nFeature_RNA", "nCount_RNA"), ncol = 2)
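Violin plots show the shape of each distribution, but a numeric summary can make it easier to settle on cut-offs. The sketch below defines a small hypothetical helper, qc_quantiles(), that reports quantiles for every numeric column of a meta.data table. It assumes the RDS file sits at the path used above and that mitochondrial genes start with "MT-". The quantiles are a guide for choosing thresholds, not a rule.

```r
# Hypothetical helper (not part of Seurat): quantiles of each numeric metadata column
qc_quantiles <- function(meta, probs = c(0.01, 0.05, 0.5, 0.95, 0.99)) {
  numeric_cols <- meta[, sapply(meta, is.numeric), drop = FALSE]
  sapply(numeric_cols, quantile, probs = probs, na.rm = TRUE)
}

rds_path <- "data/10k_PBMC_v3.1ChromiumX_Intronic.rds"
if (file.exists(rds_path)) {
  library(Seurat)
  pbmc10k_unfiltered <- readRDS(rds_path)

  # Add the mitochondrial percentage so it is summarised with the other metrics
  pbmc10k_unfiltered[["percent.mt"]] <-
    PercentageFeatureSet(pbmc10k_unfiltered, pattern = "^MT-")

  # Rows are quantiles, columns are metrics such as nCount_RNA and nFeature_RNA
  print(qc_quantiles(pbmc10k_unfiltered@meta.data))
}
```

Comparing these values with the same summary for PBMC3k shows how far the feature and count distributions have shifted with the newer chemistry.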