9 Introduction to Part 2

9.1 Part 1 Recap

In Part 1, we processed raw single-cell RNA-seq data through the following steps:

Step	What we did	Why
Load data	Imported raw count matrices for 8 PBMC samples	Starting point — raw gene expression counts per cell
Quality control	Filtered low-quality cells based on count depth and mitochondrial gene percentage	Remove damaged or empty droplets that would skew downstream analysis
Normalisation	Normalised counts and identified highly variable genes	Make cells comparable despite differences in sequencing depth
Dimensionality reduction	Computed PCA and UMAP embeddings	Reduce noise and enable visualisation of structure in high-dimensional data

9.2 Clear your session

Before starting Part 2, clear your environment and restart your R session to remove any saved objects from Part 1.

In RStudio: Session → Restart R (or Ctrl+Shift+F10 / Cmd+Shift+F10 on Mac)

9.3 Learning objectives

By the end of this lesson, you will be able to:

Recall the experimental design and analysis goal
Identify where this information is in the data

9.4 Overview

Analysis goal: Identify molecular differences between stimulated vs. stimulated peripheral blood mononuclear cells (PBMCs).

PBMCs are stimulated in vitro with IFN-beta. This is expected to trigger the up or downregulation of immune pathways. In our data, we can measure this directly by comparing the gene expression levels of immune-related genes.

We have the following information in our Seurat object metadata:

ind identifies a cell as coming from one of 8 individuals.
stim identifies a cell as control or stimulated with IFN-beta.
cell contains the cell types identified by the creators of this data set.
multiplets classifies cells as singlet or doublet.

Read in libraries

library(here)
library(qs2)
#> qs2 0.2.1
library(Seurat)
library(harmony)
#> Loading required package: Rcpp
library(tidyverse)
#> ── Attaching core tidyverse packages ──────────────── tidyverse 2.0.0 ──
#> ✔ forcats   1.0.1     ✔ stringr   1.6.0
#> ✔ lubridate 1.9.4     ✔ tibble    3.3.0
#> ✔ purrr     1.2.0     ✔ tidyr     1.3.1
#> ✔ readr     2.1.6
#> ── Conflicts ────────────────────────────────── tidyverse_conflicts() ──
#> ✖ dplyr::filter() masks stats::filter()
#> ✖ dplyr::lag()    masks stats::lag()
#> ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(SingleCellExperiment)
#> Loading required package: SummarizedExperiment
#> Loading required package: MatrixGenerics
#> Loading required package: matrixStats
#> 
#> Attaching package: 'matrixStats'
#> 
#> The following object is masked from 'package:dplyr':
#> 
#>     count
#> 
#> 
#> Attaching package: 'MatrixGenerics'
#> 
#> The following objects are masked from 'package:matrixStats':
#> 
#>     colAlls, colAnyNAs, colAnys, colAvgsPerRowSet,
#>     colCollapse, colCounts, colCummaxs, colCummins,
#>     colCumprods, colCumsums, colDiffs, colIQRDiffs,
#>     colIQRs, colLogSumExps, colMadDiffs, colMads,
#>     colMaxs, colMeans2, colMedians, colMins,
#>     colOrderStats, colProds, colQuantiles, colRanges,
#>     colRanks, colSdDiffs, colSds, colSums2,
#>     colTabulates, colVarDiffs, colVars,
#>     colWeightedMads, colWeightedMeans,
#>     colWeightedMedians, colWeightedSds,
#>     colWeightedVars, rowAlls, rowAnyNAs, rowAnys,
#>     rowAvgsPerColSet, rowCollapse, rowCounts,
#>     rowCummaxs, rowCummins, rowCumprods, rowCumsums,
#>     rowDiffs, rowIQRDiffs, rowIQRs, rowLogSumExps,
#>     rowMadDiffs, rowMads, rowMaxs, rowMeans2,
#>     rowMedians, rowMins, rowOrderStats, rowProds,
#>     rowQuantiles, rowRanges, rowRanks, rowSdDiffs,
#>     rowSds, rowSums2, rowTabulates, rowVarDiffs,
#>     rowVars, rowWeightedMads, rowWeightedMeans,
#>     rowWeightedMedians, rowWeightedSds,
#>     rowWeightedVars
#> 
#> Loading required package: GenomicRanges
#> Loading required package: stats4
#> Loading required package: BiocGenerics
#> Loading required package: generics
#> 
#> Attaching package: 'generics'
#> 
#> The following object is masked from 'package:lubridate':
#> 
#>     as.difftime
#> 
#> The following object is masked from 'package:dplyr':
#> 
#>     explain
#> 
#> The following objects are masked from 'package:base':
#> 
#>     as.difftime, as.factor, as.ordered, intersect,
#>     is.element, setdiff, setequal, union
#> 
#> 
#> Attaching package: 'BiocGenerics'
#> 
#> The following object is masked from 'package:dplyr':
#> 
#>     combine
#> 
#> The following objects are masked from 'package:stats':
#> 
#>     IQR, mad, sd, var, xtabs
#> 
#> The following objects are masked from 'package:base':
#> 
#>     anyDuplicated, aperm, append, as.data.frame,
#>     basename, cbind, colnames, dirname, do.call,
#>     duplicated, eval, evalq, Filter, Find, get, grep,
#>     grepl, is.unsorted, lapply, Map, mapply, match,
#>     mget, order, paste, pmax, pmax.int, pmin,
#>     pmin.int, Position, rank, rbind, Reduce,
#>     rownames, sapply, saveRDS, table, tapply, unique,
#>     unsplit, which.max, which.min
#> 
#> Loading required package: S4Vectors
#> 
#> Attaching package: 'S4Vectors'
#> 
#> The following objects are masked from 'package:lubridate':
#> 
#>     second, second<-
#> 
#> The following object is masked from 'package:tidyr':
#> 
#>     expand
#> 
#> The following objects are masked from 'package:dplyr':
#> 
#>     first, rename
#> 
#> The following object is masked from 'package:utils':
#> 
#>     findMatches
#> 
#> The following objects are masked from 'package:base':
#> 
#>     expand.grid, I, unname
#> 
#> Loading required package: IRanges
#> 
#> Attaching package: 'IRanges'
#> 
#> The following object is masked from 'package:lubridate':
#> 
#>     %within%
#> 
#> The following object is masked from 'package:purrr':
#> 
#>     reduce
#> 
#> The following object is masked from 'package:sp':
#> 
#>     %over%
#> 
#> The following objects are masked from 'package:dplyr':
#> 
#>     collapse, desc, slice
#> 
#> Loading required package: Seqinfo
#> Loading required package: Biobase
#> Welcome to Bioconductor
#> 
#>     Vignettes contain introductory material; view
#>     with 'browseVignettes()'. To cite Bioconductor,
#>     see 'citation("Biobase")', and for packages
#>     'citation("pkgname")'.
#> 
#> 
#> Attaching package: 'Biobase'
#> 
#> The following object is masked from 'package:MatrixGenerics':
#> 
#>     rowMedians
#> 
#> The following objects are masked from 'package:matrixStats':
#> 
#>     anyMissing, rowMedians
#> Warning: replacing previous import
#> 'S4Arrays::makeNindexFromArrayViewport' by
#> 'DelayedArray::makeNindexFromArrayViewport' when loading
#> 'SummarizedExperiment'
#> 
#> Attaching package: 'SummarizedExperiment'
#> 
#> The following object is masked from 'package:Seurat':
#> 
#>     Assays
#> 
#> The following object is masked from 'package:SeuratObject':
#> 
#>     Assays
library(SingleR)
library(celldex)
#> Warning: replacing previous import
#> 'S4Arrays::makeNindexFromArrayViewport' by
#> 'DelayedArray::makeNindexFromArrayViewport' when loading
#> 'HDF5Array'
#> 
#> Attaching package: 'celldex'
#> 
#> The following objects are masked from 'package:SingleR':
#> 
#>     BlueprintEncodeData,
#>     DatabaseImmuneCellExpressionData,
#>     HumanPrimaryCellAtlasData, ImmGenData,
#>     MonacoImmuneData, MouseRNAseqData,
#>     NovershternHematopoieticData
library(patchwork)

Read in the preprocessed Seurat object from part 1. We use qs2 to efficiently load the data in.

seurat_object <- qs2::qs_read(here("data", "seurat_object_preprocessed.qs2"))

Exercise

Use names(seurat_object@meta.data) to identify where this information is stored in the Seurat object.

# Add your code here

Exercise

Use the table(seurat_object$<name>) function to see the number of cells that belong to each metadata category. Replace <name> with the relevant metadata names (e.g. ind, stim)

# Add your code here

It is useful to display the number of control and stimulated cells per individual

table(seurat_object$ind, seurat_object$stim)
#>       
#>        ctrl stim
#>   101   175  226
#>   107   115  106
#>   1015  503  489
#>   1016  335  348
#>   1039   87  100
#>   1244  373  311
#>   1256  384  385
#>   1488  406  534

Takeaways:

Each cell in our Seurat object is annotated either as the control or stimualted cell
Each cell notes the individual the cells were derived from
This is important so we know which cells to compare in the data

8 Dimensionality reduction

10 Data set integration with Harmony