8 Cell cycle Assignment
In some datasets, the phase of cell cycle that a cell is in (G1/G2M/S) can account for alot of the observed transcriptomic variation. There may be clustering by phase, or separation in the UMAP by phase.
Seurat provides a simple method for assigning cell cycle state to each cell. Other methods are available.
More information about assigning cell cycle states to cells is in the cell cycle vignette
# A list of cell cycle markers, from Tirosh et al, 2015, is loaded with Seurat. We can
# segregate this list into markers of G2/M phase and markers of S phase
s.genes <- cc.genes$s.genes
g2m.genes <- cc.genes$g2m.genes
# Use those lists with the cell cycle scoring function in Seurat.
pbmc <- CellCycleScoring(pbmc, s.features = s.genes, g2m.features = g2m.genes)
#> Warning: The following features are not present in the
#> object: DTL, UHRF1, MLF1IP, EXO1, CASP8AP2, BRIP1, E2F8,
#> not searching for symbol synonyms
#> Warning: The following features are not present in the
#> object: FAM64A, BUB1, HJURP, CDCA3, TTK, CDC25C, DLGAP5,
#> CDCA2, ANLN, GAS2L3, not searching for symbol synonyms
Which adds S.Score, G2M.Score and Phase calls to the metadata.
head(pbmc@meta.data)
#> orig.ident nCount_RNA nFeature_RNA
#> AAACATACAACCAC-1 pbmc3k 2419 779
#> AAACATTGAGCTAC-1 pbmc3k 4903 1352
#> AAACATTGATCAGC-1 pbmc3k 3147 1129
#> AAACCGTGCTTCCG-1 pbmc3k 2639 960
#> AAACCGTGTATGCG-1 pbmc3k 980 521
#> AAACGCACTGGTAC-1 pbmc3k 2163 781
#> percent.mt RNA_snn_res.0.5 seurat_clusters
#> AAACATACAACCAC-1 3.0177759 2 6
#> AAACATTGAGCTAC-1 3.7935958 3 1
#> AAACATTGATCAGC-1 0.8897363 2 0
#> AAACCGTGCTTCCG-1 1.7430845 1 5
#> AAACCGTGTATGCG-1 1.2244898 6 8
#> AAACGCACTGGTAC-1 1.6643551 2 0
#> RNA_snn_res.0.1 RNA_snn_res.2 Naive_CD4_T1
#> AAACATACAACCAC-1 0 6 1.22824523
#> AAACATTGAGCTAC-1 3 1 -0.08111043
#> AAACATTGATCAGC-1 0 0 -0.37682601
#> AAACCGTGCTTCCG-1 1 5 -0.72739714
#> AAACCGTGTATGCG-1 2 8 -1.17396454
#> AAACGCACTGGTAC-1 0 0 -0.63807586
#> cell_label SingleR.labels S.Score
#> AAACATACAACCAC-1 Naive_CD4_T T_cells 0.09853841
#> AAACATTGAGCTAC-1 <NA> B_cell -0.02364305
#> AAACATTGATCAGC-1 <NA> T_cells -0.02177266
#> AAACCGTGCTTCCG-1 <NA> Monocyte 0.03794398
#> AAACCGTGTATGCG-1 <NA> NK_cell -0.03309970
#> AAACGCACTGGTAC-1 <NA> T_cells -0.04814181
#> G2M.Score Phase
#> AAACATACAACCAC-1 -0.044716507 S
#> AAACATTGAGCTAC-1 -0.046889929 G1
#> AAACATTGATCAGC-1 0.074841537 G2M
#> AAACCGTGCTTCCG-1 0.006575446 S
#> AAACCGTGTATGCG-1 0.027910063 G2M
#> AAACGCACTGGTAC-1 -0.078164839 G1
We can then check the cell phase on the UMAP. In this dataset, phase isn’t driving the clustering, and would not require any further handling.
DimPlot(pbmc, reduction = 'umap', group.by = "Phase")
Where a bias is present, your course of action depends on the task at hand. It might involve ‘regressing out’ the cell cycle variation when scaling data ScaleData(kang, vars.to.regress="Phase")
, omitting cell-cycle dominated clusters, or just accounting for it in your differential expression calculations.
If you are working with non-human data, you will need to convert these gene lists, or find new cell cycle associated genes in your species.