8 Cell cycle Assignment
In some datasets, the phase of cell cycle that a cell is in (G1/G2M/S) can account for alot of the observed transcriptomic variation. There may be clustering by phase, or separation in the UMAP by phase.
Seurat provides a simple method for assigning cell cycle state to each cell. Other methods are available.
More information about assigning cell cycle states to cells is in the cell cycle vignette
# A list of cell cycle markers, from Tirosh et al, 2015, is loaded with Seurat. We can
# segregate this list into markers of G2/M phase and markers of S phase
s.genes <- cc.genes$s.genes
g2m.genes <- cc.genes$g2m.genes
# Use those lists with the cell cycle scoring function in Seurat.
pbmc <- CellCycleScoring(pbmc, s.features = s.genes, g2m.features = g2m.genes)
#> Warning: The following features are not present in the
#> object: DTL, UHRF1, MLF1IP, EXO1, CASP8AP2, BRIP1, E2F8,
#> not searching for symbol synonyms
#> Warning: The following features are not present in the
#> object: FAM64A, BUB1, HJURP, CDCA3, TTK, CDC25C, DLGAP5,
#> CDCA2, ANLN, GAS2L3, not searching for symbol synonyms
Which adds S.Score, G2M.Score and Phase calls to the metadata.
head(pbmc@meta.data)
#> orig.ident nCount_RNA nFeature_RNA
#> AAACATACAACCAC-1 pbmc3k 2419 779
#> AAACATTGAGCTAC-1 pbmc3k 4903 1352
#> AAACATTGATCAGC-1 pbmc3k 3147 1129
#> AAACCGTGCTTCCG-1 pbmc3k 2639 960
#> AAACCGTGTATGCG-1 pbmc3k 980 521
#> AAACGCACTGGTAC-1 pbmc3k 2163 781
#> percent.mt RNA_snn_res.0.5 seurat_clusters
#> AAACATACAACCAC-1 3.0177759 2 6
#> AAACATTGAGCTAC-1 3.7935958 3 1
#> AAACATTGATCAGC-1 0.8897363 2 0
#> AAACCGTGCTTCCG-1 1.7430845 1 5
#> AAACCGTGTATGCG-1 1.2244898 6 8
#> AAACGCACTGGTAC-1 1.6643551 2 0
#> RNA_snn_res.0.1 RNA_snn_res.0.2
#> AAACATACAACCAC-1 0 0
#> AAACATTGAGCTAC-1 3 3
#> AAACATTGATCAGC-1 0 0
#> AAACCGTGCTTCCG-1 1 1
#> AAACCGTGTATGCG-1 2 2
#> AAACGCACTGGTAC-1 0 0
#> RNA_snn_res.0.3 RNA_snn_res.0.4
#> AAACATACAACCAC-1 0 2
#> AAACATTGAGCTAC-1 3 3
#> AAACATTGATCAGC-1 0 2
#> AAACCGTGCTTCCG-1 1 1
#> AAACCGTGTATGCG-1 2 6
#> AAACGCACTGGTAC-1 0 2
#> RNA_snn_res.0.6 RNA_snn_res.0.7
#> AAACATACAACCAC-1 1 1
#> AAACATTGAGCTAC-1 3 3
#> AAACATTGATCAGC-1 1 1
#> AAACCGTGCTTCCG-1 2 2
#> AAACCGTGTATGCG-1 6 6
#> AAACGCACTGGTAC-1 1 1
#> RNA_snn_res.0.8 RNA_snn_res.0.9
#> AAACATACAACCAC-1 6 1
#> AAACATTGAGCTAC-1 2 2
#> AAACATTGATCAGC-1 1 1
#> AAACCGTGCTTCCG-1 4 4
#> AAACCGTGTATGCG-1 8 7
#> AAACGCACTGGTAC-1 1 1
#> RNA_snn_res.1 RNA_snn_res.1.1
#> AAACATACAACCAC-1 6 6
#> AAACATTGAGCTAC-1 2 2
#> AAACATTGATCAGC-1 1 1
#> AAACCGTGCTTCCG-1 4 4
#> AAACCGTGTATGCG-1 8 8
#> AAACGCACTGGTAC-1 1 1
#> RNA_snn_res.1.2 RNA_snn_res.1.3
#> AAACATACAACCAC-1 6 8
#> AAACATTGAGCTAC-1 2 2
#> AAACATTGATCAGC-1 1 0
#> AAACCGTGCTTCCG-1 4 5
#> AAACCGTGTATGCG-1 8 9
#> AAACGCACTGGTAC-1 1 0
#> RNA_snn_res.1.4 RNA_snn_res.1.5
#> AAACATACAACCAC-1 8 9
#> AAACATTGAGCTAC-1 2 2
#> AAACATTGATCAGC-1 0 1
#> AAACCGTGCTTCCG-1 5 4
#> AAACCGTGTATGCG-1 9 8
#> AAACGCACTGGTAC-1 0 1
#> RNA_snn_res.1.6 RNA_snn_res.1.7
#> AAACATACAACCAC-1 8 8
#> AAACATTGAGCTAC-1 1 1
#> AAACATTGATCAGC-1 0 0
#> AAACCGTGCTTCCG-1 3 3
#> AAACCGTGTATGCG-1 7 7
#> AAACGCACTGGTAC-1 0 0
#> RNA_snn_res.1.8 RNA_snn_res.1.9
#> AAACATACAACCAC-1 7 6
#> AAACATTGAGCTAC-1 1 1
#> AAACATTGATCAGC-1 0 0
#> AAACCGTGCTTCCG-1 3 3
#> AAACCGTGTATGCG-1 8 8
#> AAACGCACTGGTAC-1 0 0
#> RNA_snn_res.2 Naive_CD4_T1 cell_label
#> AAACATACAACCAC-1 6 1.22824523 Naive_CD4_T
#> AAACATTGAGCTAC-1 1 -0.08111043 <NA>
#> AAACATTGATCAGC-1 0 -0.37682601 <NA>
#> AAACCGTGCTTCCG-1 5 -0.72739714 <NA>
#> AAACCGTGTATGCG-1 8 -1.17396454 <NA>
#> AAACGCACTGGTAC-1 0 -0.63807586 <NA>
#> cluster celltype pretty_cluster_labels
#> AAACATACAACCAC-1 c2 Memory CD4+ c2: Memory CD4+
#> AAACATTGAGCTAC-1 c3 B c3: B
#> AAACATTGATCAGC-1 c2 Memory CD4+ c2: Memory CD4+
#> AAACCGTGCTTCCG-1 c1 CD14+ Mono c1: CD14+ Mono
#> AAACCGTGTATGCG-1 c6 NK c6: NK
#> AAACGCACTGGTAC-1 c2 Memory CD4+ c2: Memory CD4+
#> SingleR.labels S.Score G2M.Score
#> AAACATACAACCAC-1 T_cells 0.09853841 -0.044716507
#> AAACATTGAGCTAC-1 B_cell -0.02364305 -0.046889929
#> AAACATTGATCAGC-1 T_cells -0.02177266 0.074841537
#> AAACCGTGCTTCCG-1 Monocyte 0.03794398 0.006575446
#> AAACCGTGTATGCG-1 NK_cell -0.03309970 0.027910063
#> AAACGCACTGGTAC-1 T_cells -0.04814181 -0.078164839
#> Phase
#> AAACATACAACCAC-1 S
#> AAACATTGAGCTAC-1 G1
#> AAACATTGATCAGC-1 G2M
#> AAACCGTGCTTCCG-1 S
#> AAACCGTGTATGCG-1 G2M
#> AAACGCACTGGTAC-1 G1
We can then check the cell phase on the UMAP. In this dataset, phase isn’t driving the clustering, and would not require any further handling.
DimPlot(pbmc, reduction = 'umap', group.by = "Phase")
Where a bias is present, your course of action depends on the task at hand. It might involve ‘regressing out’ the cell cycle variation when scaling data ScaleData(kang, vars.to.regress="Phase")
, omitting cell-cycle dominated clusters, or just accounting for it in your differential expression calculations.
If you are working with non-human data, you will need to convert these gene lists, or find new cell cycle associated genes in your species.