8 Cell cycle Assignment

In some datasets, the phase of cell cycle that a cell is in (G1/G2M/S) can account for alot of the observed transcriptomic variation. There may be clustering by phase, or separation in the UMAP by phase.

Seurat provides a simple method for assigning cell cycle state to each cell. Other methods are available.

More information about assigning cell cycle states to cells is in the cell cycle vignette

# A list of cell cycle markers, from Tirosh et al, 2015, is loaded with Seurat.  We can
# segregate this list into markers of G2/M phase and markers of S phase
s.genes   <- cc.genes$s.genes
g2m.genes <- cc.genes$g2m.genes

# Use those lists with the cell cycle scoring function in Seurat.
pbmc <- CellCycleScoring(pbmc, s.features = s.genes, g2m.features = g2m.genes)
#> Warning: The following features are not present in the
#> object: DTL, UHRF1, MLF1IP, EXO1, CASP8AP2, BRIP1, E2F8,
#> not searching for symbol synonyms
#> Warning: The following features are not present in the
#> object: FAM64A, BUB1, HJURP, CDCA3, TTK, CDC25C, DLGAP5,
#> CDCA2, ANLN, GAS2L3, not searching for symbol synonyms

Which adds S.Score, G2M.Score and Phase calls to the metadata.

head(pbmc@meta.data)
#>                  orig.ident nCount_RNA nFeature_RNA
#> AAACATACAACCAC-1     pbmc3k       2419          779
#> AAACATTGAGCTAC-1     pbmc3k       4903         1352
#> AAACATTGATCAGC-1     pbmc3k       3147         1129
#> AAACCGTGCTTCCG-1     pbmc3k       2639          960
#> AAACCGTGTATGCG-1     pbmc3k        980          521
#> AAACGCACTGGTAC-1     pbmc3k       2163          781
#>                  percent.mt RNA_snn_res.0.5 seurat_clusters
#> AAACATACAACCAC-1  3.0177759               2               6
#> AAACATTGAGCTAC-1  3.7935958               3               1
#> AAACATTGATCAGC-1  0.8897363               2               0
#> AAACCGTGCTTCCG-1  1.7430845               1               5
#> AAACCGTGTATGCG-1  1.2244898               6               8
#> AAACGCACTGGTAC-1  1.6643551               2               0
#>                  RNA_snn_res.0.1 RNA_snn_res.2 Naive_CD4_T1
#> AAACATACAACCAC-1               0             6   1.22824523
#> AAACATTGAGCTAC-1               3             1  -0.08111043
#> AAACATTGATCAGC-1               0             0  -0.37682601
#> AAACCGTGCTTCCG-1               1             5  -0.72739714
#> AAACCGTGTATGCG-1               2             8  -1.17396454
#> AAACGCACTGGTAC-1               0             0  -0.63807586
#>                   cell_label SingleR.labels     S.Score
#> AAACATACAACCAC-1 Naive_CD4_T        T_cells  0.09853841
#> AAACATTGAGCTAC-1        <NA>         B_cell -0.02364305
#> AAACATTGATCAGC-1        <NA>        T_cells -0.02177266
#> AAACCGTGCTTCCG-1        <NA>       Monocyte  0.03794398
#> AAACCGTGTATGCG-1        <NA>        NK_cell -0.03309970
#> AAACGCACTGGTAC-1        <NA>        T_cells -0.04814181
#>                     G2M.Score Phase
#> AAACATACAACCAC-1 -0.044716507     S
#> AAACATTGAGCTAC-1 -0.046889929    G1
#> AAACATTGATCAGC-1  0.074841537   G2M
#> AAACCGTGCTTCCG-1  0.006575446     S
#> AAACCGTGTATGCG-1  0.027910063   G2M
#> AAACGCACTGGTAC-1 -0.078164839    G1

We can then check the cell phase on the UMAP. In this dataset, phase isn’t driving the clustering, and would not require any further handling.

DimPlot(pbmc, reduction = 'umap', group.by = "Phase")

Where a bias is present, your course of action depends on the task at hand. It might involve ‘regressing out’ the cell cycle variation when scaling data ScaleData(kang, vars.to.regress="Phase"), omitting cell-cycle dominated clusters, or just accounting for it in your differential expression calculations.

If you are working with non-human data, you will need to convert these gene lists, or find new cell cycle associated genes in your species.