7 PCAs and UMAPs
7.1 Identification of highly variable features (feature selection)
7.2 Scaling the data
Tutorial: https://satijalab.org/seurat/articles/pbmc3k_tutorial#scaling-the-data
Why do we need to do this?
Highly expressed genes can overpower the signal of less expressed genes that are equally important. Scaling shifts and rescales the expression of each gene so that, across cells, its mean is 0 and its variance is 1, which gives every gene equal weight in downstream analyses such as PCA. The assumption is that, within the same cell, the underlying RNA content is constant. Additionally, if variables are provided in vars.to.regress, they are individually regressed against each feature, and the resulting residuals are then scaled and centered. This step allows you to control for cell cycle and other factors that may bias your clustering.
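To make the two operations concrete, here is a minimal sketch in plain Python. It is not Seurat's implementation: the function names, the toy expression values, and the single covariate `cycle_score` are all hypothetical, and an ordinary least-squares fit with one covariate is assumed for the regression step.

```python
from statistics import mean, stdev

def scale_feature(values):
    """Center one gene to mean 0 and scale it to unit standard deviation across cells."""
    mu = mean(values)
    sd = stdev(values)
    if sd == 0:
        # A gene with no variation carries no signal; return zeros.
        return [0.0] * len(values)
    return [(v - mu) / sd for v in values]

def regress_out(values, covariate):
    """Return the residuals of a least-squares fit: values ~ intercept + slope * covariate."""
    mx, my = mean(covariate), mean(values)
    sxx = sum((x - mx) ** 2 for x in covariate)
    sxy = sum((x - mx) * (y - my) for x, y in zip(covariate, values))
    slope = sxy / sxx if sxx else 0.0
    intercept = my - slope * mx
    return [y - (intercept + slope * x) for x, y in zip(covariate, values)]

# Toy data (hypothetical): two genes with the same pattern across four cells,
# one expressed roughly 100 times higher than the other.
high_gene = [100.0, 120.0, 80.0, 110.0]
low_gene = [1.0, 1.2, 0.8, 1.1]

scaled_high = scale_feature(high_gene)
scaled_low = scale_feature(low_gene)

# Hypothetical unwanted covariate, e.g. a per-cell cell cycle score.
cycle_score = [0.1, 0.9, 0.2, 0.8]
gene = [1.3, 2.8, 1.2, 2.9]

# Regress first, then scale and center the residuals.
residuals = regress_out(gene, cycle_score)
scaled_residuals = scale_feature(residuals)
```

After scaling, `scaled_high` and `scaled_low` are equal up to floating-point error, so neither gene dominates because of its magnitude. The residuals are uncorrelated with `cycle_score`, which is the sense in which the covariate has been "regressed out" before scaling.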