3 The Seurat object

Most of todays workshop will be following the Seurat PBMC tutorial (reproduced in the next section). We’ll load raw counts data, do some QC and setup various useful information in a Seurat object.

But before that - what does a Seurat object look like, and what can we do with it once we’ve made one?

Lets have a look at a Seurat object that’s already setup.

3.1 Load an existing Seurat object

The data we’re working with today is a small dataset of about 3000 PBMCs (peripheral blood mononuclear cells) from a healthy donor. Just one sample.

This is an early demo dataset from 10X genomics (called pbmc3k) - you can find more information like qc reports here.

First, load Seurat package.

And here’s the one we prepared earlier. Seurat objects are usually saved as ‘.rds’ files, which is an R format for storing binary data (not-text or human-readable). The functions readRDS() can load it.

pbmc_processed <- readRDS("data/pbmc_tutorial.rds")
pbmc_processed
#> An object of class Seurat 
#> 13714 features across 2700 samples within 1 assay 
#> Active assay: RNA (13714 features, 2000 variable features)
#>  2 dimensional reductions calculated: pca, umap

3.2 What’s in there?

Some of the most important information for working with Seurat objects is in the metadata. This is cell level information - each row is one cell, identified by its barcode. Extra information gets added to this table as analysis progresses.

head(pbmc_processed@meta.data)
#>                  orig.ident nCount_RNA nFeature_RNA
#> AAACATACAACCAC-1     pbmc3k       2419          779
#> AAACATTGAGCTAC-1     pbmc3k       4903         1352
#> AAACATTGATCAGC-1     pbmc3k       3147         1129
#> AAACCGTGCTTCCG-1     pbmc3k       2639          960
#> AAACCGTGTATGCG-1     pbmc3k        980          521
#> AAACGCACTGGTAC-1     pbmc3k       2163          781
#>                  percent.mt RNA_snn_res.0.5 seurat_clusters
#> AAACATACAACCAC-1  3.0177759               0               0
#> AAACATTGAGCTAC-1  3.7935958               3               3
#> AAACATTGATCAGC-1  0.8897363               2               2
#> AAACCGTGCTTCCG-1  1.7430845               5               5
#> AAACCGTGTATGCG-1  1.2244898               6               6
#> AAACGCACTGGTAC-1  1.6643551               2               2

That doesn’t have any gene expression though, that’s stored in an ‘Assay’. The Assay structure has some nuances (see discussion below), but there are functions that get the assay data out for you.

By default this object will return the normalised data (from the only assay in this object, called RNA). Every ‘.’ is a zero.

GetAssayData(pbmc_processed)[1:15,1:2]
#> 15 x 2 sparse Matrix of class "dgCMatrix"
#>               AAACATACAACCAC-1 AAACATTGAGCTAC-1
#> AL627309.1                   .         .       
#> AP006222.2                   .         .       
#> RP11-206L10.2                .         .       
#> RP11-206L10.9                .         .       
#> LINC00115                    .         .       
#> NOC2L                        .         .       
#> KLHL17                       .         .       
#> PLEKHN1                      .         .       
#> RP11-54O7.17                 .         .       
#> HES4                         .         .       
#> RP11-54O7.11                 .         .       
#> ISG15                        .         .       
#> AGRN                         .         .       
#> C1orf159                     .         .       
#> TNFRSF18                     .         1.625141

But the raw counts data is accessible too.

GetAssayData(pbmc_processed, slot='counts')[1:15,1:2]
#> 15 x 2 sparse Matrix of class "dgCMatrix"
#>               AAACATACAACCAC-1 AAACATTGAGCTAC-1
#> AL627309.1                   .                .
#> AP006222.2                   .                .
#> RP11-206L10.2                .                .
#> RP11-206L10.9                .                .
#> LINC00115                    .                .
#> NOC2L                        .                .
#> KLHL17                       .                .
#> PLEKHN1                      .                .
#> RP11-54O7.17                 .                .
#> HES4                         .                .
#> RP11-54O7.11                 .                .
#> ISG15                        .                .
#> AGRN                         .                .
#> C1orf159                     .                .
#> TNFRSF18                     .                2

Seurat generally hides alot of this data complexity, and provides functions for typical tasks. Like plotting.

3.3 Plotting

Lets plot a classic UMAP with DimPlot- defaults to the clusters on a ‘umap’ view.

DimPlot(pbmc_processed)
# equivalent to
#DimPlot(pbmc_processed, reduction = 'umap')

What about checking some gene expression? Genes are called ‘features’ in a Seurat object (because if it was CITE-seq they’d be proteins!). So the FeaturePlot() function will show gene expression.

FeaturePlot(pbmc_processed, features =  "CD14")

It also works for continuous cell-level data - any column in the metadata table.

FeaturePlot(pbmc_processed, features =  "percent.mt")

And what about showing CD14 expression across which clusters (or any other categorical information in the metadata)

VlnPlot(pbmc_processed, features = 'CD14')

Challenge: Plotting

Plot your favourite gene. What cluster is it found in?

Tip: You can check if the gene is in the dataset by looking for it in the rownames of the seurat object "CCT3" %in% rownames(pbmc_processed)


Discussion: The Seurat Object in R

Lets take a look at the seurat object we have just created in R, pbmc_processed

To accomodate the complexity of data arising from a single cell RNA seq experiment, the seurat object keeps this as a container of multiple data tables that are linked.

The functions in seurat can access parts of the data object for analysis and visualisation, we will cover this later on.

There are a couple of concepts to discuss here.
Class

These are essentially data containers in R as a class, and can accessed as a variable in the R environment.

Classes are pre-defined and can contain multiple data tables and metadata. For Seurat, there are three types.

  • Seurat - the main data class, contains all the data.
  • Assay - found within the Seurat object. Depending on the experiment a cell could have data on RNA, ATAC etc measured
  • DimReduc - for PCA and UMAP
Slots

Slots are parts within a class that contain specific data. These can be lists, data tables and vectors and can be accessed with conventional R methods.

Data Access

Many of the functions in Seurat operate on the data class and slots within them seamlessly. There maybe occasion to access these separately to hack them, however this is an advanced analysis method.

The ways to access the slots can be through methods for the class (functions) or with standard R accessor nomenclature.

Examples of accessing a Seurat object

The assays slot in pbmc_processed can be accessed with pbmc_processed@assays.

The RNA assay can be accessed from this with pbmc_processed@assays$RNA.

We often want to access assays, so Seurat nicely gives us a shortcut pbmc_processed$RNA. You may sometimes see an alternative notation pbmc_processed[["RNA"]].

In general, slots that are always in an object are accessed with @ and things that may be different in different data sets are accessed with $.

Have a go

Use str to look at the structure of the Seurat object pbmc_processed.

What is in the meta.data slot within your Seurat object currently? What type of data is contained here?

Where is our count data within the Seurat object?

We will be jumping between that page - for the original content, and here - for extra tips, tricks and challenges.