Explore the dataset

The study
The data object
The existing annotation

The study

This workshop uses CosMx Spatial Molecular Imager (SMI) data from the paper “Macrophage and neutrophil heterogeneity at single-cell spatial resolution in human inflammatory bowel disease” (Garrido-Trigo et al. 2023): https://www.nature.com/articles/s41467-023-40156-6

There were 9 colon tissue samples, one per slide. The study used a 1k RNA panel (larger panels, such as the 5k Xenium and 6k CosMx panels, and whole-transcriptome kits are also available).

3x Healthy controls (HC)
3x Crohn’s disease (CD)
3x Ulcerative colitis (UC)

For this workshop, we will work with a subset of the data:

Only the healthy control and Crohn’s disease samples (6 samples in total)
Only a regional subset of each sample: the first 4 ‘FOVs’ (Fields Of View, the CosMx scan areas)
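As a rough sketch of what this subsetting amounts to, the base-R snippet below filters a small, made-up cell table. The column names `group` and `fov` mirror columns in this dataset’s colData, but the values are invented, and reading “first 4 FOVs” as FOV numbers 1 to 4 is an assumption. This is not the code used to build the workshop object.

```r
# Toy per-cell metadata (hypothetical values, not from the real dataset)
cells <- data.frame(
  cell_id = paste0("cell", 1:8),
  group   = c("HC", "HC", "CD", "CD", "UC", "UC", "HC", "CD"),
  fov     = c(1, 5, 2, 4, 1, 3, 4, 6)
)

# Keep healthy controls and Crohn's disease only, and FOVs 1-4 only
# (assumes "first 4 FOVs" means the four lowest FOV numbers)
keep <- cells$group %in% c("HC", "CD") & cells$fov <= 4
subset_cells <- cells[keep, ]
print(subset_cells)
```

On a SpatialFeatureExperiment, a logical vector like `keep` can be used to subset cells (columns) in the same way, e.g. `sfe[, keep]`.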

The data object

This is a SpatialFeatureExperiment object. (Visit https://pachterlab.github.io/SpatialFeatureExperiment/articles/SFE.html for more information on this class.)

This subsetted dataset has 999 targets (genes plus negative control probes such as NegPrb22) and 65,601 cells.

sfe

## class: SpatialFeatureExperiment 
## dim: 999 65601 
## metadata(0):
## assays(3): counts molecules logcounts
## rownames(999): AATK ABL1 ... NegPrb22 NegPrb23
## rowData names(3): target CodeClass hvg
## colnames(65601): HC_a_1000_1 HC_a_1000_2 ... CD_c_99_4 CD_c_9_2
## colData names(40): fov cell_ID ... clust_M0_lam0.6_k50_res0.3 niche
## reducedDimNames(2): PCA UMAP
## mainExpName: NULL
## altExpNames(0):
## spatialCoords names(2) : CenterX_global_px CenterY_global_px
## imgData names(4): sample_id image_id data scaleFactor
## 
## unit: full_res_image_pixel
## Geometries:
## colGeometries: centroids (POINT), cellSeg (POLYGON) 
## 
## Graphs:
## GSM7473682_HC_a: 
## GSM7473683_HC_b: 
## GSM7473684_HC_c: 
## GSM7473688_CD_a: 
## GSM7473689_CD_b: 
## GSM7473690_CD_c:

Each cell has the following metadata (the first 300 cells are shown):

DT::datatable(as.data.frame(head(colData(sfe), n=300)))

This was a ‘1k’ panel; the object has 999 targets, including the negative control probes.

DT::datatable(as.data.frame(rowData(sfe)))
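Because the row names include negative control probes (e.g. NegPrb22 and NegPrb23 in the printout above), not every one of the 999 targets is a gene. A minimal sketch for separating them by name prefix, using a short made-up vector of names; on the real object you would pass `rownames(sfe)` instead. Matching on the `NegPrb` prefix is an assumption based on the names visible here, and the `CodeClass` column in rowData may be a more reliable source.

```r
# A few example target names (toy vector; real names come from rownames(sfe))
targets <- c("AATK", "ABL1", "PIGR", "NegPrb22", "NegPrb23")

# Flag negative control probes by their name prefix
is_negprobe <- startsWith(targets, "NegPrb")

n_negative <- sum(is_negprobe)
genes_only <- targets[!is_negprobe]
cat("Negative probes:", n_negative, "\n")
cat("Genes:", paste(genes_only, collapse = ", "), "\n")
```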

How many cells per sample?

table(sfe$tissue_sample)

## 
##  CD_a  CD_b  CD_c  HC_a  HC_b  HC_c 
##  6723 13150 12929  8225 15642  8932
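The per-sample counts above can be rolled up into per-condition totals. This sketch copies the numbers from the table() output and takes the condition to be the prefix of each sample name (an assumption based on the naming pattern). It also checks that the samples add up to the 65,601 cells in the object.

```r
# Cells per sample, copied from the table() output above
cells_per_sample <- c(CD_a = 6723, CD_b = 13150, CD_c = 12929,
                      HC_a = 8225, HC_b = 15642, HC_c = 8932)

# Condition is the part of the name before the first underscore ("CD" or "HC")
condition <- sub("_.*$", "", names(cells_per_sample))

# Sum the counts within each condition
cells_per_condition <- tapply(cells_per_sample, condition, sum)
print(cells_per_condition)

# The per-sample counts should add up to the 65601 cells in the object
stopifnot(sum(cells_per_sample) == 65601)
```

The two conditions end up with almost exactly the same number of cells, even though individual samples vary roughly twofold.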

This data is subsetted to only 4 ‘Fields of View’ (FOVs) per sample. On the CosMx platform, FOVs are the rectangular regions that together make up a run, so here we’re essentially looking at one corner of each sample.

# NB: colData() returns an S4 DataFrame, not a base data.frame,
# so it often needs an explicit conversion before using dplyr verbs.
library(dplyr)
colData(sfe) %>%
  as.data.frame() %>%
  group_by(group, tissue_sample, fov, fov_name) %>%
  summarise(n_cells = n(), .groups = "drop")
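If you prefer to avoid dplyr, a similar per-sample, per-FOV cell count can be done in base R with aggregate(). This is a sketch on a made-up table; on the real object the input would be `as.data.frame(colData(sfe))`.

```r
# Toy per-cell metadata (hypothetical values)
cells <- data.frame(
  tissue_sample = c("HC_a", "HC_a", "HC_a", "CD_a", "CD_a"),
  fov           = c(1, 1, 2, 1, 3)
)

# Give every cell a weight of 1, then sum within each sample/FOV combination.
# Like summarise(), aggregate() only returns combinations that contain cells.
cells$n_cells <- 1
fov_counts <- aggregate(n_cells ~ tissue_sample + fov, data = cells, FUN = sum)
print(fov_counts)
```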

The existing annotation

We’ll be using 5 broad cell types (the celltype_subset column), taken from Garrido-Trigo et al.’s original paper.

# scattermore is a logical flag; TRUE speeds up plotting of many points
plotUMAP(sfe, colour_by = 'celltype_subset', scattermore = TRUE)

Let’s look at them on the tissue itself, using one of the healthy control samples.

A note on the tissue morphology: here the top of the image would be the lumen of the colon, with a stromal layer known as the lamina propria at the bottom. The oval-shaped epithelial structures are crypts. See: https://www.pathologyoutlines.com/topic/colonhistology.html

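# Assumption: sfe.sample.HC holds one healthy-control sample; HC_a is used here for illustration
sample <- "HC_a"
sfe.sample.HC <- sfe[, sfe$tissue_sample == sample]
plotSpatialFeature(sfe.sample.HC, 'celltype_subset', colGeometryName = "cellSeg") + 
  theme(legend.title=element_blank()) +
  ggtitle(sample)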

There are multiple levels of cell type annotation in this dataset.

There is the very (very) detailed celltype_SingleR2, used for various analyses in the original paper:

# Relabel cell types with fewer than 30 cells as "Other", purely to keep the plot readable.
ct <- as.character(sfe.sample.HC$celltype_SingleR2)
cell_counts <- table(ct)
keep <- as.vector(cell_counts[ct]) >= 30
sfe.sample.HC$filtered_celltype_singleR2 <- ifelse(keep, ct, "Other")

plotSpatialFeature(sfe.sample.HC, 'filtered_celltype_singleR2', colGeometryName = "cellSeg") + 
  theme(legend.title=element_blank()) +
  ggtitle(sample)
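The “lump rare labels into Other” step above can be wrapped in a small reusable function. `lump_rare` is a hypothetical helper, not part of any package used in this workshop; the toy example uses a threshold of 2 instead of 30 so the effect is visible on a few labels.

```r
# Hypothetical helper: relabel categories seen fewer than `min_n` times as `other`
lump_rare <- function(labels, min_n = 30, other = "Other") {
  labels <- as.character(labels)
  counts <- table(labels)
  # Look up each cell's category size; as.vector() drops the table attributes
  n <- as.vector(counts[labels])
  ifelse(n >= min_n, labels, other)
}

# Toy example with a threshold of 2 instead of 30
toy <- c("Tcell", "Tcell", "Bcell", "Goblet", "Goblet", "Goblet")
lumped <- lump_rare(toy, min_n = 2)
print(lumped)
```

On the real data this would be called as `lump_rare(sfe.sample.HC$celltype_SingleR2)`.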

There are also unlabelled clusters (cluster_code), generated purely from transcriptional similarity. These might be a good level of classification if we were doing the analysis from scratch.

plotSpatialFeature(sfe.sample.HC, 'cluster_code', colGeometryName = "cellSeg") + 
  theme(legend.title=element_blank()) +
  ggtitle(sample)
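To give a feel for what “clustering purely on transcriptional similarity” means, here is a toy sketch on simulated counts: log-transform, reduce with PCA, then group cells with k-means. This is only a simple stand-in. The method that actually produced cluster_code is not shown in this section, and real pipelines typically use graph-based clustering on many more cells and genes.

```r
set.seed(1)

# Simulate counts for 100 cells x 20 genes: cells 1-50 express genes 1-10
# highly, cells 51-100 express genes 11-20 highly
n_cells <- 100
n_genes <- 20
group <- rep(1:2, each = 50)
lambda <- matrix(1, nrow = n_cells, ncol = n_genes)
lambda[group == 1, 1:10]  <- 20
lambda[group == 2, 11:20] <- 20
counts <- matrix(rpois(n_cells * n_genes, lambda), nrow = n_cells)

# Log-transform, reduce to the top 2 principal components, then cluster
logcounts <- log1p(counts)
pcs <- prcomp(logcounts)$x[, 1:2]
clusters <- kmeans(pcs, centers = 2, nstart = 10)$cluster

# Compare the clusters with the simulated groups
print(table(clusters, group))
```

The cluster numbers themselves are arbitrary; what matters is that cells with similar expression profiles end up together, which is why such clusters start out unlabelled.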

Of course, we can plot the expression of any gene, as long as it’s present on the 1k panel!

plotSpatialFeature(sfe.sample.HC, 'PIGR', colGeometryName = "cellSeg") + 
  theme(legend.title=element_blank()) +
  ggtitle(sample)
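Since only panel targets can be plotted, it can help to check gene names before plotting. A minimal sketch with a hypothetical helper `check_panel` and a toy panel vector standing in for `rownames(sfe)`; MADEUP1 is a deliberately fake name.

```r
# Toy stand-in for rownames(sfe); the real panel has 999 targets
panel <- c("AATK", "ABL1", "PIGR", "NegPrb22")

# Hypothetical helper: report genes missing from the panel, return the ones present
check_panel <- function(genes, panel) {
  present <- genes %in% panel
  if (any(!present)) {
    message("Not on panel: ", paste(genes[!present], collapse = ", "))
  }
  genes[present]
}

found <- check_panel(c("PIGR", "MADEUP1", "ABL1"), panel)
print(found)
```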