What is the difference between Crohn’s Disease and healthy samples in different regions of the tissue?
There are clear ‘regions’ in the tissue, e.g. a lymphoid region full of T cells and B cells. Or the very outer layer of epithelial, or the basal layer or stromal cells. We can identify and test for differences within these ‘niches’.
plotSpatialFeature(sfe.sample.HC,'celltype_subset', colGeometryName = "cellSeg") +
theme(legend.title=element_blank()) +
ggtitle(sample)
We can use BANKSY to define niches. Rather than using cell types, BANKSY uses three pieces of information: each cell’s gene expression, its surrounding gene expression and a ‘azimuthal gabor filter’ (AGF) to capture the surrounding gene expression gradient - see Figure 1.
Again, this takes some time to run. The results are already loaded in the sfe object.
# ## DO NOT RUN ##
#
# # Lambda defines how much BANKSY should consider the gene expression surrounding a cell
# # O uses only *this* cell. 0.8 is recommended for finding domains (niches).
# lambdas <- c(0.6)
#
# # Resolution of clustering step
# # Higher values = more niches.
# resolutions <- c(0.3, 0.5)
#
# # Try to limit to just highly variable genes
# hvgs <- rownames(rowData(sfe))[rowData(sfe)$hvg]
# sfe.hvg <- sfe[hvgs,]
# sfe.hvg$tissue_sample<- droplevels(sfe.hvg$tissue_sample)
#
# # Split by sample
# get_one_sample <- function(the_sample){sfe.hvg[,sfe.hvg$tissue_sample == the_sample]}
# sfe.list <- lapply(FUN=get_one_sample, X=levels(sfe.hvg$tissue_sample))
#
# # Get the underlying BANKSY computation
# sfe.list <- lapply(sfe.list, computeBanksy, assay_name = 'logcounts', k_geom = k_geom)
# sfe.merged <- do.call(cbind, sfe.list)
# sfe.merged <- Banksy::runBanksyPCA(sfe.merged, lambda = lambdas, npcs = npcs, group='tissue_sample', seed=12)
# sfe.merged <- Banksy::clusterBanksy(sfe.merged, lambda = lambdas, npcs = npcs, resolution = resolutions, seed=12)
#
# niches_table <- colData(sfe.merged)
# niche_lookup <- setNames(niches_table$clust_M0_lam0.6_k50_res0.5, nm=as.character( niches_table$cell ))
# niche_lookup2 <- setNames(niches_table$clust_M0_lam0.6_k50_res0.3, nm=as.character( niches_table$cell ))
# sfe$clust_M0_lam0.6_k50_res0.5 <- niche_lookup[as.character(sfe$cell)]
# sfe$clust_M0_lam0.6_k50_res0.3 <- niche_lookup2[as.character(sfe$cell)]
# sfe$niche <- factor(paste0('n', as.character(sfe$clust_M0_lam0.6_k50_res0.3)),
# levels=paste0('n',levels(sfe$clust_M0_lam0.6_k50_res0.3)))
See clusters assigned at different resolutions and the niche assigned using 0.3 resolution.
head(colData(sfe), n=100) %>% as_tibble() %>%
select(cell, tissue_sample, celltype_subset, clust_M0_lam0.6_k50_res0.5, clust_M0_lam0.6_k50_res0.3, niche )
Observe niches plotted on the healthy tissue.
plotSpatialFeature(sfe.sample.HC, "niche",colGeometryName = "cellSeg") +
theme(legend.title=element_blank()) +
ggtitle(sample)
Compare with one of the Crohn’s samples.
plotSpatialFeature(sfe.sample.CD,'celltype_subset', colGeometryName = "cellSeg") +
theme(legend.title=element_blank()) +
ggtitle(sample)
plotSpatialFeature(sfe.sample.CD, "niche",colGeometryName = "cellSeg") + # Choose this one
theme(legend.title=element_blank()) +
ggtitle(sample)
What cell types make up the niches?
Can see that;
## Cell type proportions
celltype_summary_table<- colData(sfe) %>% as_tibble() %>%
group_by(celltype_subset, niche ) %>%
summarise(n_cells = n())
## `summarise()` has grouped output by 'celltype_subset'. You can override using
## the `.groups` argument.
ggplot(celltype_summary_table, aes(fill=celltype_subset, y=n_cells, x=niche) )+
geom_bar(position="fill", stat="identity") +
theme_bw() +
coord_flip() +
theme(legend.position = "bottom") +
scale_y_continuous(expand = c(0,0)) +
ggtitle( "Niche composition by cell type")
Likewise, its helpful to see the proportion of niche types across different samples. This can be quite variable - particularly when we’re working on a particularly small region of each sample!
## Niche proportions
celltype_summary_table<- colData(sfe) %>% as_tibble() %>%
group_by(tissue_sample, niche ) %>%
summarise(n_cells = n())
## `summarise()` has grouped output by 'tissue_sample'. You can override using the
## `.groups` argument.
ggplot(celltype_summary_table, aes(fill=niche, y=n_cells, x=tissue_sample) )+
geom_bar(position="fill", stat="identity") +
theme_bw() +
coord_flip() +
theme(legend.position = "bottom") +
scale_y_continuous(expand = c(0,0)) +
ggtitle( "Sample composition by niche")
Like when conducting differential expression by cell type, we need to create a cell-level annotation for the pseudobulk grouping in our analysis, but this time by niche. We do this by concatenating the tissue sample ID with the niche; here we call this column pdb_sample.
The aggregateAcrossCells() function from the scuttle package is used to sum up each gene’s expression across each pdb_sample.
This will build a SummarizedExperiment object with one entry per pdb_sample combined sample-niche group. Each entry contains the sum of copies of the gene for all cells within the group.
This can take a while to run - we will load one that’s already been preprocessed.
# # DO NOT RUN
# sfe$pdb_sample <- paste0(sfe$tissue_sample, '_', sfe$niche)
# se.pdb <- aggregateAcrossCells(sfe, ids=sfe$pdb_sample,
# BPPARAM = MulticoreParam(workers=4) )
Load preprocessed pseudobulk
se.pdb <- readRDS(spe_pseudobulk_by_niche_file)
counts(se.pdb)[1:5,1:5]
## CD_a_n1 CD_a_n2 CD_a_n3 CD_a_n4 CD_a_n5
## AATK 22 141 123 70 200
## ABL1 46 208 139 87 319
## ABL2 19 130 109 64 131
## ACE 13 123 52 44 83
## ACE2 33 242 168 90 172
How many cell per pool?
DT::datatable(data.frame(colData(se.pdb))[,c('pdb_sample','individual_code','tissue_sample','niche','ncells')])
We will pick a threshold for ‘minimum cells per pseudobulk sample’ based on the distribution of the data. A pool of 2 or 5 cells would be too small to reasonably test.
# What is the minimum acceptable number of cells in a pool?
min_cells_per_pdbsample <- 100
ggplot(colData(se.pdb), aes(x=ncells, col=niche)) +
geom_density() +
geom_vline(xintercept=min_cells_per_pdbsample, lty=3) +
geom_rug() +
scale_x_log10() +
theme_bw() +
ggtitle("How many cells per pseduobulk sample?")
Here we can see that 100 cells is a good cutoff for all niches.
Apply filter.
# Filter on minimum number of cells per pdb group
# (ncells column was added by aggregateAcrossCells)
se.pdb <- se.pdb[,se.pdb$ncells >= min_cells_per_pdbsample]
Differential expression calculated exactly as per cell type, just with niche.
# Pull out the pseudobulk counts matrix for passed samples
pseudobulk_counts_matrix <- counts(se.pdb)
# And the corresponding annotations
pseudobulk_anno_table <- as_tibble(colData(se.pdb)[,c('pdb_sample','group', 'tissue_sample', 'niche', 'ncells')]) # + any other experimental factors
# Build a table of each contrast we might want to do.
contrasts_wanted <- bind_cols(
A= c("CD"), # First terms
B= c("HC") # Second terms (usually control)
)
We won’t be able to test niches 6 or 8. But the below code will test for the number of available conditions per side, and skip them automatically.
table(pseudobulk_anno_table$niche, pseudobulk_anno_table$group)
##
## HC CD
## n1 3 3
## n2 3 3
## n3 3 3
## n4 3 3
## n5 3 2
## n6 3 1
## n7 3 2
## n8 0 1
# Empty list to collect results
de_result_list <- list()
## Cycle through each niche
for (the_niche in levels(se.pdb$niche)) {
# Subset pseudobulk object to one niche
se.pdb.this <- se.pdb[,se.pdb$niche == the_niche]
# And pull out the annotation and counts
anno_table.this <- as_tibble(colData(se.pdb.this))
count_matrix.this <- counts(se.pdb.this)
## Check for sufficient replicates ##
# To do any calculations, we need at least 2 pseudobulk groups per contrast.
# There are plenty in this experiment, but with less replicates and rare cell types
# It's very common to have to skip some contrasts for some niches.
# Skip clusters with no samples after filtering
if( nrow(anno_table.this) < 1 ) {next}
# Count how many biological replicates per group (only need one FOV to count it)
biosample_per_group <- anno_table.this %>%
select(group, tissue_sample) %>%
pull(group) %>% # pull out group column, one entry per tissue sample
table() # count how many
# Are there enough biological replicates to consider the contrast?
min_biosample_per_group <- 2
enough_biosamples <-
(unname(biosample_per_group[contrasts_wanted$A]) >= min_biosample_per_group) &
(unname(biosample_per_group[contrasts_wanted$B]) >= min_biosample_per_group)
# Make a new table with contrasts that we have enough biological replicates for.
contrasts_to_test <- contrasts_wanted[enough_biosamples,]
# Skip if no contrasts are testable
if (nrow(contrasts_to_test) == 0) {next}
## Setup model
# Setup objects for limma
dge <- DGEList(count_matrix.this)
dge <- calcNormFactors(dge)
# Build model
group <- anno_table.this$group
# Model design
# Add other experimental factors here
# ~0 + group
# ~0 + group + individual
# ~0 + group + individual + slide
design <- model.matrix( ~0 + group)
# Run voom
vm <- voom(dge, design = design, plot = FALSE)
# Fit model
fit <- lmFit(vm, design)
# Define and fit contrasts and run eBayes
# Doing this in an automated way so we can include/exclude contrasts where
# there are/aren't enough replicates available.
# The list of strings generated here is parsed by the makeContrasts function
# to build a contrasts matrix.
contrast_str_list <- paste0("group",contrasts_to_test$A,"-","group",contrasts_to_test$B)
contrasts <- makeContrasts(contrasts=contrast_str_list,
levels=coef(fit))
fit <- contrasts.fit(fit, contrasts)
fit <- eBayes(fit)
## Loop through contrasts.
# You can run multiple contrasts at once, but doing it this way allows us
# to skip individual contrasts within a niche.
for ( the_coef in colnames(contrasts) ) {
# Make a version of the coefficient name that doesn't include a '-', or the
# factor name (group)
# Life is easier without special characters
# groupUC-groupHC => UCvHC
contrast_name <- gsub("group","", gsub("-","v",the_coef))
de_result.this <- topTable(fit, n = Inf, adjust.method = "BH", coef = the_coef) %>%
rownames_to_column("target") %>%
mutate(contrast=contrast_name,
contrast_group="pairwise",
niche=the_niche) %>%
select(niche,contrast_group, contrast,target,everything()) %>%
arrange(P.Value)
# Build a unique name for this result by adding <niche>_<A>v<B>
de_result_list[[paste(the_niche, contrast_name, sep="_")]] <- de_result.this
}
}
## Only 1 column left in sample GSM7473688_CD_a, graphs are meaningless, dropping graphs.
## Only 1 column left in sample GSM7473689_CD_b, graphs are meaningless, dropping graphs.
## Only 1 column left in sample GSM7473690_CD_c, graphs are meaningless, dropping graphs.
## Only 1 column left in sample GSM7473682_HC_a, graphs are meaningless, dropping graphs.
## Only 1 column left in sample GSM7473683_HC_b, graphs are meaningless, dropping graphs.
## Only 1 column left in sample GSM7473684_HC_c, graphs are meaningless, dropping graphs.
## Only 1 column left in sample GSM7473688_CD_a, graphs are meaningless, dropping graphs.
## Only 1 column left in sample GSM7473689_CD_b, graphs are meaningless, dropping graphs.
## Only 1 column left in sample GSM7473690_CD_c, graphs are meaningless, dropping graphs.
## Only 1 column left in sample GSM7473682_HC_a, graphs are meaningless, dropping graphs.
## Only 1 column left in sample GSM7473683_HC_b, graphs are meaningless, dropping graphs.
## Only 1 column left in sample GSM7473684_HC_c, graphs are meaningless, dropping graphs.
## Only 1 column left in sample GSM7473688_CD_a, graphs are meaningless, dropping graphs.
## Only 1 column left in sample GSM7473689_CD_b, graphs are meaningless, dropping graphs.
## Only 1 column left in sample GSM7473690_CD_c, graphs are meaningless, dropping graphs.
## Only 1 column left in sample GSM7473682_HC_a, graphs are meaningless, dropping graphs.
## Only 1 column left in sample GSM7473683_HC_b, graphs are meaningless, dropping graphs.
## Only 1 column left in sample GSM7473684_HC_c, graphs are meaningless, dropping graphs.
## Only 1 column left in sample GSM7473688_CD_a, graphs are meaningless, dropping graphs.
## Only 1 column left in sample GSM7473689_CD_b, graphs are meaningless, dropping graphs.
## Only 1 column left in sample GSM7473690_CD_c, graphs are meaningless, dropping graphs.
## Only 1 column left in sample GSM7473682_HC_a, graphs are meaningless, dropping graphs.
## Only 1 column left in sample GSM7473683_HC_b, graphs are meaningless, dropping graphs.
## Only 1 column left in sample GSM7473684_HC_c, graphs are meaningless, dropping graphs.
## Only 1 column left in sample GSM7473688_CD_a, graphs are meaningless, dropping graphs.
## Only 1 column left in sample GSM7473689_CD_b, graphs are meaningless, dropping graphs.
## Only 1 column left in sample GSM7473682_HC_a, graphs are meaningless, dropping graphs.
## Only 1 column left in sample GSM7473683_HC_b, graphs are meaningless, dropping graphs.
## Only 1 column left in sample GSM7473684_HC_c, graphs are meaningless, dropping graphs.
## Only 1 column left in sample GSM7473690_CD_c, graphs are meaningless, dropping graphs.
## Only 1 column left in sample GSM7473682_HC_a, graphs are meaningless, dropping graphs.
## Only 1 column left in sample GSM7473683_HC_b, graphs are meaningless, dropping graphs.
## Only 1 column left in sample GSM7473684_HC_c, graphs are meaningless, dropping graphs.
## Only 1 column left in sample GSM7473688_CD_a, graphs are meaningless, dropping graphs.
## Only 1 column left in sample GSM7473690_CD_c, graphs are meaningless, dropping graphs.
## Only 1 column left in sample GSM7473682_HC_a, graphs are meaningless, dropping graphs.
## Only 1 column left in sample GSM7473683_HC_b, graphs are meaningless, dropping graphs.
## Only 1 column left in sample GSM7473684_HC_c, graphs are meaningless, dropping graphs.
## Only 1 column left in sample GSM7473689_CD_b, graphs are meaningless, dropping graphs.
# Join together results for all niches, and pull out those with a significant adjusted p-value
de_results_all.niche <- bind_rows(de_result_list)
de_results_sig.niche <- filter(de_results_all.niche, adj.P.Val < 0.05)
DT::datatable(de_results_sig.niche)
Again, make some normalised expression just for plotting.
# Get some normalised values, purely for plotting.
# (can't use logged counts because the sizes are so dramatically different!)
dge <- DGEList(counts(se.pdb))
dge <- calcNormFactors(dge)
norm_counts <- cpm(dge)
# Store normalised log2 scale expression in
assay(se.pdb, "logcounts") <- log2(norm_counts)
IGHG1 and IGHG2 (heavy chain immunoglobulins) are behaving very similarly, but the change seems to be fairly global.
p1 <- plotExpression(se.pdb, "IGHG2", x="group", colour_by = "group",
other_fields = "niche") +
ggtitle("IGHG2") +
facet_wrap(~niche)
p2 <- plotExpression(se.pdb, "IGHG1", x="group", colour_by = "group",
other_fields = "niche") +
ggtitle("IGHG1") +
facet_wrap(~niche)
p1 + p2
## Warning: Groups with fewer than two datapoints have been dropped.
## i Set `drop = FALSE` to consider such groups for position adjustment purposes.
## Groups with fewer than two datapoints have been dropped.
## i Set `drop = FALSE` to consider such groups for position adjustment purposes.
## Warning: Computation failed in `stat_ydensity()`.
## Caused by error in `$<-.data.frame`:
## ! replacement has 1 row, data has 0
## Warning: Groups with fewer than two datapoints have been dropped.
## i Set `drop = FALSE` to consider such groups for position adjustment purposes.
## Groups with fewer than two datapoints have been dropped.
## i Set `drop = FALSE` to consider such groups for position adjustment purposes.
## Warning: Computation failed in `stat_ydensity()`.
## Caused by error in `$<-.data.frame`:
## ! replacement has 1 row, data has 0
Just showing IGHG2; Suddenly Niche 1 makes sense!
p1 <- plotSpatialFeature(sfe.sample.HC, "IGHG2",colGeometryName = "cellSeg") +
theme(legend.title=element_blank()) +
ggtitle(sample) +
ggtitle("IGHG2 - HC_b")
p2 <- plotSpatialFeature(sfe.sample.CD, "IGHG2",colGeometryName = "cellSeg") +
theme(legend.title=element_blank()) +
ggtitle(sample) +
ggtitle("IGHG2 - CD_a")
p1 / p2
Could a lack of WNT2B be relevant? https://pubmed.ncbi.nlm.nih.gov/38697357/
plotExpression(se.pdb, "WNT2B", x="group", colour_by = "group",
other_fields = "niche") +
ggtitle("WNT2B") +
facet_wrap(~niche)
## Warning: Groups with fewer than two datapoints have been dropped.
## i Set `drop = FALSE` to consider such groups for position adjustment purposes.
## Groups with fewer than two datapoints have been dropped.
## i Set `drop = FALSE` to consider such groups for position adjustment purposes.
## Warning: Computation failed in `stat_ydensity()`.
## Caused by error in `$<-.data.frame`:
## ! replacement has 1 row, data has 0
p1 <- plotSpatialFeature(sfe.sample.HC, "WNT2B",colGeometryName = "cellSeg") +
theme(legend.title=element_blank()) +
ggtitle(sample) +
ggtitle("WNT2B - HC_b")
p2 <- plotSpatialFeature(sfe.sample.CD, "WNT2B",colGeometryName = "cellSeg") +
theme(legend.title=element_blank()) +
ggtitle(sample) +
ggtitle("WNT2B - CD_a")
p1 / p2