Last updated: 2024-08-23
Checks: 7 0
Knit directory: spatialsnippets/
This reproducible R Markdown analysis was created with workflowr (version 1.7.1). The Checks tab describes the reproducibility checks that were applied when the results were created. The Past versions tab lists the development history.
Great! Since the R Markdown file has been committed to the Git repository, you know the exact version of the code that produced these results.
Great job! The global environment was empty. Objects defined in the global environment can affect the analysis in your R Markdown file in unknown ways. For reproduciblity it’s best to always run the code in an empty environment.
The command set.seed(20231017)
was run prior to running
the code in the R Markdown file. Setting a seed ensures that any results
that rely on randomness, e.g. subsampling or permutations, are
reproducible.
Great job! Recording the operating system, R version, and package versions is critical for reproducibility.
Nice! There were no cached chunks for this analysis, so you can be confident that you successfully produced the results during this run.
Great job! Using relative paths to the files within your workflowr project makes it easier to run your code on other machines.
Great! You are using Git for version control. Tracking code development and connecting the code version to the results is critical for reproducibility.
The results in this page were generated with repository version 5cae056. See the Past versions tab to see a history of the changes made to the R Markdown and HTML files.
Note that you need to be careful to ensure that all relevant files for
the analysis have been committed to Git prior to generating the results
(you can use wflow_publish
or
wflow_git_commit
). workflowr only checks the R Markdown
file, but you know if there are other scripts or data files that it
depends on. Below is the status of the Git repository when the results
were generated:
Ignored files:
Ignored: .Rhistory
Ignored: .Rproj.user/
Ignored: analysis/e_neighbourcellchanges.nb.html
Ignored: analysis/glossary.nb.html
Ignored: renv/library/
Ignored: renv/staging/
Note that any generated files, e.g. HTML, png, CSS, etc., are not included in this status report because it is ok for generated content to have uncommitted changes.
These are the previous versions of the repository in which changes were
made to the R Markdown (analysis/e_CompositionChange.Rmd
)
and HTML (docs/e_CompositionChange.html
) files. If you’ve
configured a remote Git repository (see ?wflow_git_remote
),
click on the hyperlinks in the table below to view the files as they
were in that past version.
File | Version | Author | Date | Message |
---|---|---|---|---|
Rmd | 5cae056 | swbioinf | 2024-08-23 | wflow_publish(c("analysis/e_CompositionChange.Rmd")) |
html | 53dff19 | swbioinf | 2024-08-23 | Build site. |
Rmd | 46145cf | swbioinf | 2024-08-23 | wflow_publish(c("analysis/e_CompositionChange.Rmd")) |
html | 4d21ff2 | swbioinf | 2024-05-31 | Build site. |
Rmd | 29b171b | swbioinf | 2024-05-31 | wflow_publish("analysis/e_CompositionChange.Rmd") |
html | a6dd41e | swbioinf | 2024-05-31 | Build site. |
Rmd | 46429ac | swbioinf | 2024-05-31 | wflow_publish("analysis/e_CompositionChange.Rmd") |
html | 6978154 | swbioinf | 2024-05-30 | Build site. |
Rmd | f14901b | swbioinf | 2024-05-30 | wflow_publish(c("analysis/e_DEPseudobulk_insitu.Rmd", "analysis/e_CompositionChange.Rmd", |
html | 0af0a41 | swbioinf | 2024-05-08 | Build site. |
Rmd | 3b66014 | swbioinf | 2024-05-08 | wflow_publish(c("analysis/e_CompositionChange.Rmd")) |
html | cfa2539 | swbioinf | 2024-05-07 | Build site. |
Rmd | a258ba3 | swbioinf | 2024-05-07 | wflow_publish(c("analysis/e_CompositionChange.Rmd")) |
html | 03a38b8 | swbioinf | 2024-05-07 | Build site. |
Rmd | 469bf34 | swbioinf | 2024-05-07 | wflow_publish("analysis/e_CompositionChange.Rmd") |
html | 3a1bdf3 | swbioinf | 2024-05-07 | Build site. |
Rmd | 5e7e7fb | swbioinf | 2024-05-07 | wflow_publish("analysis/e_CompositionChange.Rmd") |
html | 4eb5dd9 | swbioinf | 2024-05-03 | Build site. |
Rmd | 49d07d2 | swbioinf | 2024-05-03 | wflow_publish("analysis/e_CompositionChange.Rmd") |
html | 2441c2d | swbioinf | 2024-04-23 | Build site. |
Rmd | 96fcd96 | swbioinf | 2024-04-23 | wflow_publish("analysis/e_CompositionChange.Rmd") |
html | 22b7d61 | swbioinf | 2024-04-22 | Build site. |
Rmd | 4701a50 | swbioinf | 2024-04-22 | wflow_publish("analysis/e_CompositionChange.Rmd") |
html | a722f36 | swbioinf | 2024-04-18 | Build site. |
Rmd | 1dc6730 | swbioinf | 2024-04-18 | wflow_publish(c("analysis/index.Rmd", "analysis/e_CompositionChange.Rmd", |
html | 8423279 | swbioinf | 2024-04-17 | Build site. |
Rmd | 048c672 | swbioinf | 2024-04-17 | wflow_publish("analysis/e_CompositionChange.Rmd") |
html | 456dd2f | swbioinf | 2024-04-10 | Build site. |
Rmd | 8ef9cd9 | swbioinf | 2024-04-10 | wflow_publish("analysis/") |
html | 30da140 | Sarah Williams | 2024-03-22 | Build site. |
Rmd | 89c3371 | Sarah Williams | 2024-03-22 | wflow_publish(c("analysis/index_data.Rmd", "analysis/index.Rmd", |
With single cell spatial data, the frequency of each cell type can be tested between two different groups.
Testing changes in proportions like this can be challenging, because an expansion in one cell type, will mean a reduction in all other cell types.
This document describes how to use the propeller (Phipson et al. 2022) method from the speckle R package, and limma to test for differences in cell type proportions. This takes into account this linked relationship between cell type proportions, and can incorporate pseudoreplicate data from multiple FOVs (fields of view) from an insitu single cell spatial dataset.
This requires:
For example:
Using data from (Garrido-Trigo et al. 2023)
Is there a difference in the celltype composition between individuals with Ulcerative colitis or Crohn’s disease, and Healthy controls?
library(Seurat)
library(speckle)
library(tidyverse)
library(limma)
library(DT)
data_dir <- file.path("~/projects/spatialsnippets/datasets/GSE234713_IBDcosmx_GarridoTrigo2023/processed_data")
seurat_file_01_loaded <- file.path(data_dir, "GSE234713_CosMx_IBD_seurat_01_loaded.RDS")
so <- readRDS(seurat_file_01_loaded)
There are three individuals per condition (one tissue sample from each individual). With multiple fovs on each physical tissue sample.
sample_table <- select(as_tibble(so@meta.data), condition, individual_code, fov_name) %>%
unique() %>%
group_by(condition, individual_code) %>%
summarise(n_fovs= n(), item = str_c(fov_name, collapse = ", "))
DT::datatable(sample_table)
In this dataset there are actually two different levels of categorisation;
Lets check the numbers of cells per category for both categories, to decide which we can use. We’d like to look at the celltype proportions at the most specific resolution - but we need to ensure there are enough cells of each type for meaningful statistics.
In the ‘Celltype_subset’ column, there are just 5 broad categories;
celltype_summary_table <- so@meta.data %>%
group_by(condition, group, individual_code, fov_name, celltype_subset) %>%
summarise(cells=n(), .groups = 'drop')
DT::datatable(celltype_summary_table)
There are many different types in ‘celltype_SingleR2’ - which is typical if you’re using celltype assignment with a detailed reference.
celltype_summary_table.SingleR <- so@meta.data %>%
group_by(condition, group, individual_code, fov_name, celltype_SingleR2) %>%
summarise(cells=n(), .groups = 'drop')
DT::datatable(celltype_summary_table.SingleR)
For both categories, plot how many we see per FOV on average.
In the ‘celltype_subset’ category, T cells are rare, but there are still a decent distribution of them with 10-100+ cells in a FOV. We should be able to see changes in these.
ggplot(celltype_summary_table, aes(x=cells, col=celltype_subset)) +
geom_density() +
geom_rug(alpha=0.2) +
scale_x_log10() +
theme_bw() +
ggtitle("Cells per FOV by celltype")
On the other hand, could we use the more fine-grained categorisation in the ‘celltype_SingleR2’ grouping?
In this case, there are just too many cell type categories, and we should stick with the broad categorisation.
ggplot(celltype_summary_table.SingleR, aes(x=cells, col=celltype_SingleR2)) +
geom_density() +
geom_rug(alpha=0.2) +
scale_x_log10() +
theme_bw() +
ggtitle("Cells per FOV by celltype")
You might still want to use a more specific grouping. You might be able to tweak your groups to make this happen.
Some possible approaches:
The more cell types you have, the more aggressive your FDR multiple hypothesis correction will need to be. Its best to remove or condense cell types that can never reach statistical significance.
In a RNAseq experiment, you would typically have one set of measurements per biological sample. In a single cell RNAseq experiment you’d typically have one group of cells measured for that same sample. With the current crop of spatially resolved single cell technologies (e.g. cosmx), you can have multiple FOVs (feilds of view) on the same physical chunk of tissue. These are not true biological replicates and can be considered ‘pseudoreplicates’.
Pseudoreplicates can still have a degree of heterogeneity, from different regions of the tissue. E.g. some samples might overlap epithelial regions more than others.
Lets plot the cell type composition across all the pseodureplicates, grouped by replicate, and grouped by condition. Where there are more samples or uneven numbers, it might be helpful to plot each condition separately.
ggplot(celltype_summary_table, aes(x=fov_name, y=cells, fill=celltype_subset)) +
geom_bar(position="fill", stat="identity") +
theme_bw() +
coord_flip() +
theme(legend.position = "bottom") +
facet_wrap(~individual_code, ncol=3, scales = 'free_y') +
scale_y_continuous(expand = c(0,0))
If there are obvious changes in cell type proportions, they should be visible now!
Now we can formally test for differences.
For the simple case where there are no fov pseudoreplicats (e.g a
single cell RNAseq experiment), there is a propellar
function. This approach is described in the speckle
vignette
# Without FOV replicates - Not what we want.
results_table.no_fov_info <- propeller(clusters = so$celltype_subset,
sample = so$individual_code,
group = so$condition)
DT::datatable(results_table.no_fov_info)
It would be possible to counts cells at the sample level, and use the above method. But - having seen the within sample heterogeneity across multiple fovs, we want to take that into account. This takes few more steps, using the limma (Ritchie et al. 2015) duplicateCorrealation approach described in the speckle vignette here
Calculate the transformed proportion that speckle needs on each fov.
props <- getTransformedProps(so$celltype_subset, so$fov_name, transform="logit")
Then use limma to test for differences.
If you are not familiar with limma, or you want more info on building your own ‘design’ or ‘contrasts’ there are many online resources that cover it. For example the substantial limma documentation, or online tutorials such as this one or any number of forum posts. The instructions for RNAseq or microarray differential expression tests can be applied to this proportional data. The difference is that we provide the proportions.
# The fovs as ordered in props
fov_name <- colnames(props$TransformedProps)
# Extract the other information in the same order.
# Note we're using the group column with simple names (entries like 'CD', 'HC') rather than the condition column ("Chron's Disease", "Healthy controls").
# This is because differential tests with limma doesn't handle spaces and special characters (like ',- ) very well at all.
props_order <- match(fov_name, celltype_summary_table$fov_name)
clusters <- celltype_summary_table$celltype_subset[props_order]
sample <- celltype_summary_table$individual_code[props_order]
group <- celltype_summary_table$group[props_order]
# Build the design matrix
# This simple one considers only the disease.
# and includes the 0+ term to avoid setting one group to our baseline, which in turn make building contrasts more intuative.
design <- model.matrix( ~ 0+ group)
# Calculate duplicate correlation, within fovs for different individuals (sample)
dupcor <- duplicateCorrelation(props$TransformedProps, design=design, block=sample)
# Next fit the model to your data; using the experimental design, blocking on the individual/sample, and providing the 'consensus' correlation value from duplicate correlation.
fit <- lmFit(props$TransformedProps, design=design, block=sample, correlation=dupcor$consensus)
# These are the groups from the model (not the prefixed 'group')
colnames(fit)
[1] "groupCD" "groupUC" "groupHC"
# Use those names to define any relevant contrasts to test
# Here there are 2: Ucerative colitis vs healthy controls and Chron's disease vs healthy controls.
contrasts <- makeContrasts(UCvHC = groupUC - groupHC,
CDvHC = groupCD - groupHC,
levels=coef(fit))
# Then fit contrasts and run ebayes
fit <- contrasts.fit(fit, contrasts)
fit <- eBayes(fit)
Then, to see differnences in proportions for the Ulcerative Colitis vs Healthy control (‘UCvsHC’) test
results_table.UCvsHC <- topTable(fit, coef='UCvHC')
DT::datatable(results_table.UCvsHC)
With multiple fovs per tissue sample handled as pseudoreplicates
library(limma)
library(speckle)
# condition : Experimental grouping
# fov_name : An unique fov identifier
# individual_code : individual (sample)
# cluster : cell type
props <- getTransformedProps(so$cluster, so$fov_name, transform="logit")
# Make a table of relevant sample information, in same order as props, and check.
sample_info_table <- unique( select(so@meta.data, fov_name, condition, individual_code) )
row_order <- match(colnames(props$TransformedProps),sample_info_table$fov_name)
sample_info_table <- sample_info_table[row_order,]
stopifnot(all(sample_info_table$fov_name == colnames(props$TransformedProps))) # check it
# Extract relevant factors in same order as props
sample <- sample_info_table$individual_code
condition <- sample_info_table$condition
design <- model.matrix( ~ 0 + condition)
dupcor <- duplicateCorrelation(props$TransformedProps, design=design, block=sample)
fit <- lmFit(props$TransformedProps, design=design, block=sample, correlation=dupcor$consensus)
# Contrast called 'test', measuring of test condition vs Control condition.
contrasts <- makeContrasts(test = conditionTest - conditionCon, levels = coef(fit))
fit <- contrasts.fit(fit, contrasts)
fit <- eBayes(fit)
results_table <- topTable(fit, coef='test')
results_table.UCvsHC
logFC AveExpr t P.Value adj.P.Val B
epi -2.70672270 -2.4412454 -1.647504165 0.1012925 0.5064627 -4.576517
myeloids 0.49641121 -2.3157873 1.188049751 0.2364629 0.5911571 -4.590716
plasmas 0.58033621 -1.0902075 0.664839066 0.5070499 0.8450832 -4.601420
tcells -0.10889876 -4.0778756 -0.165741871 0.8685561 0.9940255 -4.606033
stroma 0.00644335 -0.9671844 0.007498968 0.9940255 0.9940255 -4.606339
This table is the typical output of limma tests;
sessionInfo()
R version 4.3.2 (2023-10-31)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 22.04.4 LTS
Matrix products: default
BLAS: /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.20.so; LAPACK version 3.10.0
locale:
[1] LC_CTYPE=en_AU.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_AU.UTF-8 LC_COLLATE=en_AU.UTF-8
[5] LC_MONETARY=en_AU.UTF-8 LC_MESSAGES=en_AU.UTF-8
[7] LC_PAPER=en_AU.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_AU.UTF-8 LC_IDENTIFICATION=C
time zone: Etc/UTC
tzcode source: system (glibc)
attached base packages:
[1] stats graphics grDevices datasets utils methods base
other attached packages:
[1] DT_0.33 limma_3.58.1 lubridate_1.9.3 forcats_1.0.0
[5] stringr_1.5.1 dplyr_1.1.4 purrr_1.0.2 readr_2.1.5
[9] tidyr_1.3.1 tibble_3.2.1 ggplot2_3.5.0 tidyverse_2.0.0
[13] speckle_1.2.0 Seurat_5.1.0 SeuratObject_5.0.2 sp_2.1-3
[17] workflowr_1.7.1
loaded via a namespace (and not attached):
[1] RcppAnnoy_0.0.22 splines_4.3.2
[3] later_1.3.2 bitops_1.0-7
[5] polyclip_1.10-6 fastDummies_1.7.3
[7] lifecycle_1.0.4 edgeR_4.0.16
[9] rprojroot_2.0.4 globals_0.16.3
[11] processx_3.8.4 lattice_0.22-6
[13] MASS_7.3-60.0.1 crosstalk_1.2.1
[15] magrittr_2.0.3 plotly_4.10.4
[17] sass_0.4.9 rmarkdown_2.26
[19] jquerylib_0.1.4 yaml_2.3.8
[21] httpuv_1.6.15 sctransform_0.4.1
[23] spam_2.10-0 spatstat.sparse_3.0-3
[25] reticulate_1.35.0 cowplot_1.1.3
[27] pbapply_1.7-2 RColorBrewer_1.1-3
[29] abind_1.4-5 zlibbioc_1.48.2
[31] Rtsne_0.17 GenomicRanges_1.54.1
[33] BiocGenerics_0.48.1 RCurl_1.98-1.14
[35] git2r_0.33.0 GenomeInfoDbData_1.2.11
[37] IRanges_2.36.0 S4Vectors_0.40.2
[39] ggrepel_0.9.5 irlba_2.3.5.1
[41] listenv_0.9.1 spatstat.utils_3.0-4
[43] goftest_1.2-3 RSpectra_0.16-1
[45] spatstat.random_3.2-3 fitdistrplus_1.1-11
[47] parallelly_1.37.1 leiden_0.4.3.1
[49] codetools_0.2-20 DelayedArray_0.28.0
[51] tidyselect_1.2.1 farver_2.1.1
[53] matrixStats_1.2.0 stats4_4.3.2
[55] spatstat.explore_3.2-7 jsonlite_1.8.8
[57] progressr_0.14.0 ggridges_0.5.6
[59] survival_3.5-8 tools_4.3.2
[61] ica_1.0-3 Rcpp_1.0.12
[63] glue_1.7.0 gridExtra_2.3
[65] SparseArray_1.2.4 xfun_0.43
[67] MatrixGenerics_1.14.0 GenomeInfoDb_1.38.8
[69] withr_3.0.0 BiocManager_1.30.22
[71] fastmap_1.1.1 fansi_1.0.6
[73] callr_3.7.6 digest_0.6.35
[75] timechange_0.3.0 R6_2.5.1
[77] mime_0.12 colorspace_2.1-0
[79] scattermore_1.2 tensor_1.5
[81] spatstat.data_3.0-4 utf8_1.2.4
[83] generics_0.1.3 renv_1.0.5
[85] data.table_1.15.4 httr_1.4.7
[87] htmlwidgets_1.6.4 S4Arrays_1.2.1
[89] whisker_0.4.1 uwot_0.1.16
[91] pkgconfig_2.0.3 gtable_0.3.4
[93] lmtest_0.9-40 SingleCellExperiment_1.24.0
[95] XVector_0.42.0 htmltools_0.5.8.1
[97] dotCall64_1.1-1 scales_1.3.0
[99] Biobase_2.62.0 png_0.1-8
[101] knitr_1.45 rstudioapi_0.16.0
[103] tzdb_0.4.0 reshape2_1.4.4
[105] nlme_3.1-164 cachem_1.0.8
[107] zoo_1.8-12 KernSmooth_2.23-22
[109] parallel_4.3.2 miniUI_0.1.1.1
[111] pillar_1.9.0 grid_4.3.2
[113] vctrs_0.6.5 RANN_2.6.1
[115] promises_1.2.1 xtable_1.8-4
[117] cluster_2.1.6 evaluate_0.23
[119] cli_3.6.2 locfit_1.5-9.9
[121] compiler_4.3.2 rlang_1.1.3
[123] crayon_1.5.2 future.apply_1.11.2
[125] labeling_0.4.3 ps_1.7.6
[127] getPass_0.2-4 plyr_1.8.9
[129] fs_1.6.3 stringi_1.8.3
[131] viridisLite_0.4.2 deldir_2.0-4
[133] munsell_0.5.1 lazyeval_0.2.2
[135] spatstat.geom_3.2-9 Matrix_1.6-5
[137] RcppHNSW_0.6.0 hms_1.1.3
[139] patchwork_1.2.0 future_1.33.2
[141] statmod_1.5.0 shiny_1.8.1.1
[143] highr_0.10 SummarizedExperiment_1.32.0
[145] ROCR_1.0-11 igraph_2.0.3
[147] bslib_0.7.0