Enhance references and update vignette for improved clarity and functionality
- Added multiple new references to the bibliography for comprehensive citation support.
- Updated the vignette to include a more structured approach to RNA-seq data analysis using the airway dataset.
- Improved code organization and styling for better readability and user experience.
- Enhanced explanations and added steps for data preparation, normalization, and analysis techniques.
- Ensured compatibility with tidyverse tools and improved overall clarity in the vignette content.
references.bib: 43 additions, 0 deletions
Original file line number
Diff line number
Diff line change
@@ -192,4 +192,47 @@ @article{ritchie2015limma
192
192
pages={e47--e47},
193
193
year={2015},
194
194
publisher={Oxford University Press}
195
+
}
196
+
197
+
@article{aran2017xcell,
198
+
title={xCell: digitally portraying the tissue cellular heterogeneity landscape},
199
+
author={Aran, Dvir and Hu, Zicheng and Butte, Atul J},
200
+
journal={Genome biology},
201
+
volume={18},
202
+
number={1},
203
+
pages={220},
204
+
year={2017},
205
+
publisher={BioMed Central}
206
+
}
207
+
208
+
@article{becht2016mcp,
209
+
title={Estimating the population abundance of tissue-infiltrating immune and stromal cell populations using gene expression},
210
+
author={Becht, Etienne and Giraldo, Nicolas A and Lacroix, Laetitia and Bifulco, Carlo and Buttard, B{\'e}n{\'e}dicte and Elarouci, Nabila and Petitprez, Florent and Selves, Janick and Laurent-Puig, Pierre and Saut{\`e}s-Fridman, Catherine and others},
211
+
journal={Genome biology},
212
+
volume={17},
213
+
number={1},
214
+
pages={218},
215
+
year={2016},
216
+
publisher={BioMed Central}
217
+
}
218
+
219
+
@article{finotello2019quantiseq,
220
+
title={Molecular and pharmacological modulators of the tumor immune contexture revealed by deconvolution of RNA-seq data},
221
+
author={Finotello, Francesca and Mayer, Clemens and Plattner, Christina and Laschober, Gerhard and Rieder, Dietmar and Hackl, Hubert and Krogsdam, Anne and Loncova, Zuzana and Posch, Wilfried and Sopper, Sieghart and others},
222
+
journal={Genome medicine},
223
+
volume={11},
224
+
number={1},
225
+
pages={34},
226
+
year={2019},
227
+
publisher={BioMed Central}
228
+
}
229
+
230
+
@article{racle2017epic,
231
+
title={Simultaneous enumeration of cancer and immune cell types from bulk tumor gene expression data},
232
+
author={Racle, Julien and de Jonge, Kaat and Baumgaertner, Petra and Speiser, Daniel E and Gfeller, David},
%\VignetteIndexEntry{Side-by-side comparison with standard interfaces}
20
-
%\usepackage[UTF-8]{inputenc}
21
-
18
+
%\VignetteEncoding{UTF-8}
22
19
---
23
20
24
21
@@ -33,13 +30,49 @@ vignette: >
33
30
<style>
34
31
.column-left{
35
32
float: left;
36
-
width: 50%;
33
+
width: 48%;
37
34
text-align: left;
35
+
margin-right: 2%;
38
36
}
39
37
.column-right{
40
38
float: right;
41
-
width: 50%;
42
-
text-align: right;
39
+
width: 48%;
40
+
text-align: left;
41
+
margin-left: 2%;
42
+
}
43
+
44
+
/* Improve code block styling */
45
+
.column-left pre,
46
+
.column-right pre {
47
+
font-size: 0.85em;
48
+
line-height: 1.3;
49
+
overflow-x: auto;
50
+
white-space: pre;
51
+
word-wrap: normal;
52
+
max-width: 100%;
53
+
}
54
+
55
+
/* Ensure code blocks don't get too narrow */
56
+
.column-left pre code,
57
+
.column-right pre code {
58
+
font-size: 0.8em;
59
+
line-height: 1.2;
60
+
}
61
+
62
+
/* Better spacing for code chunks */
63
+
.column-left .sourceCode,
64
+
.column-right .sourceCode {
65
+
margin-bottom: 1em;
66
+
}
67
+
68
+
/* Responsive design for smaller screens */
69
+
@media (max-width: 768px) {
70
+
.column-left,
71
+
.column-right {
72
+
width: 100%;
73
+
float: none;
74
+
margin: 0 0 1em 0;
75
+
}
43
76
}
44
77
</style>
45
78
@@ -69,64 +102,107 @@ Mangiola, Stefano, Ramyar Molania, Ruining Dong, Maria A. Doyle, and Anthony T.
69
102
[Genome Biology - tidybulk: an R tidy framework for modular transcriptomic data analysis](https://genomebiology.biomedcentral.com/articles/10.1186/s13059-020-02233-7)
axis.title.x = element_text(margin = margin(t = 10, r = 10, b = 10, l = 10)),
99
133
axis.title.y = element_text(margin = margin(t = 10, r = 10, b = 10, l = 10))
100
134
)
135
+
```
136
+
137
+
In this vignette we will use the `airway` dataset, a `SummarizedExperiment` object containing RNA-seq data from an experiment studying the effect of dexamethasone treatment on airway smooth muscle cells. This dataset is available in the [airway](https://bioconductor.org/packages/airway/) package.
101
138
102
-
# Load airway dataset
139
+
```{r load airway}
140
+
library(airway)
103
141
data(airway)
142
+
```
143
+
144
+
This workflow will use the [tidySummarizedExperiment](https://bioconductor.org/packages/tidySummarizedExperiment/) package to manipulate the data in a `tidyverse` fashion. This approach streamlines data manipulation and analysis, making the code more efficient and easier to understand.
145
+
146
+
```{r load tidySummarizedExperiment}
147
+
library(tidySummarizedExperiment)
148
+
```
149
+
150
+
Here we will add a gene symbol column to the `airway` object. This will be used to interpret the differential expression analysis, and to deconvolve the cellularity.
104
151
105
-
# Add gene symbol and entrez for better annotation
tidybulk provide the `aggregate_duplicates` function to aggregate duplicated transcripts (e.g., isoforms, ensembl). For example, we often have to convert ensembl symbols to gene/transcript symbol, but in doing so we have to deal with duplicates. `aggregate_duplicates` takes a tibble and column names (as symbols; for `sample`, `transcript` and `count`) as arguments and returns a tibble with transcripts with the same name aggregated. All the rest of the columns are appended, and factors and boolean are appended as characters.
175
+
This vignette demonstrates how tidybulk compares to standard R/Bioconductor approaches for transcriptomic data analysis. We'll show the same analysis performed using both tidybulk (tidyverse-style) and traditional methods side by side.
176
+
177
+
## Data Overview
178
+
179
+
We will use the `airway` dataset, a `SummarizedExperiment` object containing RNA-seq data from an experiment studying the effect of dexamethasone treatment on airway smooth muscle cells:
180
+
181
+
```{r data-overview}
182
+
airway
183
+
```
184
+
185
+
Loading `tidySummarizedExperiment` makes `SummarizedExperiment` objects compatible with tidyverse tools while preserving their `SummarizedExperiment` class. This allows us to manipulate the data with `tidyverse` verbs without converting the object.
186
+
187
+
```{r check-se-class}
188
+
class(airway)
189
+
```
190
+
191
+
### Prepare Data for Analysis
192
+
193
+
Before analysis, we need to ensure our variables are in the correct format:
194
+
195
+
```{r convert-condition-to-factor}
196
+
# Convert dex to factor for proper differential expression analysis
197
+
airway = airway |>
198
+
mutate(dex = as.factor(dex))
199
+
```
200
+
201
+
## Step 1: Aggregate Duplicated Transcripts
202
+
203
+
tidybulk provides the `aggregate_duplicates` function to aggregate duplicated transcripts (e.g., isoforms, Ensembl identifiers). For example, we often have to convert Ensembl identifiers to gene or transcript symbols, and in doing so we have to deal with duplicates. `aggregate_duplicates` takes a tibble and column names (as symbols; for `sample`, `transcript` and `count`) as arguments and returns a tibble in which transcripts with the same name are aggregated. All remaining columns are appended, with factor and boolean columns appended as characters.
204
+
205
+
> Transcript aggregation is a standard bioinformatics approach for gene-level summarization.
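
The idea behind aggregation can be sketched in base R without tidybulk. This is a minimal illustration, assuming that duplicates are combined by summing counts; the count matrix, the identifiers and the symbol mapping are all invented for the example, and it is not the implementation used by `aggregate_duplicates`.

```r
# Toy count matrix: rows are Ensembl-style identifiers, columns are samples.
# All values are invented for illustration.
counts <- matrix(
  c(10, 5, 3, 8,   # sample s1
    20, 7, 1, 4),  # sample s2
  nrow = 4,
  dimnames = list(c("ENSG_A", "ENSG_B", "ENSG_C", "ENSG_D"), c("s1", "s2"))
)

# Hypothetical identifier-to-symbol mapping: two identifiers share "GENE1".
symbol <- c("GENE1", "GENE1", "GENE2", "GENE3")

# rowsum() adds up the rows that share a group label,
# leaving one row per gene symbol.
aggregated <- rowsum(counts, group = symbol)
aggregated
```

After aggregation the matrix has one row per symbol, and the `GENE1` row holds the summed counts of `ENSG_A` and `ENSG_B`.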
We may want to compensate for sequencing depth by scaling the transcript abundance (e.g., with the TMM algorithm; Robinson and Oshlack, doi.org/10.1186/gb-2010-11-3-r25). `scale_abundance` takes a tibble, column names (as symbols; for `sample`, `transcript` and `count`) and a method as arguments and returns a tibble with additional columns with scaled data as `<NAME OF COUNT COLUMN>_scaled`.
159
235
236
+
> Normalization is crucial for comparing expression levels across samples with different library sizes.
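
To see why library size matters, here is a small base R sketch of the simplest form of scaling, counts per million. It is a deliberately simpler stand-in and not the TMM algorithm, which additionally estimates a per-sample normalization factor; the counts are invented.

```r
# Two samples with identical composition but a two-fold difference in depth.
counts <- matrix(
  c(100, 300, 600,    # sample "shallow", library size 1000
    200, 600, 1200),  # sample "deep", library size 2000
  nrow = 3,
  dimnames = list(c("g1", "g2", "g3"), c("shallow", "deep"))
)

# Library size is the total count per sample.
lib_size <- colSums(counts)

# sweep() divides each column by its library size; multiplying by 1e6
# expresses the result as counts per million (CPM).
cpm <- sweep(counts, 2, lib_size, "/") * 1e6
cpm
```

The raw counts differ by a factor of two between the samples, while the scaled values coincide, which is the behaviour we want when the only difference is sequencing depth.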
We can easily plot the scaled density to check the scaling outcome. On the x axis we have the log-scaled counts and on the y axis the density; data are grouped by sample and coloured by treatment.
@@ -194,9 +272,11 @@ airway.norm |>
194
272
my_theme
195
273
```
196
274
197
-
## Filter `variable transcripts`
275
+
## Step 3: Filter Variable Transcripts
276
+
277
+
We may want to identify and filter variable transcripts to focus on the most informative features.
198
278
199
-
We may want to identify and filter variable transcripts.
279
+
> Variable transcript filtering helps reduce noise and focuses analysis on the most informative features.
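
As a rough sketch of what filtering for variable transcripts means, the base R code below ranks genes by their variance across samples and keeps the top ones. The expression values and the cut-off `top_n` are hypothetical, and tidybulk's own selection procedure may differ in detail.

```r
# Toy log-expression matrix: genes in rows, samples in columns (invented values).
log_expr <- rbind(
  g1 = c(5.0, 5.1, 4.9, 5.0),
  g2 = c(1.0, 9.0, 2.0, 8.0),
  g3 = c(3.0, 3.2, 2.8, 3.0),
  g4 = c(0.0, 6.0, 0.5, 7.0)
)
colnames(log_expr) <- paste0("s", 1:4)

# Variance of each gene across samples.
gene_var <- apply(log_expr, 1, var)

# Keep the top_n most variable genes (top_n is an arbitrary choice here).
top_n <- 2
keep <- names(sort(gene_var, decreasing = TRUE))[seq_len(top_n)]
filtered <- log_expr[keep, , drop = FALSE]
filtered
```

Genes `g1` and `g3` are nearly constant across samples and are dropped, while `g2` and `g4` are retained.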
We may want to reduce the dimensions of our data, for example using PCA or MDS algorithms. `reduce_dimensions` takes a tibble, column names (as symbols; for `sample`, `transcript` and `count`) and a method (e.g., MDS or PCA) as arguments and returns a tibble with additional columns for the reduced dimensions.
236
316
317
+
> Dimensionality reduction helps visualize high-dimensional data and identify patterns.
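
For readers who want to see the underlying operations, here is a base R sketch of PCA and classical MDS on simulated data. It assumes Euclidean distances between samples for the MDS step, which is not necessarily the distance used by the MDS method cited in this vignette; the data and the group shift are invented.

```r
set.seed(42)

# Simulated log-expression: 50 genes x 6 samples.
log_expr <- matrix(
  rnorm(50 * 6), nrow = 50,
  dimnames = list(paste0("g", 1:50), paste0("s", 1:6))
)
# Shift the first 10 genes in samples s4-s6 to mimic a treatment effect.
log_expr[1:10, 4:6] <- log_expr[1:10, 4:6] + 3

# PCA: prcomp() expects observations (samples) in rows, so we transpose.
pca <- prcomp(t(log_expr), center = TRUE, scale. = FALSE)
pca_coords <- pca$x[, 1:2]

# Classical MDS on Euclidean distances between samples.
mds_coords <- cmdscale(dist(t(log_expr)), k = 2)

pca_coords
mds_coords
```

Both results have one row per sample and two columns, which can be appended to the sample annotation and plotted.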
318
+
237
319
**MDS** (Robinson et al., 10.1093/bioinformatics/btp616)
We may want to rotate the reduced dimensions (or any two numeric columns) of our data by a set angle. `rotate_dimensions` takes a tibble, column names (as symbols; for `sample`, `transcript` and `count`) and an angle as arguments and returns a tibble with additional columns for the rotated dimensions. The rotated dimensions will be added to the original data set as `<NAME OF DIMENSION> rotated <ANGLE>` by default, or as specified in the input arguments.
403
+
404
+
> Dimension rotation can help align data with biological axes of interest.
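
The rotation itself is a standard two-dimensional rotation matrix applied to each pair of coordinates. The helper below, `rotate_2d`, is a hypothetical name introduced only for this sketch, and it assumes a counter-clockwise rotation; the sign convention used by `rotate_dimensions` may differ.

```r
# Rotate points (x, y) counter-clockwise by an angle given in degrees.
# rotate_2d is a hypothetical helper, not part of tidybulk.
rotate_2d <- function(x, y, degrees) {
  theta <- degrees * pi / 180  # convert degrees to radians
  list(
    x = x * cos(theta) - y * sin(theta),
    y = x * sin(theta) + y * cos(theta)
  )
}

# Rotating the unit vectors (1, 0) and (0, 1) by 90 degrees.
rotated <- rotate_2d(x = c(1, 0), y = c(0, 1), degrees = 90)
rotated
```

Up to floating-point error, (1, 0) moves to (0, 1) and (0, 1) moves to (-1, 0). Distances between samples are unchanged by the rotation, so only the orientation of the plot is affected.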
321
405
<div class="column-left">
322
406
TidyTranscriptomics
323
407
```{r rotate}
@@ -366,9 +450,11 @@ airway.norm.MDS.rotated |>
366
450
my_theme
367
451
```
368
452
369
-
## Test `differential abundance`
453
+
## Step 8: Test Differential Abundance
370
454
371
455
We may want to test for differential transcription between sample-wise factors of interest (e.g., with edgeR). `test_differential_expression` takes a tibble, column names (as symbols; for `sample`, `transcript` and `count`) and a formula representing the desired linear model as arguments and returns a tibble with additional columns for the statistics from the hypothesis test (e.g., log fold change, p-value and false discovery rate).
456
+
457
+
> Differential expression analysis identifies genes that are significantly different between conditions.
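
The formula argument is turned into a design matrix before the model is fitted. The base R sketch below shows what a formula such as `~ dex + cell` expands to; the sample table is invented, and `cell` is assumed here as a second covariate for illustration only.

```r
# Invented sample annotation with a treatment factor and a second covariate.
samples <- data.frame(
  sample = paste0("s", 1:4),
  dex    = factor(c("untrt", "trt", "untrt", "trt"), levels = c("untrt", "trt")),
  cell   = factor(c("A", "A", "B", "B"))
)

# model.matrix() expands the formula into one column per model coefficient.
design <- model.matrix(~ dex + cell, data = samples)
design
```

The columns are `(Intercept)`, `dextrt` and `cellB`. The coefficient for `dextrt` is the treated versus untreated effect, which is why the reference level of `dex` should be set deliberately when converting it to a factor.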
372
458
<div class="column-left">
373
459
TidyTranscriptomics
374
460
```{r de, message=FALSE, warning=FALSE, results='hide'}
@@ -410,10 +496,12 @@ airway.de =
410
496
pivot_transcript()
411
497
```
412
498
413
-
## Adjust `counts`
499
+
## Step 6: Adjust for Unwanted Variation
414
500
415
501
We may want to adjust `counts` for (known) unwanted variation. `adjust_abundance` takes as arguments a tibble, column names (as symbols; for `sample`, `transcript` and `count`) and a formula representing the desired linear model, where the first covariate is the factor of interest and the second covariate is the unwanted variation, and returns a tibble with additional columns for the adjusted counts as `<COUNT COLUMN>_adjusted`. At the moment, only one unwanted covariate is allowed at a time.
416
502
503
+
> Batch effect correction is important for removing technical variation that could confound biological signals.
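
The role of the two covariates in the formula can be illustrated with an ordinary linear model on a single gene. This is a simplified stand-in for a dedicated batch-correction method and should not be read as what `adjust_abundance` does internally; the expression values, `condition` and `batch` are invented.

```r
# Factor of interest and unwanted covariate for four samples.
condition <- factor(c("untrt", "trt", "untrt", "trt"), levels = c("untrt", "trt"))
batch     <- factor(c("b1", "b1", "b2", "b2"))

# Log expression of one gene: treatment adds 2, batch b2 adds 1.
y <- c(5, 7, 6, 8)

# Fit both effects jointly, then subtract only the estimated batch effect.
fit <- lm(y ~ condition + batch)
batch_effect <- unname(coef(fit)["batchb2"]) * (batch == "b2")
adjusted <- y - batch_effect
adjusted
```

The adjusted values keep the difference between treated and untreated samples while the shift between batches is removed. Fitting both terms in the same model is what protects the factor of interest from being removed along with the batch effect.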
We may want to cluster our samples based on their transcriptomic profiles. `cluster_elements` takes as arguments a tibble and column names (as symbols; for `sample`, `transcript` and `count`), and returns a tibble with additional columns for the cluster labels.
514
602
603
+
> Clustering helps identify groups of samples with similar expression profiles.
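
As a minimal sketch of sample clustering, the base R code below runs k-means on two invented coordinates per sample, such as the first two reduced dimensions. The choice of k-means and of two centers is an assumption made for the example.

```r
set.seed(123)

# Invented two-dimensional coordinates for six samples forming two groups.
coords <- rbind(
  s1 = c(0.1, 0.0),
  s2 = c(0.0, 0.2),
  s3 = c(-0.1, 0.1),
  s4 = c(5.0, 5.1),
  s5 = c(5.2, 4.9),
  s6 = c(4.9, 5.0)
)

# k-means with two centers; nstart repeats the random initialisation
# and keeps the best solution.
km <- kmeans(coords, centers = 2, nstart = 10)
clusters <- km$cluster
clusters
```

Samples `s1` to `s3` receive one label and `s4` to `s6` the other. The label numbers themselves are arbitrary, so only the grouping should be interpreted.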
## Step 9: Test Differential Abundance (Alternative Method)
539
629
540
630
We may want to test for differential abundance between conditions. `test_differential_abundance` takes as arguments a tibble, column names (as symbols; for `sample`, `transcript` and `count`) and a formula representing the desired linear model, and returns a tibble with additional columns for the statistics from the hypothesis test (e.g., log fold change, p-value and false discovery rate).
541
631
632
+
> This demonstrates an alternative approach to differential expression analysis.