A Nextflow pipeline for processing RNASeq sequencing data.
The pipeline was written by The Bioinformatics & Biostatistics Group at The Francis Crick Institute, London.
- Raw read QC (
FastQC) - Adapter trimming (
cutadapt) - Alignment and quantification (
RSEM,STAR) - Sorting and indexing (
SAMtools) - Quality control metrics:
picard:- Groups (
AddOrReplaceReadGroups) - Duplicates (
MarkDuplicates) - Library complexity (
EstimateLibraryComplexity) - Various metrics (
CollectRnaSeqMetrics,CollectMultipleMetrics)
- Groups (
RSeQC:- Samples quality (
infer_experiment.py,read_distribution.py,tin.py) - Alternative splicing (
junction_annotation.py,junction_saturation.py) - Mismatch (
mismatch_profile.py)
- Samples quality (
RNA-SeQC
- Preparation for statistical analysis:
- Create a count matrix (
R,SummarizedExperiment) - Perform a principal component analysis (
R,DESeq2)
- Create a count matrix (
- Collect and present a report (
MultiQC)
The documentation for the pipeline can be found in the docs/ directory:
- Installation
- Design file
- Pipeline configuration
- Running the pipeline
- Output and interpretation of results
- Troubleshooting
The pipeline was written by the The Bioinformatics & Biostatistics Group at The Francis Crick Institute, London.
The pipeline was developed by Gavin Kelly, Harshil Patel, Nourdine Bah and Philip East.
This project is licensed under the MIT License - see the LICENSE.md file for details.

