This pipeline generates the MetaboXcan reports, which test the association of metabolites and genes with complex traits.
After cloning the repository, download the additional data used to generate the reports. Download all the files from this Box folder into the data folder.
cd metaboxcan
mkdir data
cd data # download the additional data inside this directory
The GWAS summary statistics should be preprocessed using the summary-gwas-imputation tool. The tutorial for harmonizing the sumstats is available here; make sure the genome build is hg38.
At minimum the following columns should be present:
- variant_id (rsid for each variant)
- effect_allele
- non_effect_allele
- zscore
Harmonization allows for easy matching of variant ids and running the workflow.
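Before running the workflow, it can help to confirm that the harmonized file has the columns listed above. The following is a minimal sketch, not part of the pipeline: `check_gwas_columns` is a hypothetical helper name, and it assumes the file is gzip-compressed and tab-delimited with a header row.

```shell
# Check that a GWAS summary statistics file has the minimum required columns.
# Assumption: gzip-compressed, tab-delimited file whose first line is the header.
check_gwas_columns() {
    local gwas_file="$1"
    local required="variant_id effect_allele non_effect_allele zscore"
    local header col missing=""

    # Read only the header line of the compressed file.
    header=$(gzip -dc "$gwas_file" | head -n 1)

    for col in $required; do
        # Put one column name per line and look for an exact match.
        if ! printf '%s\n' "$header" | tr '\t' '\n' | grep -qx "$col"; then
            missing="$missing $col"
        fi
    done

    if [ -n "$missing" ]; then
        echo "Missing columns:$missing" >&2
        return 1
    fi
    echo "All required columns present"
}

# Usage: check_gwas_columns /path/to/gwas_file.txt.gz
```

The function returns a non-zero status when a column is missing, so it can be used as a guard before launching the workflow.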
To adjust for inflation you need the trait heritability, which can be estimated using the LDSC software as described here. A short example looks like:
ldsc.py \
--h2 scz.sumstats.gz \
--ref-ld-chr eur_w_ld_chr/ \
--w-ld-chr eur_w_ld_chr/ \
--out scz_h2
Follow the LDSC tutorial to set up the environment and estimate the heritability.
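The heritability estimate then has to be read from the LDSC output and passed to the workflow. The following is a minimal sketch for pulling it out of the log, assuming that `--out scz_h2` writes a log named `scz_h2.log` and that the log contains a line of the form `Total Observed scale h2: <estimate> (<standard error>)`. `extract_h2` is a hypothetical helper name, not part of LDSC or this pipeline.

```shell
# Print the heritability point estimate from an LDSC log file.
# Assumption: the log has a line such as
#   Total Observed scale h2: 0.1234 (0.0123)
extract_h2() {
    awk -F': ' '/^Total Observed scale h2/ {
        split($2, parts, " ")   # parts[1] = estimate, parts[2] = "(SE)"
        print parts[1]
        exit
    }' "$1"
}

# Usage: gwas_h2=$(extract_h2 scz_h2.log)
```

If the log format differs in your LDSC version, read the estimate from the log manually instead.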
You can run the workflow with or without adjusting for inflation using the variance control method. To adjust for inflation, make sure you use the most recent models here, which include the phi values. You also need the GWAS sample size (N) and heritability (
- Run MetaboXcan with inflation adjustment
nextflow run main.nf \
--keepIntermediate -resume \
--gene_models_folder '/path/to/gene/models/en_*.{db,txt.gz}' \
--metabolite_models_folder '/path/to/metabolite/models/metsim-invnorm-softimpute.{db,txt.gz}' \
--gwas_file '/path/to/gwas_file.txt.gz' \
--gwas_N 150000 \
--gwas_h2 0.245 \
--outdir /path/to/output/directory/
- Run MetaboXcan without inflation adjustment
nextflow run main.nf \
--keepIntermediate -resume \
--gene_models_folder '/path/to/gene/models/en_*.{db,txt.gz}' \
--metabolite_models_folder '/path/to/metabolite/models/metsim-invnorm-softimpute.{db,txt.gz}' \
--gwas_file '/path/to/gwas_file.txt.gz' \
--outdir /path/to/output/directory/
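The two invocations differ only in the `--gwas_N` and `--gwas_h2` options, so one wrapper can cover both. The following is a minimal sketch: `metaboxcan_cmd` is a hypothetical helper, not part of the pipeline, and it only prints the command (a dry run) rather than executing it. The model paths are the placeholders from the examples above.

```shell
# Print the nextflow command for either mode.
# Arguments: GWAS file, output directory, optional sample size, optional heritability.
# The inflation options are added only when both N and h2 are supplied.
metaboxcan_cmd() {
    local gwas_file="$1" outdir="$2" gwas_N="$3" gwas_h2="$4"
    local cmd="nextflow run main.nf --keepIntermediate -resume"

    # Placeholder model paths; replace with your own locations.
    cmd="$cmd --gene_models_folder '/path/to/gene/models/en_*.{db,txt.gz}'"
    cmd="$cmd --metabolite_models_folder '/path/to/metabolite/models/metsim-invnorm-softimpute.{db,txt.gz}'"
    cmd="$cmd --gwas_file '$gwas_file'"

    if [ -n "$gwas_N" ] && [ -n "$gwas_h2" ]; then
        cmd="$cmd --gwas_N $gwas_N --gwas_h2 $gwas_h2"
    fi

    cmd="$cmd --outdir '$outdir'"
    printf '%s\n' "$cmd"
}

# Usage (prints the command so it can be inspected before running):
#   metaboxcan_cmd /path/to/gwas_file.txt.gz /path/to/output/directory/ 150000 0.245
#   metaboxcan_cmd /path/to/gwas_file.txt.gz /path/to/output/directory/
```

Printing instead of executing keeps the sketch safe to try; review the printed command and run it yourself once the paths are correct.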