Example run
This page walks you through an example pan-genome construction run. It is useful both for verifying that Panoramic was installed properly and as a reference for obtaining data, configuring, and running the pipeline.
Start by activating the Panoramic Snakemake conda environment:
$ conda activate snakemake-panoramic
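If activation fails, the environment was probably not created during installation; see the installation instructions. A quick sanity check using standard conda commands (nothing here is Panoramic-specific):
# List available conda environments and look for the one created during installation
$ conda env list | grep snakemake-panoramic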
Then go to the Panoramic test directory:
$ cd test
Next, download and prepare the relevant reference and HQ sample data. A script is provided to do this automatically:
$ cd data
$ ./get_test_data.sh
The next step is to configure the pipeline. The example shown here is for the map-to-pan pipeline, but examples for the de novo and iterative assembly modes are also available.
$ cd ../map_to_pan/
The LQ samples configuration (LQ_samples_info_map_to_pan.tsv) is ready for use - no need to change anything. Use any text editor to modify HQ_samples_info_map_to_pan.tsv so that it contains full paths to the test data in all fields. Next, edit the main configuration file conf_test_map_to_pan.yml. Specifically, the following parameters must be set to full paths:
samples_info_file
hq_genomes_info_file
out_dir
reference_genome
reference_annotation
reference_proteins
repeats_library
transcripts
proteins
annotation_yml_template
You must also set a value for the ppn parameter. If you are running on an HPC cluster, this is the maximum number of CPUs to be used on a single node (for multiprocess steps). If you are running locally, this is the maximum number of CPUs to be used on your local machine.
In addition, if you plan to use an HPC cluster, you also need to set appropriate values for the parameters in the Environment section - see details here. An annotation configuration template file is available at test/EVM_annotation_template.yml. There is no need to modify it; just include a full path to it in the main configuration.
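Since all of these parameters expect full paths, it can help to print them with realpath and paste the results into the two files. A minimal sketch, assuming you are still inside test/map_to_pan and that the test data was downloaded into test/data (the exact file names are whatever get_test_data.sh fetched):
# Print absolute paths to paste into HQ_samples_info_map_to_pan.tsv
# and conf_test_map_to_pan.yml; the relative locations below are
# assumptions based on the test directory layout described above.
$ realpath ../data/*                        # reference genome, annotation, proteins, HQ sample data
$ realpath ../EVM_annotation_template.yml   # value for annotation_yml_template
$ realpath .                                # base directory for out_dir, e.g. $(realpath .)/out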
We are now ready to run the pipeline. If you want to run locally:
$ snakefile="../../map_to_pan/PGC_map_to_pan.snakefile"
# try a dry run first
$ snakemake -s $snakefile --configfile conf_test_map_to_pan.yml --use-conda -p -j <# of threads> -n
# If no errors occur, start the run:
$ snakemake -s $snakefile --configfile conf_test_map_to_pan.yml --use-conda -p -j <# of threads> >out 2>err &
If you want to run on an HPC cluster, you should have a qsub wrapper script ready, as explained here.
$ snakefile="../../map_to_pan/PGC_map_to_pan.snakefile"
$ qsub_script="../../util/<qsub_snakemake_wrapper.py>"
$ job_script="../../util/jobscript.sh"
# try a dry run first
$ snakemake -s $snakefile --configfile conf_test_map_to_pan.yml --cluster "python $qsub_script" --latency-wait 60 --use-conda -p -j <# jobs> --jobscript "$job_script" -n
# If no errors occur, start the run:
$ snakemake -s $snakefile --configfile conf_test_map_to_pan.yml --cluster "python $qsub_script" --latency-wait 60 --use-conda -p -j <# jobs> --jobscript "$job_script" >out 2>err &
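In both cases, stdout and stderr are redirected to the out and err files, so you can follow progress from another shell using standard tools:
# Follow Snakemake's progress as rules are scheduled and completed
$ tail -f err
# Snakemake also keeps a per-run log under .snakemake/log/ in the working directory
$ ls .snakemake/log/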
Downloading packages and data, as well as performing the analysis, takes some time, so you'll probably need to let the pipeline run overnight (the entire process took ~12 hours using 100 jobs on an HPC cluster). Once the run completes, you can explore the outputs described here.