diff --git a/README.md b/README.md
index eb3ffcd26cd5440d778984c6e95574a0a9258c2d..41c35be035e7bb260bd514ddf989d4f64b994523 100644
--- a/README.md
+++ b/README.md
@@ -10,17 +10,28 @@ $ git clone git@forgemia.inra.fr:umr-gdec/magatt.git
 
 ## Dependancies
 
-### Snakemake
-* Version 5.5.2
+### Build magatt environment with conda
 
-### Python3
+We recommend building the environment using conda (developed with miniconda 3, conda 4.9.2) with the file [environment.yml](environment.yml):
+
+```console
+$ conda env create -f=environment.yml -n magatt
+```
+
+Once created, you can activate the environment with:
+
+```console
+$ conda activate magatt
+```
+
+All the dependencies are listed below.
+
+* Snakemake: 5.5.2
 * Python: 3.5
 * Biopython: 1.68
 * numpy: 1.15
 * pandas: 0.23
 * pysam: 0.15
-
-### Genomic Tools
 * Bedtools: 2.27
 * Blat: 36
 * Exonerate (fastavalidcds): 2.4.0
@@ -30,9 +41,9 @@ $ git clone git@forgemia.inra.fr:umr-gdec/magatt.git
 * NCBI-blast (BLAST+): 2.6
 * Samtools: 1.9
 
-## Prepare the pipeline
+## Prepare and run the pipeline
 
-### Creating the configuration file
+### Creating the configuration file: inputs and other parameters
 
 The configuration file [config.yaml](config.yaml) will contain all the input files required for the pipeline and some other parameters.
 
@@ -170,6 +181,11 @@ $ snakemake -j 32 --cluster sbatch
 
 This will allow to have at most 32 subproccess run through the SLURM scheduler with `sbatch`.
 
+You can use a custom [cluster.json](cluster.json) JSON file to set up the SBATCH parameters for each rule, and use it with:
+```console
+$ snakemake -j 32 -u cluster.json --cluster "sbatch -J {cluster.jobName} -c {cluster.c} --mem {cluster.mem} -e {cluster.error} -o {cluster.output} -p debug" --verbose
+```
+
 You can generate the diagram of all the processes and dependancies of you analysis:
 ```bash
 $ snakemake --dag |dot -T png > dag.png