
Running the tucca-rna-seq Workflow

This guide covers how to execute the tucca-rna-seq workflow using different execution environments and profiles.


Once you’ve configured your analysis, follow these steps to run the workflow:

Terminal window
# 1. Validate your configuration
snakemake --lint --workflow-profile profiles/slurm
# 2. Create all required Conda environments
snakemake all --conda-create-envs-only --workflow-profile profiles/slurm
# 3. Test the workflow (dry-run)
snakemake all -np --workflow-profile profiles/slurm
# 4. Execute the workflow
snakemake all --workflow-profile profiles/slurm
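
The four steps above can also be chained so that a failure in an early step stops the later ones. The following is a minimal sketch, not part of the workflow itself: the function name `run_workflow` and the `SNAKEMAKE` override variable are hypothetical names introduced here for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run lint, environment creation, dry-run, and execution
# in order, stopping at the first step that fails.
# Set SNAKEMAKE=echo to preview the arguments without running anything.
run_workflow() {
    local profile="${1:-profiles/slurm}"
    local smk="${SNAKEMAKE:-snakemake}"

    "$smk" --lint --workflow-profile "$profile" || return
    "$smk" all --conda-create-envs-only --workflow-profile "$profile" || return
    "$smk" all -np --workflow-profile "$profile" || return
    "$smk" all --workflow-profile "$profile"
}
```

Calling `run_workflow profiles/slurm` runs the sequence; `SNAKEMAKE=echo run_workflow profiles/slurm` only prints the arguments each step would receive.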

Snakemake profiles are a powerful feature that allows you to pre-configure the command-line options for a specific computing environment, such as a local workstation or an HPC cluster. This saves you from typing long, complex commands for every analysis.

To use a profile, you simply activate it with the --workflow-profile flag:

Terminal window
# Use the Slurm profile
snakemake all --workflow-profile profiles/slurm
# Use the development profile
snakemake all --workflow-profile profiles/slurm-dev

For a comprehensive guide on what profiles are, which ones are available in this workflow, and how to customize them for your specific needs, please see the Configuration Guide.


Before running, validate your configuration:

Terminal window
# Check for configuration errors
snakemake --lint --workflow-profile profiles/slurm

This step:

  • Validates your config.yaml against the JSON schema
  • Checks consistency between samples.tsv and units.tsv
  • Identifies potential issues before execution

Before performing a dry-run, it’s essential to create the necessary software environments. This step prevents errors during the dry-run, which needs to inspect tools inside containers that may not have been downloaded yet.

Terminal window
# Create all Conda environments and pull container images
snakemake all --conda-create-envs-only --workflow-profile profiles/slurm

This command will:

  • Download the main Singularity/Apptainer container image.
  • Create all the isolated Conda environments required by the workflow’s rules.
  • Not run any computational jobs.

Test the workflow without executing jobs:

Terminal window
# Generate execution plan
snakemake all -np --workflow-profile profiles/slurm

This step:

  • Shows which jobs would be executed, and why
  • Prints the shell command for each job (the -p flag)
  • Lists the resources requested by each job
  • Identifies any missing inputs or configuration issues

Run the complete workflow:

Terminal window
# Execute all jobs
snakemake all --workflow-profile profiles/slurm

The workflow will:

  • Use the pre-built software environments for each tool
  • Download reference genomes and annotations
  • Process your sequencing data through the pipeline
  • Generate comprehensive analysis results

When running Snakemake interactively, the main log output will stream directly to your terminal.

For workflows running on an HPC cluster, you can use the scheduler’s native commands in a separate terminal to check the status of submitted jobs. The command below is an example for a SLURM-managed cluster.

Terminal window
# Check job status (example for a SLURM cluster)
squeue -u $USER
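
When many jobs are queued, a per-state count can be easier to read than the full listing. The sketch below assumes a SLURM cluster where `squeue` accepts `-h` (no header) and `-o "%T"` (job state only); `summarize_states` is a hypothetical helper name, and it reads one state per line from standard input.

```shell
#!/usr/bin/env bash
# Hypothetical helper: count how many jobs are in each state.
# Expects one job state per line on stdin (e.g. RUNNING, PENDING).
summarize_states() {
    sort | uniq -c | awk '{ printf "%s=%s\n", $2, $1 }'
}

# Typical use on a SLURM cluster (not executed here):
#   squeue -u "$USER" -h -o "%T" | summarize_states
```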

When troubleshooting, it’s important to know where to look for information. The workflow generates logs in three primary locations, each serving a different purpose:

  • Main Snakemake Log (.snakemake/log/): This directory contains the main log file from the Snakemake process itself. It’s useful for debugging high-level workflow errors related to DAG construction, configuration, or job submission.

  • Cluster Executor Logs (e.g., .snakemake/slurm_logs/): When running on a cluster, the executor plugin (like the one for SLURM) generates its own logs for each submitted job. These files capture the raw stdout and stderr from the cluster’s perspective and are invaluable for debugging job submission issues or resource-related failures.

  • Rule-Specific Logs (logs/): The workflow is designed to capture the stdout and stderr from each specific rule into this directory. These are the most important logs for debugging tool-specific errors, such as a problem with a bioinformatics tool’s parameters or input files.
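
To jump straight to the most recent log in each of these locations, a small helper along the following lines may be useful. It is a sketch under assumptions: `latest_logs` is a hypothetical name, and the executor log directory is taken to be `.snakemake/slurm_logs/` as in the example above, which may differ on your system.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the most recently modified file in each
# log location that exists under the given project root (default: cwd).
latest_logs() {
    local root="${1:-.}" dir newest
    for dir in .snakemake/log .snakemake/slurm_logs logs; do
        [ -d "$root/$dir" ] || continue
        # List files newest-first and keep only the first one.
        newest=$(find "$root/$dir" -type f -exec ls -t {} + 2>/dev/null | head -n 1)
        [ -n "$newest" ] && printf '%s\n' "$newest"
    done
    return 0
}
```

Piping a result into `tail -n 50` is usually enough to see the error that ended a job.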

When jobs fail, Snakemake provides commands to help you investigate and recover. While these can be run manually, it is often more convenient to set them as defaults in your execution profile.

| Issue | Solution |
|-------|----------|
| Missing dependencies | Check conda environment creation |
| Resource limits | Adjust memory/CPU allocation in profile |
| File permissions | Ensure write access to output directories |
| Network issues | Check internet connectivity for downloads |


Configure resource allocation in your profile:

profiles/slurm/config.v8+.yaml
default-resources:
  mem_mb: 32000 # 32 GB RAM per job
  cpus_per_task: 12 # 12 CPU cores per job
  runtime: 4320 # 72 hours (3 days); runtime is given in minutes
# Override for specific rules
set-resources:
  star_index:
    mem_mb: 64000 # 64 GB RAM for genome indexing
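
A bare integer `runtime` is interpreted by Snakemake in minutes, so it is worth converting a value to days and hours before committing it to a profile. The helper below is a small illustrative sketch; `format_runtime` is a hypothetical name, not something the workflow provides.

```shell
#!/usr/bin/env bash
# Hypothetical helper: convert a runtime in minutes to days/hours/minutes.
format_runtime() {
    local minutes="$1"
    printf '%dd %dh %dm\n' \
        $((minutes / 1440)) $((minutes % 1440 / 60)) $((minutes % 60))
}

# Example: format_runtime 4320 prints "3d 0h 0m"
```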

Creating the workflow’s software environments with Conda and downloading the Singularity/Apptainer container image can be time-consuming, but this initial setup only needs to be done once. By default, Snakemake caches these environments in a hidden .snakemake directory within each project folder.

To avoid rebuilding these environments for every new analysis, you can create a centralized cache that all your projects can share. This has two major benefits:

  1. Saves Time: Subsequent workflow runs will start much faster by reusing the pre-existing environments.
  2. Saves Space: It prevents duplicating many gigabytes of software, which is especially important on HPC clusters with home/lab directory storage quotas.

To set up a central cache, use the --conda-prefix and --singularity-prefix flags to point to a shared location, such as a project or scratch directory.

Terminal window
# Example of redirecting caches to a shared location
snakemake all \
  --workflow-profile profiles/slurm \
  --conda-prefix /path/to/shared/conda_envs \
  --singularity-prefix /path/to/shared/singularity_images
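
Before pointing Snakemake at a shared location, it helps to confirm that the directories exist and are writable. The sketch below assumes the two subdirectory names used in the example above; `prepare_cache` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: create the shared cache directories under a base path,
# verify they are writable, and print the matching Snakemake flags.
prepare_cache() {
    local base="$1" sub
    for sub in conda_envs singularity_images; do
        mkdir -p "$base/$sub" || return 1
        [ -w "$base/$sub" ] || { echo "not writable: $base/$sub" >&2; return 1; }
    done
    printf -- '--conda-prefix %s --singularity-prefix %s\n' \
        "$base/conda_envs" "$base/singularity_images"
}
```

The printed flags can then be appended to the `snakemake all --workflow-profile profiles/slurm` command, or set once in your profile.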

Control job parallelism:

Terminal window
# Limit concurrent jobs
snakemake all --workflow-profile profiles/slurm --jobs 50
# Use all available cores locally
snakemake all --use-conda --cores all

Run specific parts of the workflow:

Terminal window
# Run only quality control
snakemake fastqc --workflow-profile profiles/slurm
# Run only differential expression
snakemake deseq2_wald_per_analysis --workflow-profile profiles/slurm
# Run only enrichment analysis
snakemake all_enrichment_analyses --workflow-profile profiles/slurm

Handle interrupted workflows:

Terminal window
# Resume from where you left off
snakemake all --workflow-profile profiles/slurm
# Force re-execution of all rules, even if their outputs already exist
snakemake all --workflow-profile profiles/slurm --forceall
# Remove all files generated by the workflow (write-protected files are kept)
snakemake all --delete-all-output --workflow-profile profiles/slurm
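
If a run was killed rather than stopped cleanly, the working directory may still be locked and some outputs may be incomplete. Snakemake's `--unlock` and `--rerun-incomplete` flags address these two cases. The following is a sketch only: `resume_workflow` and the `SNAKEMAKE` override variable are hypothetical names used for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helper: clear a stale lock, then resume the workflow and
# re-run any jobs whose outputs were left incomplete.
# Set SNAKEMAKE=echo to preview the arguments without running anything.
resume_workflow() {
    local profile="${1:-profiles/slurm}"
    local smk="${SNAKEMAKE:-snakemake}"

    # --unlock only removes the lock on the working directory; it runs no jobs.
    "$smk" all --unlock --workflow-profile "$profile" || return
    # --rerun-incomplete regenerates outputs flagged as incomplete.
    "$smk" all --rerun-incomplete --workflow-profile "$profile"
}
```

Only unlock a directory when you are sure no other Snakemake process is still running in it.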

The workflow generates comprehensive results:

  • Quality Control: FastQC reports, Qualimap analysis
  • Alignment: STAR BAM files, alignment statistics
  • Quantification: Salmon transcript counts
  • Differential Expression: DESeq2 results, normalized counts
  • Enrichment Analysis: GO, KEGG, MSigDB, SPIA results
  • Reports: MultiQC summary, FastQC, Qualimap HTML reports

Results are organized in a logical structure:

results/
├── fastqc/ # Quality control reports
├── qualimap/ # RNA-seq quality metrics
├── salmon/ # Transcript quantification
└── multiqc/ # Summary reports
resources/
├── star/ # STAR alignment results (BAM files)
├── deseq2/ # Differential expression results
├── enrichment/ # Functional enrichment analysis
└── tximeta/ # Transcript quantification metadata

After workflow completion, you can use the provided RMarkdown notebooks for additional analysis:

  • analysis/pcaExplorer_playground.Rmd: Principal component analysis
  • analysis/GeneTonic_playground.Rmd: Enrichment analysis visualization

Note: These are separate analysis tools, not part of the Snakemake workflow execution.


If profiles don’t work as expected:

  1. Check Snakemake version: Ensure you’re using v8.27.1 or later
  2. Verify profile syntax: Check YAML formatting and indentation
  3. Install executor plugins: Ensure required plugins are installed
  4. Check permissions: Verify file and directory permissions
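
The version check in step 1 can be scripted. The sketch below compares two version strings with `sort -V`, which is assumed to be available (GNU coreutils and recent BSD/macOS provide it); `version_ge` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if version $1 is greater than or equal to $2.
# Relies on version-aware sorting (`sort -V`).
version_ge() {
    [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

# Typical use (not executed here):
#   version_ge "$(snakemake --version)" 8.27.1 || echo "Snakemake is too old" >&2
```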

Common execution issues and solutions:

| Problem | Cause | Solution |
|---------|-------|----------|
| Jobs stuck in queue | Resource limits too high | Reduce memory/CPU requirements |
| “Lost” SLURM jobs | sacct accounting issues (see the example messages below) | Stop and restart the workflow; Snakemake will re-evaluate job status and resume. |
| Environment creation fails | Conda/Mamba issues | Check conda installation and channels |
| Container errors | Singularity/Apptainer issues | Verify container runtime installation |
| File system errors | Storage or permission issues | Check disk space and file permissions |

For “lost” SLURM jobs, you may see messages like:

status_of_jobs after sacct is: {}
active_jobs_ids_with_current_sacct_status are: {}
active_jobs_seen_by_sacct are: {}
missing_sacct_status are: set()

If you encounter issues:

  1. Check the logs: Review Snakemake and job-specific logs
  2. Validate configuration: Run snakemake --lint
  3. Review documentation: Check this guide and workflow documentation
  4. Open an issue: Report problems on the GitHub repository
View the Official Snakemake FAQ

After successfully running your workflow:

  1. Review Results: Examine quality control reports and analysis outputs
  2. Post-Analysis Tools: Use the provided RMarkdown notebooks
  3. Custom Analysis: Extend the workflow with your own scripts
  4. Share Results: Export data files and create custom visualizations

For more advanced usage and customization, see the Advanced Configuration guide.

Linked external resources are independent of TUCCA and Tufts University and remain under their own licenses.