Running the tucca-rna-seq Workflow
This guide covers how to execute the tucca-rna-seq workflow using different
execution environments and profiles.
Quick Start
A dry-run (-np) will fail on the first run if the required software
environments do not yet exist. This is a known issue in Snakemake where it
attempts to inspect Conda environments inside containers before pulling the
container image (#1901,
#3038).
To prevent this, you must first build all Conda environments and pull all
container images. The --conda-create-envs-only flag conveniently handles
both of these tasks at once.
Once you've configured your analysis, follow these steps to run the workflow:
# 1. Validate your configuration
snakemake --lint --workflow-profile profiles/slurm
# 2. Create all required Conda environments
snakemake all --conda-create-envs-only --workflow-profile profiles/slurm
# 3. Test the workflow (dry-run)
snakemake all -np --workflow-profile profiles/slurm
# 4. Execute the workflow
snakemake all --workflow-profile profiles/slurm
When you run Snakemake interactively on a remote machine (like an HPC compute
node), the main snakemake process acts as the workflow's orchestrator. If your
SSH connection is interrupted (e.g., your computer sleeps or your Wi-Fi
disconnects), this main process will be terminated, and your workflow will fail.
To prevent this, it is highly recommended to run the main execution command
inside a terminal multiplexer like tmux or screen. These tools create a
persistent session on the remote machine that will keep your workflow running
even if your local connection is lost.
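A minimal tmux workflow for this might look like the following (the session name rnaseq is just an example):

```shell
# Start a persistent session on the remote machine (run once per analysis)
tmux new -s rnaseq
# Inside the session, launch the workflow as usual
snakemake all --workflow-profile profiles/slurm
# Detach with Ctrl-b then d; the session (and workflow) keeps running.
# After reconnecting over SSH, reattach with:
tmux attach -t rnaseq
```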
If you cannot use a multiplexer, you must ensure your local machine does not go to sleep. For macOS, we recommend the free utility Amphetamine to keep your Mac awake for a specified duration.
Using Execution Profiles
Snakemake profiles are a powerful feature that allows you to pre-configure the command-line options for a specific computing environment, such as a local workstation or an HPC cluster. This saves you from typing long, complex commands for every analysis.
Any command-line option available in Snakemake can be set within a profile, which makes profiles extremely versatile. For a full list of available options, see the official Snakemake CLI documentation.
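For example, common flags map one-to-one onto keys in the profile's config file. The excerpt below is illustrative, not this workflow's shipped defaults:

```yaml
# profiles/slurm/config.v8+.yaml (illustrative excerpt)
jobs: 100              # equivalent to --jobs 100
use-conda: true        # equivalent to --use-conda
use-singularity: true  # equivalent to --use-singularity
printshellcmds: true   # equivalent to --printshellcmds
```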
To use a profile, you simply activate it with the --workflow-profile flag:
# Use the Slurm profile
snakemake all --workflow-profile profiles/slurm
# Use the development profile
snakemake all --workflow-profile profiles/slurm-dev
For a comprehensive guide on what profiles are, which ones are available in this workflow, and how to customize them for your specific needs, please see the Configuration Guide.
Workflow Execution Steps
1. Configuration Validation
Before running, validate your configuration:
# Check for configuration errors
snakemake --lint --workflow-profile profiles/slurm
This step:
- Validates your config.yaml against the JSON schema
- Checks consistency between samples.tsv and units.tsv
- Identifies potential issues before execution
2. Create Software Environments
Before performing a dry-run, it's essential to create the necessary software environments. This step prevents errors during the dry-run, which needs to inspect tools inside containers that may not have been downloaded yet.
# Create all Conda environments and pull container images
snakemake all --conda-create-envs-only --workflow-profile profiles/slurm
This command will:
- Download the main Singularity/Apptainer container image.
- Create all the isolated Conda environments required by the workflow's rules.
- Not run any computational jobs.
3. Dry Run
Test the workflow without executing jobs:
# Generate execution plan
snakemake all -np --workflow-profile profiles/slurm
This step:
- Shows which jobs will be executed
- Displays the dependency graph
- Estimates resource requirements
- Identifies any missing inputs or configuration issues
4. Execution
Run the complete workflow:
# Execute all jobs
snakemake all --workflow-profile profiles/slurm
The workflow will:
- Use the pre-built software environments for each tool
- Download reference genomes and annotations
- Process your sequencing data through the pipeline
- Generate comprehensive analysis results
Monitoring and Debugging
Job Monitoring
When running Snakemake interactively, the main log output will stream directly to your terminal.
For workflows running on an HPC cluster, you can use the scheduler's native commands in a separate terminal to check the status of submitted jobs. The command below is an example for a SLURM-managed cluster.
# Check job status (example for a SLURM cluster)
squeue -u $USER
Understanding Log Files
When troubleshooting, it's important to know where to look for information. The workflow generates logs in three primary locations, each serving a different purpose:
- Main Snakemake Log (.snakemake/log/): This directory contains the main log file from the Snakemake process itself. It's useful for debugging high-level workflow errors related to DAG construction, configuration, or job submission.
- Cluster Executor Logs (e.g., .snakemake/slurm_logs/): When running on a cluster, the executor plugin (like the one for SLURM) generates its own logs for each submitted job. These files capture the raw stdout and stderr from the cluster's perspective and are invaluable for debugging job submission issues or resource-related failures.
- Rule-Specific Logs (logs/): The workflow is designed to capture the stdout and stderr from each specific rule into this directory. These are the most important logs for debugging tool-specific errors, such as a problem with a bioinformatics tool's parameters or input files.
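To pull up the most recent rule-specific log after a failure, something like the following works (this assumes the logs/ layout described above; adjust the glob if your logs nest deeper):

```shell
# Find the most recently modified rule log and print its last 50 lines
latest_log=$(ls -t logs/*/*.log logs/*.log 2>/dev/null | head -n 1)
if [ -n "$latest_log" ]; then
  tail -n 50 "$latest_log"
fi
```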
Failed Jobs
When jobs fail, Snakemake provides commands to help you investigate and recover. While these can be run manually, it is often more convenient to set them as defaults in your execution profile.
The pre-configured slurm profile for this workflow already includes the
options show-failed-logs: true and rerun-incomplete: true, automating
these recovery steps for you. See the
Configuration Guide
for details.
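In profile YAML, those two defaults look like this:

```yaml
# profiles/slurm/config.v8+.yaml (excerpt)
show-failed-logs: true   # print the log of each failed job to the console
rerun-incomplete: true   # re-run any job whose output was left incomplete
```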
Common Issues
| Issue | Solution | 
|---|---|
| Missing dependencies | Check conda environment creation | 
| Resource limits | Adjust memory/CPU allocation in profile | 
| File permissions | Ensure write access to output directories | 
| Network issues | Check internet connectivity for downloads | 
Resource Management
Memory and CPU Allocation
Configure resource allocation in your profile:
# profiles/slurm/config.v8+.yaml
default-resources:
  mem_mb: 32000      # 32GB RAM per job
  cpus_per_task: 12  # 12 CPU cores per job
  runtime: 4320      # 72 hours (3 days) runtime, in minutes
# Override for specific rules
set-resources:
  star_index:
    mem_mb: 64000    # 64GB RAM for genome indexing
Software Environment Caching
Creating the workflow's software environments with Conda and downloading the
Singularity/Apptainer container image can be time-consuming, but this initial
setup only needs to be done once. By default, Snakemake caches these
environments in a hidden .snakemake directory within each project folder.
To avoid rebuilding these environments for every new analysis, you can create a centralized cache that all your projects can share. This has two major benefits:
- Saves Time: Subsequent workflow runs will start much faster by reusing the pre-existing environments.
- Saves Space: It prevents duplicating many gigabytes of software, which is especially important on HPC clusters with home/lab directory storage quotas.
To set up a central cache, use the --conda-prefix and --singularity-prefix
flags to point to a shared location, such as a project or scratch directory.
# Example of redirecting caches to a shared location
snakemake all \
  --workflow-profile profiles/slurm \
  --conda-prefix /path/to/shared/conda_envs \
  --singularity-prefix /path/to/shared/singularity_images
For convenience, it is best practice to set these paths as default options within your execution profile.
Alternatively, you can set the SNAKEMAKE_CONDA_PREFIX and
SNAKEMAKE_APPTAINER_PREFIX environment variables in your shell's startup
file (e.g., ~/.bashrc) for a more permanent solution.
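For the environment-variable route, the lines added to your shell startup file would look like this (the paths are placeholders, as above):

```shell
# In ~/.bashrc (or equivalent): point Snakemake at shared caches
export SNAKEMAKE_CONDA_PREFIX=/path/to/shared/conda_envs
export SNAKEMAKE_APPTAINER_PREFIX=/path/to/shared/singularity_images
```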
Advanced Execution Options
Parallel Execution
Control job parallelism:
# Limit concurrent jobs
snakemake all --workflow-profile profiles/slurm --jobs 50
# Use all available cores locally
snakemake all --use-conda --cores all
Selective Execution
Run specific parts of the workflow:
# Run only quality control
snakemake fastqc --workflow-profile profiles/slurm
# Run only differential expression
snakemake deseq2_wald_per_analysis --workflow-profile profiles/slurm
# Run only enrichment analysis
snakemake all_enrichment_analyses --workflow-profile profiles/slurm
Resume and Restart
Handle interrupted workflows:
# Resume from where you left off
snakemake all --workflow-profile profiles/slurm
# Force rerun of all jobs
snakemake all --workflow-profile profiles/slurm --forceall
# Remove all output files generated by the workflow
snakemake all --delete-all-output --workflow-profile profiles/slurm
It is highly recommended to perform a dry-run (-np) before deleting all
outputs to see which files will be removed.
snakemake all --delete-all-output -np --workflow-profile profiles/slurm
Output and Results
Main Outputs
The workflow generates comprehensive results:
- Quality Control: FastQC reports, Qualimap analysis
- Alignment: STAR BAM files, alignment statistics
- Quantification: Salmon transcript counts
- Differential Expression: DESeq2 results, normalized counts
- Enrichment Analysis: GO, KEGG, MSigDB, SPIA results
- Reports: MultiQC summary, FastQC, Qualimap HTML reports
Output Organization
Results are organized in a logical structure:
results/
├── fastqc/           # Quality control reports
├── qualimap/         # RNA-seq quality metrics  
├── salmon/           # Transcript quantification
└── multiqc/          # Summary reports
resources/
├── star/             # STAR alignment results (BAM files)
├── deseq2/           # Differential expression results
├── enrichment/       # Functional enrichment analysis
└── tximeta/          # Transcript quantification metadata
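To get a quick sense of how much space each output directory is using (directory names as in the layout above):

```shell
# Human-readable size of each top-level output directory, largest first
du -sh results/* resources/* 2>/dev/null | sort -rh
```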
Post-Analysis Interactive Tools
After workflow completion, you can use the provided RMarkdown notebooks for additional analysis:
- analysis/pcaExplorer_playground.Rmd: Principal component analysis
- analysis/GeneTonic_playground.Rmd: Enrichment analysis visualization
Note: These are separate analysis tools, not part of the Snakemake workflow execution.
Troubleshooting
Profile Issues
If profiles don't work as expected:
- Check Snakemake version: Ensure you're using v8.27.1 or later
- Verify profile syntax: Check YAML formatting and indentation
- Install executor plugins: Ensure required plugins are installed
- Check permissions: Verify file and directory permissions
Execution Problems
Common execution issues and solutions:
| Problem | Cause | Solution | 
|---|---|---|
| Jobs stuck in queue | Resource limits too high | Reduce memory/CPU requirements | 
| "Lost" SLURM jobs | sacctaccounting issues. You may see messages like:status_of_jobs after sacct is: {}active_jobs_ids_with_current_sacct_status are: {}active_jobs_seen_by_sacct are: {}missing_sacct_status are: set() | Stop and restart the workflow; Snakemake will re-evaluate job status and resume. | 
| Environment creation fails | Conda/Mamba issues | Check conda installation and channels | 
| Container errors | Singularity/Apptainer issues | Verify container runtime installation | 
| File system errors | Storage or permission issues | Check disk space and file permissions | 
Getting Help
If you encounter issues:
- Check the logs: Review Snakemake and job-specific logs
- Validate configuration: Run snakemake --lint
- Review documentation: Check this guide and workflow documentation
- Open an issue: Report problems on the GitHub repository
Next Steps
After successfully running your workflow:
- Review Results: Examine quality control reports and analysis outputs
- Post-Analysis Tools: Use the provided RMarkdown notebooks
- Custom Analysis: Extend the workflow with your own scripts
- Share Results: Export data files and create custom visualizations
For more advanced usage and customization, see the Advanced Configuration guide.