Installing tucca-rna-seq
Before Installation
We recommend completing the Data Collection Template first to ensure you have all your experimental information organized. This will make the installation and configuration process much more efficient.
System Requirements
The tucca-rna-seq workflow requires the following software and resources:
Software Dependencies
| Software | Version | Purpose | 
|---|---|---|
| Snakemake | ≥8.27.1 | Workflow management system | 
| Conda/Mamba | Latest | Package and environment management | 
| Singularity/Apptainer | Singularity ≥3.8.4 or Apptainer 1.4.1 (later Apptainer versions may work but have not been tested) | Container runtime (recommended) |
For the most up-to-date information on compatible software versions, please refer to the dependencies tested in our latest GitHub Actions workflows.
Installation Methods
- HPC Cluster (Recommended)
- Conda/Mamba (Local)
- Cloud & CI/CD
HPC Cluster Installation
Most HPC clusters provide pre-installed software via modules, making this the easiest installation method.
1. Check Available Modules
# Check available modules
module avail snakemake
module avail singularity
module avail conda
module avail mamba
module avail miniforge
Please note that installing Singularity or Apptainer is a non-trivial process
that requires sudo (administrator) privileges. If a module is not available
on your HPC cluster, you will need assistance from your system administrator.
If your HPC cluster does not provide a module for Snakemake or a Conda distribution (such as Miniforge), you have two options:
- Contact your HPC administrator: They may be able to install the required software or help you create a custom module.
- Install it yourself: You can install a minimal Conda distribution such as Miniforge in your user space.
If you install it yourself, it is critical to configure Conda to use a shared storage location so that you do not exceed your home directory's storage quota. See the tip below for details.
By default, Conda installs all software packages and environments into your home directory, which typically has a strict, small storage quota on an HPC cluster.
To avoid running out of space, you should create a .condarc file in your
home directory to redirect Conda to use a larger, shared storage location
(like a lab or project directory). This is a common best practice on most HPC
systems.
Your .condarc file should look something like this:
envs_dirs:
  - /path/to/your/shared/storage/conda_envs/
pkgs_dirs:
  - /path/to/your/shared/storage/conda_pkgs/
For a detailed, real-world example of this process, see the Tufts HPC guide on Configuring Conda. The principles in this guide can be adapted for most HPC clusters.
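As a rough sketch of the setup described in this tip, the shell function below creates the two directories and writes a `.condarc` that points at them. The function name `write_condarc` and its arguments are hypothetical and not part of the workflow, and the storage path in the example is a placeholder you must replace. The function refuses to overwrite an existing file.

```shell
# Hypothetical helper (not part of tucca-rna-seq): write a .condarc that
# redirects Conda's environment and package directories to shared storage.
# Usage: write_condarc <shared-storage-root> [<condarc-path>]
write_condarc() {
  shared_root="$1"
  condarc="${2:-$HOME/.condarc}"

  # Do not clobber an existing configuration.
  if [ -e "$condarc" ]; then
    echo "Refusing to overwrite existing $condarc" >&2
    return 1
  fi

  # Create the target directories so Conda can write to them.
  mkdir -p "$shared_root/conda_envs" "$shared_root/conda_pkgs" || return 1

  cat > "$condarc" <<EOF
envs_dirs:
  - $shared_root/conda_envs/
pkgs_dirs:
  - $shared_root/conda_pkgs/
EOF
}

# Example (placeholder path, replace with your lab or project directory):
# write_condarc /path/to/your/shared/storage
```

If a `.condarc` already exists, edit it by hand instead and add the `envs_dirs` and `pkgs_dirs` entries shown above.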
After configuring Conda to use a shared storage location (as described in the tip above), the next step for managing your custom software is to create a private environment module. This makes your self-installed tools much easier to load and use consistently. Most HPC systems that use the Lmod module system support this feature.
This allows you to simply run module load your-custom-snakemake instead of
manually activating a Conda environment or adding its location to your PATH
every time.
For a detailed guide on how to create your own module files, see the Tufts HPC guide on Private Modules. The principles in this guide can be adapted for most HPC clusters.
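To make the idea of a private module concrete, here is a minimal sketch that assumes your cluster uses Lmod and accepts Lua modulefiles laid out as `<name>/<version>.lua`. The helper name `write_private_module`, the `$HOME/privatemodules` directory, and the environment path are all illustrative assumptions; your cluster's guide may prescribe a different location or loading mechanism.

```shell
# Hypothetical helper: generate a minimal Lmod (Lua) modulefile that puts a
# self-installed Conda environment's bin directory on PATH.
# Usage: write_private_module <module-root> <name> <version> <env-bin-dir>
write_private_module() {
  module_root="$1"
  name="$2"
  version="$3"
  env_bin="$4"

  mkdir -p "$module_root/$name" || return 1

  # help(), whatis() and prepend_path() are standard Lmod modulefile functions.
  cat > "$module_root/$name/$version.lua" <<EOF
help([[Snakemake from a user-installed Conda environment]])
whatis("Name: $name")
whatis("Version: $version")
prepend_path("PATH", "$env_bin")
EOF
}

# Example (all paths are placeholders):
# write_private_module "$HOME/privatemodules" your-custom-snakemake 8.27.1 \
#   /path/to/your/shared/storage/conda_envs/snakemake/bin
# module use "$HOME/privatemodules"   # add the directory to the module search path
# module load your-custom-snakemake
```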
2. Load Required Modules
# Purge all currently loaded modules
module purge
# Load the required modules (adjust names and versions to what your cluster provides)
module load snakemake
module load singularity
module load miniforge
3. Verify Installation
# Check versions
snakemake --version
singularity --version
conda --version
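Printing the versions does not by itself confirm that they meet the minimums listed under Software Dependencies. The sketch below compares the installed Snakemake version against the ≥8.27.1 requirement. The helper `version_ge` is an illustrative name, and it relies on `sort -V`, which is available in GNU coreutils but may be missing on other systems.

```shell
# Return success if version $1 is greater than or equal to version $2.
# Relies on `sort -V` (version sort) from GNU coreutils.
version_ge() {
  [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

required="8.27.1"
if command -v snakemake >/dev/null 2>&1; then
  installed="$(snakemake --version)"
  if version_ge "$installed" "$required"; then
    echo "Snakemake $installed meets the >=$required requirement"
  else
    echo "Snakemake $installed is older than $required" >&2
  fi
else
  echo "snakemake not found on PATH; load the module first" >&2
fi
```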
For Tufts-specific installation instructions, see our Tufts HPC Quick Start Guide.
Local Installation with Conda/Mamba
For local development and testing, you can install the workflow on your personal machine.
1. Install Conda or Mamba
2. Install Snakemake
3. Install Singularity/Apptainer
Please note that installing Singularity or Apptainer is a non-trivial process
that requires sudo (administrator) privileges. If these tools are not
already installed and you do not have administrator rights on the machine,
you will likely need assistance from your system administrator.
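Steps 1 and 2 above might look roughly like the following on Linux or macOS. This is a sketch under several assumptions that do not come from this guide: it uses the Miniforge installer from the conda-forge GitHub releases, installs into `$HOME/miniforge3`, and creates an environment named `snakemake` from the conda-forge and bioconda channels. By default the script only prints the commands; set `DRY_RUN=0` to execute them. Step 3 is not covered because it depends on your operating system and requires administrator rights.

```shell
# Print commands by default; set DRY_RUN=0 to actually run them.
DRY_RUN="${DRY_RUN:-1}"
run() {
  if [ "$DRY_RUN" = "1" ]; then
    printf '+ %s\n' "$*"
  else
    "$@"
  fi
}

# Step 1: download and install Miniforge (assumed install prefix: ~/miniforge3).
installer="Miniforge3-$(uname)-$(uname -m).sh"
run curl -L -O "https://github.com/conda-forge/miniforge/releases/latest/download/${installer}"
run bash "${installer}" -b -p "${HOME}/miniforge3"

# Step 2: create a dedicated environment with a Snakemake version that
# satisfies the requirement in the Software Dependencies table.
run "${HOME}/miniforge3/bin/conda" create -y -n snakemake \
  -c conda-forge -c bioconda "snakemake>=8.27.1"
```

After a real run, activate the environment with `conda activate snakemake` and confirm the result with `snakemake --version`.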
Local execution is suitable for small genomes and limited datasets. For production analyses, use an HPC cluster or cloud environment.
Cloud and Container Orchestration
For running the workflow on cloud platforms like Google Cloud Batch, AWS Batch, or Kubernetes, you will need to install and configure their respective command-line interface (CLI) tools.
GitHub Actions (for CI/CD)
This repository includes a pre-configured GitHub Actions workflow for continuous integration (CI) and testing. This setup uses free, public runners and is intended for verifying the workflow's integrity, not for production analysis.
To use it for testing, simply fork this repository and enable GitHub Actions in
your fork's settings. You will also need to configure any required secrets
(e.g., an NCBI API key) in your repository settings under Settings > Secrets and variables > Actions.
The included workflow file (.github/workflows/main.yml) can be repurposed to
run full-scale analyses by using self-hosted runners with sufficient
computational resources. This involves changing the runs-on key in the
workflow file to target your self-hosted runner group.
Workflow Setup
1. Create Project Directory
# Create and navigate to project directory
mkdir my_rnaseq_project
cd my_rnaseq_project
2. Clone the Repository
# Clone the workflow repository
git clone https://github.com/tucca-cellag/tucca-rna-seq.git
# Navigate into the workflow directory
cd tucca-rna-seq
Next Steps
Once installation is complete, you are ready to configure the workflow. Please
refer to the Configuration Guide for detailed instructions on
how to set up the config.yaml, samples.tsv, and units.tsv files for your
analysis.