Installing tucca-rna-seq

Before Installation

We recommend completing the Data Collection Template first to ensure you have all your experimental information organized. This will make the installation and configuration process much more efficient.

Prerequisites

Before installing this workflow, we highly recommend that you be familiar with:

  • Version Control
  • Git/GitHub
  • Best practices for reproducibility in scientific computing

System Requirements

The tucca-rna-seq workflow requires the following software and resources:

Software Dependencies

Software | Version | Purpose
Snakemake | ≥8.27.1 | Workflow management system
Conda/Mamba | Latest | Package and environment management
Singularity/Apptainer | Singularity ≥3.8.4 or Apptainer 1.4.1 (later versions may work but have not been tested) | Container runtime (recommended)

Version Compatibility

For the most up-to-date information on compatible software versions, please refer to the dependencies tested in our latest GitHub Actions workflows.
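
As a minimal sketch, the minimum Snakemake version from the table above can be checked from the command line. The helper name `version_ge` is hypothetical and not part of the workflow, and the comparison assumes a `sort` that supports version sorting (`-V`, as in GNU coreutils).

```shell
#!/usr/bin/env bash
# Minimum Snakemake version, taken from the Software Dependencies table.
MIN_SNAKEMAKE="8.27.1"

# version_ge A B: succeeds when version A is greater than or equal to B.
# Assumption: `sort -V` (version sort) is available, as in GNU coreutils.
version_ge() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

if command -v snakemake >/dev/null 2>&1; then
  installed="$(snakemake --version)"
  if version_ge "$installed" "$MIN_SNAKEMAKE"; then
    echo "Snakemake $installed meets the minimum ($MIN_SNAKEMAKE)"
  else
    echo "Snakemake $installed is older than $MIN_SNAKEMAKE" >&2
  fi
else
  echo "snakemake not found on PATH" >&2
fi
```

The same helper can be reused for the container runtime once its version string has been reduced to a bare number.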


Installation Methods

HPC Cluster Installation

Most HPC clusters provide pre-installed software via modules, making this the easiest installation method.

1. Check Available Modules

# Check available modules
module avail snakemake
module avail singularity
module avail conda
module avail mamba
module avail miniforge

Administrator Privileges for Singularity/Apptainer

Please note that installing Singularity or Apptainer is a non-trivial process that typically requires sudo (administrator) privileges. If neither is available as a module on your HPC cluster, ask your system administrator for assistance.

If Key Modules Are Missing

If your HPC cluster does not provide a module for Snakemake or a Conda distribution (like Miniforge or Mambaforge), you have two options:

  1. Contact your HPC administrator: They may be able to install the required software or help you create a custom module.
  2. Install it yourself: You can install a minimal Conda distribution like Miniforge in your user space.

If you install it yourself, it is critical to configure Conda to use a shared storage location to avoid exceeding your home directory's storage quota. See the tip below for details.
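
For the self-install option, the following is a rough sketch of a user-space installation using Miniforge, one of the distributions named above. The installation prefix `$HOME/miniforge3` and the function names are assumptions chosen for illustration; the download URL follows the pattern published in the Miniforge project's README. The install step is left commented out so that nothing is downloaded until you choose a prefix.

```shell
#!/usr/bin/env bash
# Assumption: install under $HOME/miniforge3; any user-writable prefix works.
MINIFORGE_PREFIX="${MINIFORGE_PREFIX:-$HOME/miniforge3}"

# Build the installer URL for the current OS and CPU architecture.
miniforge_url() {
  echo "https://github.com/conda-forge/miniforge/releases/latest/download/Miniforge3-$(uname)-$(uname -m).sh"
}

# Download the installer and run it non-interactively.
#   -b  batch mode (no prompts)
#   -p  installation prefix
install_miniforge() {
  curl -L -o Miniforge3.sh "$(miniforge_url)"
  bash Miniforge3.sh -b -p "$MINIFORGE_PREFIX"
}

# Uncomment to perform the installation:
# install_miniforge
echo "Installer URL: $(miniforge_url)"
```

On a cluster with a small home quota, consider pointing `MINIFORGE_PREFIX` at shared storage as well.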

HPC Best Practice: Configure Conda's Storage Paths

By default, Conda installs all software packages and environments into your home directory, which typically has a strict, small storage quota on an HPC cluster.

To avoid running out of space, you should create a .condarc file in your home directory to redirect Conda to use a larger, shared storage location (like a lab or project directory). This is a common best practice on most HPC systems.

Your .condarc file should look something like this:

~/.condarc
envs_dirs:
- /path/to/your/shared/storage/conda_envs/
pkgs_dirs:
- /path/to/your/shared/storage/conda_pkgs/
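
One way to create this file and the directories it points to is sketched below. The function name `write_condarc` is hypothetical, and the storage path in the usage example is the same placeholder used above. Note that the function overwrites any existing file at the target path.

```shell
#!/usr/bin/env bash
# write_condarc STORAGE_ROOT [CONDARC_PATH]
# Creates the environment and package directories under STORAGE_ROOT and
# writes a .condarc that points Conda at them (default: ~/.condarc).
# Warning: overwrites an existing file at CONDARC_PATH.
write_condarc() {
  local root="$1"
  local condarc="${2:-$HOME/.condarc}"
  mkdir -p "$root/conda_envs" "$root/conda_pkgs"
  cat > "$condarc" <<EOF
envs_dirs:
  - $root/conda_envs/
pkgs_dirs:
  - $root/conda_pkgs/
EOF
}

# Example (placeholder path; replace with your lab or project directory):
# write_condarc /path/to/your/shared/storage
# conda config --show envs_dirs pkgs_dirs   # confirm Conda picked up the paths
```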

For a detailed, real-world example of this process, see the Tufts HPC guide on Configuring Conda. The principles in this guide can be adapted for most HPC clusters.

HPC Best Practice: Create Private Modules for Custom Software

After configuring Conda to use a shared storage location (as described in the tip above), the next step for managing your custom software is to create a private environment module. This makes your self-installed tools much easier to load and use consistently. Most HPC systems that use the Lmod module system support this feature.

This allows you to simply run module load your-custom-snakemake instead of manually activating a Conda environment or adding its location to your PATH every time.

For a detailed guide on how to create your own module files, see the Tufts HPC guide on Private Modules. The principles in this guide can be adapted for most HPC clusters.
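
As an illustration only, a private Lmod module can be as small as a Lua file that prepends a Conda environment's `bin` directory to `PATH`. In the sketch below, the function name `write_private_module`, the module directory, the version label `1.0`, and the environment path are all assumptions; your cluster may expect private modules in a specific location.

```shell
#!/usr/bin/env bash
# write_private_module MODULE_ROOT ENV_BIN_DIR
# Writes a minimal Lmod modulefile named your-custom-snakemake/1.0 that
# puts ENV_BIN_DIR at the front of PATH when the module is loaded.
write_private_module() {
  local module_root="$1"
  local env_bin="$2"
  local module_dir="$module_root/your-custom-snakemake"
  mkdir -p "$module_dir"
  cat > "$module_dir/1.0.lua" <<EOF
help([[Snakemake from a user-managed Conda environment]])
whatis("Name: your-custom-snakemake")
prepend_path("PATH", "$env_bin")
EOF
}

# Example (hypothetical paths):
# write_private_module "$HOME/privatemodules" /path/to/your/shared/storage/conda_envs/snakemake/bin
# module use "$HOME/privatemodules"      # add the directory to the module search path
# module load your-custom-snakemake
```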

2. Load Required Modules

# Purge all currently loaded modules
module purge

# Load required modules based on what versions are available
module load snakemake
module load singularity
module load miniforge

3. Verify Installation

# Check versions
snakemake --version
singularity --version
conda --version
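
If one of the version commands fails, it can help to see which tools are missing from your PATH in a single pass. This is a small sketch with a hypothetical helper, `missing_tools`; it treats Singularity and Apptainer as interchangeable, in line with the requirements table.

```shell
#!/usr/bin/env bash
# missing_tools TOOL...: print each named command that is not on PATH.
missing_tools() {
  local tool
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || echo "$tool"
  done
}

missing="$(missing_tools snakemake conda)"

# Either container runtime is acceptable, so report only if both are absent.
if [ -n "$(missing_tools singularity)" ] && [ -n "$(missing_tools apptainer)" ]; then
  missing="$missing singularity/apptainer"
fi

if [ -n "$missing" ]; then
  # Left unquoted on purpose so newlines collapse into spaces.
  echo "Missing from PATH:" $missing >&2
else
  echo "All required tools found"
fi
```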

Tufts HPC Users

For Tufts-specific installation instructions, see our Tufts HPC Quick Start Guide.


Workflow Setup

1. Create Project Directory

# Create and navigate to project directory
mkdir my_rnaseq_project
cd my_rnaseq_project

2. Clone the Repository

# Clone the workflow repository
git clone https://github.com/tucca-cellag/tucca-rna-seq.git

# Navigate into the workflow directory
cd tucca-rna-seq
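
In keeping with the reproducibility practices recommended in the prerequisites, you may want to record exactly which commit of the workflow you cloned. The helper below is a hypothetical convenience, not part of the workflow, and the output file name is an assumption.

```shell
#!/usr/bin/env bash
# workflow_commit [DIR]: print the commit hash checked out in DIR
# (defaults to the current directory).
workflow_commit() {
  git -C "${1:-.}" rev-parse HEAD
}

# Example, run from inside the cloned tucca-rna-seq directory:
# workflow_commit . > ../workflow_commit.txt
```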

Next Steps

Once installation is complete, you are ready to configure the workflow. Please refer to the Configuration Guide for detailed instructions on how to set up the config.yaml, samples.tsv, and units.tsv files for your analysis.