How to setup a bioinformatics analysis workflow in 2024 with VSCode, Quarto, and Mamba
Introduction
The bioinformatics and computational biology discipline has seen dramatic growth in the last decade and is projected to continue growing.1 As the field continues to evolve, the software and computational tools do as well, so having a reliable and stable workflow is essential.
Everyone has a prefered coding workflow, and no one setup is perfect. That being said, I have found that the combination of Visual Studio Code (VSCode) for IDE, Quarto for creating notebooks, and Mamba for package management is a dynamic and powerful bioinformatics workflow in 2024!
Quick Start
Attention Windows Users: This tutorial is tailored towards MacOS and Linux OS. For dedicated Windows users, I recommend this blog post by Larysha Rothmann.
Install VSCode and relevant extensions:
- R
- Python
- Quarto
Install Quarto for creating rich text documents.
Download miniforge for creating stable and reproducible coding environments.
# Run the following commands in your VSCode terminal. press: (ctrl + `)
curl -L -O "https://github.com/conda-forge/miniforge/releases/latest/download/Miniforge3-$(uname)-$(uname -m).sh"
bash Miniforge3-$(uname)-$(uname -m).sh
- Clone the bioinfo-2024 repository.
git clone https://github.com/dombraccia/bioinfo-2024.git
cd bioinfo-2024
- Create a stable coding environment with mamba.
# in the bioinfo-2024/ directory
mamba env create -n bioinfo python=3.12 -f environment.yml
- Render the example Quarto notebook to test your environment setup.
quarto render bioinfo-2024.qmd --to html
BONUS: Convert your Quarto notebook to a Jupyter notebook.
quarto convert bioinfo-2024.qmd --to ipynb
Instructions
1. Install Visual Studio Code
I use VSCode for almost all of my coding tasks. With extensions for Git, Jupyter, Docker, Remote-SSH, and linters for R, Python, Julia, and almost any coding language you can imagine, it is an extremely versatile IDE.2
After you have installed VSCode, install extensions for:
- R
- Python
- Quarto
2. Install Quarto
Quarto is a document authoring tool that allows you to create rich text documents with code, output, and narrative text. It is a great tool for mixing code, explainations, and any interesting results you find. I mostly choose Quarto over Jupyter notebooks3 for the following reasons:
- Language agnostic: Quarto allows you to code in multiple languages in the same document. This is highly relevant in bioinformatics where you may need to switch between R and Python regularly for access to different packages.
- Version control: Quarto documents are plain text files, making them easy to version control with Git.
- Customizable output: You can output your document in multiple formats (HTML, PDF, Word, etc.) with a single command.
- Conversion to Jupyter notebooks: Quarto documents can easily be converted to Jupyter notebooks and vice versa, so you are never locked-in to one notebook system.
3. Download Miniforge (mamba
)
Miniforge is a package manager based on Conda. It allows you to create isolated coding environments with specific package versions required for your project. I prefer miniforge (mamba
) over Anaconda or Miniconda because it is faster and more efficient. And if you’re already familiar with conda environments, just replace conda
with mamba
in your commands and you are good to go!
To install Miniforge, run the following commands in your VSCode terminal:
# Run the following commands in your VSCode terminal. press: (ctrl + `)
curl -L -O "https://github.com/conda-forge/miniforge/releases/latest/download/Miniforge3-$(uname)-$(uname -m).sh"
bash Miniforge3-$(uname)-$(uname -m).sh
Alternatively, you can use wget
to download the Miniforge installer:
wget "https://github.com/conda-forge/miniforge/releases/latest/download/Miniforge3-$(uname)-$(uname -m).sh"
bash Miniforge3-$(uname)-$(uname -m).sh
We will return to this part after step 4, cloning a git repository.
4. git clone
a repository
We will now work out of a repo set up with a minimal example of a bioinformatics analysis project. This next part required that you have Git downloaded, so if you don’t have it yet, you can download it here.
# Run the following commands in your VSCode terminal. press: (ctrl + `)
git clone https://github.com/dombraccia/bioinfo-2024.git
cd bioinfo-2024
Now, we can get back to creating a mamba
environment.
5. Create a mamba
environment
Once you have Miniforge installed, and you are in the bioinfo-2024/
folder, we can create a new mamba environment. The following code will:
- create a new mamba environment named
bioinfo-2024
- selects python version 3.12
- installs the packages listed in the
environment.yml
file - activates the
bioinfo-2024
environment
# run these commands in the VSCode terminal
mamba create -n bioinfo python=3.12 -f environment.yml
mamba activate bioinfo
Oops! We forgot to add the numpy
package to the environment.yml
file. Let’s add it now:
# make sure you have the bioinfo environment activated
mamba install numpy
You can always add packages to an active mamba environment with mamba install
.
Learn more about customizing yml files here.
6. Render a Quarto document
To test your environment setup, render the bioinfo-2024.qmd
notebook:
# in the bioinfo-2024/ directory
quarto render bioinfo-2024.qmd --to html
This command will render the Quarto document to HTML by default, but it also supports PDF, Word, and other formats.
But by far my favorite feature of the quarto cli, is the quarto convert
command:
# Convert qmd <==> ipynb by default, or specify the output format (pdf, html, ...)
quarto convert bioinfo-2024.qmd --to ipynb
quarto convert bioinfo-2024.ipynb --to qmd
quarto convert
automatically converts qmd <==> ipynb. I use this feature quite often when collaborating with others who prefer Jupyter notebooks.
Conclusion
Hopefully this tutorial helped set up a local bioinformatics workflow with VSCode, Quarto, and Mamba. If you have any questions or notice any errors, do reachout to me at domenick [at] dbraccia [dot] com. Happy coding!
Footnotes
Bioinformatics Market Size, Share & Trends Analysis Report↩︎
Alternatives to VSCode include: Rstudio, Jupyter Lab, and PyCharm. Ultimately, your choice of IDE essentially comes down to personal preference, so don’t spend too much time thinking about it.↩︎
The one instance in which I still use Jupyter notebooks/Jupyter Lab over Quarto is when I need my notebook environment to stay running for a long time. This can happen when processing large single cell datasets and running compute-heavy ML models that can sometimes take days to train.↩︎