Custom workflows
================
.. note::
We're constantly adding examples to this page, so please check back soon for more.
Or, if you have a request or a workflow you'd like to share, please either open an
issue or suggest an edit to this page by clicking the GitHub link at the top.
Default figure generation
-------------------------
.. raw:: html
By default, the workflow defined in the ``Snakefile`` looks like this:
.. code-block:: python
# User config
configfile: "showyourwork.yml"
# Import the showyourwork module
module showyourwork:
snakefile:
"showyourwork/workflow/Snakefile"
config:
config
# Use all default rules
use rule * from showyourwork
The default behavior in this workflow is to infer figure dependencies based on
the figure labels in the tex file.
The following block in ``ms.tex``
.. code-block:: latex
\begin{figure}
\begin{centering}
\includegraphics{figures/mandelbrot.pdf}
\caption{The Mandelbrot set.}
% This label tells showyourwork that the script `figures/mandelbrot.py'
% generates the PDF file included above
\label{fig:mandelbrot}
\end{centering}
\end{figure}
tells ``showyourwork`` to execute a script called ``mandelbrot.py`` in the ``src/figures``
directory to generated ``figures/mandelbrot.pdf``.
To change, supplement, or override this behavior, read on!
Multi-panel figures
-------------------
.. raw:: html
It is possible to include multiple figures within a ``figure`` environment, provided
they are all generated by the same script:
.. code-block:: latex
\begin{figure}[ht!]
\begin{centering}
\includegraphics[width=0.4\linewidth]{figures/koch1.pdf}
\includegraphics[width=0.4\linewidth]{figures/koch2.pdf}
\caption{
Two Koch snowflakes.
}
% This label tells showyourwork that the script `figures/koch.py'
% generates the two PDF files included above
\label{fig:koch}
\end{centering}
\end{figure}
If you would like to include figures generated from different scripts in the
same ``figure`` environment, you'll have to provide a custom rule (see below).
Static figures
--------------
.. raw:: html
It is also possible to commit the figure PDF/PNG/SVG/etc directly and tell ``showyourwork`` not to
try to produce it programmatically. Simply place the figure in the ``src/static`` directory:
.. code-block:: latex
\begin{figure}[ht!]
\begin{centering}
\includegraphics[width=0.4\linewidth]{static/broccoli.pdf}
\caption{
A photo of some broccoli.
}
% The fact that the figure is in the static directory tells
% showyourwork not to look for a script that generates this figure
\label{fig:broccoli}
\end{centering}
\end{figure}
Custom dependencies: datasets
-----------------------------
.. raw:: html
Download a dataset and make it a dependency of a particular figure:
.. code-block:: python
# Custom rule to download a dataset
rule my_dataset:
output:
report("src/figures/my_dataset.dat", category="Dataset")
shell:
"curl https://zenodo.org/record/5187276/files/fibonacci.dat --output {output[0]}"
Specify this dependency in the configuration file ``showyourwork.yml``:
.. code-block:: yaml
# Tell showyourwork that `src/figures/my_figure.py`
# requires the file `src/figures/my_dataset.dat` to run
figure_dependencies:
my_figure.py:
- my_dataset.dat
Custom dependencies: scripts
----------------------------
.. raw:: html
Sometimes we would like to tell ``showyourwork`` about script dependencies, such as
when our figure script imports something from a locally-hosted script or package.
We can do this in the same way as above by specifying a dependency in the configuration
file ``showyourwork.yml``:
.. code-block:: yaml
# Tell showyourwork that `src/figures/my_figure.py`
# depends on `src/figures/utils/helper_script.py`
figure_dependencies:
my_figure.py:
- utils/helper_script.py
Custom figure scripts
---------------------
.. raw:: html
Specify a custom script for a figure. Useful when ``showyourwork`` can't
automatically determine the figure script, such as when a figure is
included outside of a ``figure`` environment:
.. code-block:: python
# Subclass the `figure` rule to specify that the figure
# `src/figures/custom_figure.pdf` is generated from the script
# `src/figures/custom_script.py`
use rule figure from showyourwork as custom_figure with:
input:
"src/figures/custom_script.py",
"environment.yml"
output:
report("src/figures/custom_figure.pdf", category="Figure")
Override the internal ``figure`` rule completely:
.. code-block:: python
rule custom_figure:
input:
"src/figures/custom_script.py",
"environment.yml",
output:
report("src/figures/custom_figure.pdf", category="Figure")
conda:
"environment.yml"
shell:
"cd src/figures && python custom_script.py"
Expensive figures
-----------------
.. raw:: html
.. note::
This example is a bit involved, and we're working on integrating this feature more seamlessly with
``showyourwork``. So stay tuned! It should get significantly easier to do this in the next release
of the workflow.
Quite often you may have a figure that is very computationally expensive to run. An example is a posterior distribution plot for an MCMC run, or a plot of an expensive fluid dynamical simulation. If the runtime is more than a few tens of minutes (on a single machine), you probably don't want to run it on GitHub Actions, even if you rely on ``showyourwork`` caching. One strategy to help with this is to specify two separate rules for generating the figure: one that generates it from scratch (by running the simulation), and one that generates it from an intermediate step (i.e., by loading a file containing the simulation results and plotting them directly). Specifically, you can set up the first rule so that it generates the intermediate file and uploads it to the cloud; the second rule downloads that file and plots the figure.
The ``expensive-figure`` workflow is an example of how to do this. It takes advantage of the `Zenodo API `_ to programmatically upload/download files. The workflow is a bit involved, but it consists of a ``Snakefile`` with two rules to produce the same output (a simulation result called ``simulation_results.dat``):
.. code-block:: python
# Custom rule to generate the simulation results and upload them to Zenodo
rule generate_simulation_results:
input:
"src/figures/run_simulation.py"
output:
report("src/figures/simulation_results.dat", category="Dataset")
conda:
"environment.yml"
shell:
"cd src/figures && python run_simulation.py && python zenodo.py --upload"
# Custom rule to download the simulation results from Zenodo
rule download_simulation_results:
output:
report("src/figures/simulation_results.dat", category="Dataset")
conda:
"environment.yml"
shell:
"cd src/figures && python zenodo.py --download"
The first rule is the one that runs the simulation (you can see that it calls the script ``run_simulation.py``) and uploads it to Zenodo (``python zenodo.py --upload``). The second rule simply downloads the simulation (``python zenodo.py --download``). Both ``run_simulation.py`` and ``zenodo.py`` are located in the ``src/figures`` directory for this workflow. Since we now have a rule ambiguity (both rules produce the same output), we need to tell ``Snakemake`` the order in which to try to execute these rules. Here's what we do (still in the ``Snakefile``):
.. code-block:: python
# If we are on GitHub Actions CI, use the rule where we download the data
# If not, use the rule where we generate the data.
import os
ON_GITHUB_ACTIONS = os.getenv("CI", "false") == "true"
if ON_GITHUB_ACTIONS:
ruleorder: download_simulation_results > generate_simulation_results
else:
ruleorder: generate_simulation_results > download_simulation_results
This snippet tells ``Snakemake`` to download the data if it's running on GitHub Actions (which always sets the ``CI`` environment variable to ``true``) and to generate it instead if it's running on a local computer.
The next thing we must do is tell ``showyourwork`` that the figure script (which we call ``my_figure.py``) requires the dataset (``simulation_results.dat``). We do this in ``showyourwork.yml``, as in the previous examples:
.. code-block:: yaml
# Tell showyourwork that `src/figures/my_figure.py`
# requires the file `src/figures/simulation_results.dat` to run
figure_dependencies:
my_figure.py:
- simulation_results.dat
Finally, we need to provide credentials for accessing the Zenodo API. Create a `Zenodo access token `_ and store it somewhere safe (NEVER place it in an unencrypted text file on GitHub, or anywhere public for that matter). If you poke around in ``zenodo.py``, you'll see that the script tries to read this token from the environment variable ``ZENODO_TOKEN``, so you should either set that variable within your session or export it in your `.bashrc` or `.zshrc` shell config file (see, e.g., `this link `_). Then, to ensure things also run properly on GitHub Actions, store it as a repository secret called ``ZENODO_TOKEN`` in your `repository settings `_. In order to make this secret available to the workflow, you also need to provide it in the ``.github/workflows/showyourwork.yml`` config file, as follows:
.. code-block:: yaml
- name: Build the article PDF
id: build
uses: ./showyourwork/showyourwork-action
env:
ZENODO_TOKEN: ${{ secrets.ZENODO_TOKEN }}
That should do the trick. Again, it's a bit involved, but you should be able to just copy the relevant files over to your repo and make a few tweaks (like the file names) to get it working in your workflow. Note that you don't have to change anything in ``zenodo.py`` _except_ the ``file_name``, ``deposit_title``, and ``deposit_description`` variables toward the bottom of the file. But as we mentioned above, we're working on dramatically simplifying this workflow, so stay tuned for a much simpler solution!