Custom workflows¶
Note
We’re constantly adding examples to this page, so please check back soon for more. Or, if you have a request or a workflow you’d like to share, please either open an issue or suggest an edit to this page by clicking the GitHub link at the top.
Default figure generation¶
By default, the workflow defined in the Snakefile looks like this:
# User config
configfile: "showyourwork.yml"
# Import the showyourwork module
module showyourwork:
    snakefile:
        "showyourwork/workflow/Snakefile"
    config:
        config
# Use all default rules
use rule * from showyourwork
The default behavior in this workflow is to infer figure dependencies based on
the figure labels in the tex file.
The following block in ms.tex
\begin{figure}
\begin{centering}
\includegraphics{figures/mandelbrot.pdf}
\caption{The Mandelbrot set.}
% This label tells showyourwork that the script `figures/mandelbrot.py'
% generates the PDF file included above
\label{fig:mandelbrot}
\end{centering}
\end{figure}
tells showyourwork to execute a script called mandelbrot.py in the src/figures directory to generate figures/mandelbrot.pdf.
To change, supplement, or override this behavior, read on!
Multi-panel figures¶
It is possible to include multiple figures within a figure environment, provided they are all generated by the same script:
\begin{figure}[ht!]
\begin{centering}
\includegraphics[width=0.4\linewidth]{figures/koch1.pdf}
\includegraphics[width=0.4\linewidth]{figures/koch2.pdf}
\caption{
Two Koch snowflakes.
}
% This label tells showyourwork that the script `figures/koch.py'
% generates the two PDF files included above
\label{fig:koch}
\end{centering}
\end{figure}
If you would like to include figures generated from different scripts in the same figure environment, you’ll have to provide a custom rule (see below).
One script, multiple figures¶
Conversely, we can also have different figure environments, all of which include figure files generated from the same script. If you follow the usual convention, this would result in duplicated labels, since these figure environments would share the same label (determined only by the name of the script that generated them). To get around this, showyourwork supports adding tags to the end of figure labels to make them unique.
\begin{figure}[ht!]
\begin{centering}
\includegraphics[width=0.4\linewidth]{figures/mandelbrot.pdf}
\caption{
This figure was generated by the script \texttt{mandelbrot.py}
and is labeled \texttt{fig:mandelbrot:original}.
}
\label{fig:mandelbrot:original}
\end{centering}
\end{figure}
\begin{figure}[ht!]
\begin{centering}
\includegraphics[width=0.4\linewidth]{figures/mandelbrot_red.pdf}
\caption{
This figure was generated by the script \texttt{mandelbrot.py}
and is labeled \texttt{fig:mandelbrot:red}.
}
\label{fig:mandelbrot:red}
\end{centering}
\end{figure}
In the example above, the script mandelbrot.py generates two PDFs, which are displayed in separate figure environments. We label them fig:mandelbrot:original and fig:mandelbrot:red to make them unique; showyourwork ignores everything after the second colon, and understands that both figure environments correspond to the same figure script (mandelbrot.py).
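The labeling convention described above can be pictured with a small Python sketch. This is not showyourwork's actual implementation: the function name script_for_label is hypothetical, and the sketch assumes labels of the form fig:<script> or fig:<script>:<tag>.

```python
def script_for_label(label: str) -> str:
    """Return the figure script name implied by a LaTeX figure label.

    Hypothetical helper (not part of showyourwork). It assumes labels
    look like ``fig:<script>`` or ``fig:<script>:<tag>``; anything
    after the second colon is treated as a tag and ignored.
    """
    parts = label.split(":")
    if len(parts) < 2 or parts[0] != "fig" or not parts[1]:
        raise ValueError(f"not a figure label: {label!r}")
    return parts[1] + ".py"


# Both tagged labels resolve to the same script.
print(script_for_label("fig:mandelbrot:original"))  # mandelbrot.py
print(script_for_label("fig:mandelbrot:red"))       # mandelbrot.py
```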
Static figures¶
It is also possible to commit the figure file (PDF, PNG, SVG, etc.) directly and tell showyourwork not to try to produce it programmatically. Simply place the figure in the src/static directory:
\begin{figure}[ht!]
\begin{centering}
\includegraphics[width=0.4\linewidth]{static/broccoli.pdf}
\caption{
A photo of some broccoli.
}
% The fact that the figure is in the static directory tells
% showyourwork not to look for a script that generates this figure
\label{fig:broccoli}
\end{centering}
\end{figure}
Script dependencies¶
Sometimes we would like to tell showyourwork about script dependencies, such as when our figure script imports something from a locally-hosted script or package. We can do this by specifying a dependency in the configuration file showyourwork.yml:
# Tell showyourwork that `src/figures/my_figure.py`
# depends on `src/figures/utils/helper_script.py`
figure_dependencies:
  my_figure.py:
    - utils/helper_script.py
Dataset dependencies¶
If you have a dataset hosted on Zenodo, showyourwork can automatically download it for you and link to it in the corresponding figure caption in the article PDF. All you have to do is specify the name of the dataset and its Zenodo record ID in the config file showyourwork.yml:
# Tell showyourwork that `src/figures/fibonacci.py`
# requires the file `src/figures/fibonacci.dat` to run
figure_dependencies:
  fibonacci.py:
    - fibonacci.dat
# Tell showyourwork where to find `src/figures/fibonacci.dat`
zenodo:
  - fibonacci.dat:
      download:
        id: 5187276
The YAML snippet above tells showyourwork that the script src/figures/fibonacci.py requires the dataset src/figures/fibonacci.dat in order to run (note that under the figure_dependencies key, all paths are relative to the src/figures directory). It also tells showyourwork that this dataset can be downloaded from Zenodo, and that it has the record ID 5187276. Specifically, that means the record lives at the URL https://zenodo.org/record/5187276 and the file can be downloaded by running
curl https://zenodo.org/record/5187276/files/fibonacci.dat
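The URL pattern in the command above can be written as a short Python helper. This is only a sketch: the function name zenodo_file_url is hypothetical, and it assumes the record/<id>/files/<name> layout shown on this page.

```python
from urllib.parse import quote


def zenodo_file_url(record_id: int, filename: str) -> str:
    """Build the download URL for one file in a Zenodo record.

    Hypothetical helper; it assumes the URL layout quoted in the
    text above (``record/<id>/files/<name>``).
    """
    # Percent-encode the file name so spaces and similar characters are safe.
    return f"https://zenodo.org/record/{record_id}/files/{quote(filename)}"


print(zenodo_file_url(5187276, "fibonacci.dat"))
```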
Note that if this dataset is a tarball, you’ll have to untar it within fibonacci.py, or specify a custom rule in the Snakefile (see below).
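If you choose to untar the dataset inside the figure script, a minimal sketch using Python's standard tarfile module might look like the following. The helper name extract_dataset and the archive name fibonacci.tar.gz in the usage comment are hypothetical, and the sketch assumes the archive comes from a source you trust.

```python
import tarfile
from pathlib import Path


def extract_dataset(tarball: Path, dest: Path) -> list[str]:
    """Unpack ``tarball`` into ``dest`` and return the member names.

    Hypothetical helper; assumes a trusted archive.
    """
    dest.mkdir(parents=True, exist_ok=True)
    with tarfile.open(tarball) as archive:
        names = archive.getnames()
        # Use the safer "data" extraction filter where this Python supports it.
        kwargs = {"filter": "data"} if hasattr(tarfile, "data_filter") else {}
        archive.extractall(dest, **kwargs)
    return names


# Example use at the top of a figure script (file name is an assumption):
# extract_dataset(Path("fibonacci.tar.gz"), Path("."))
```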
Alternatively, you can manually specify how to download a dataset dependency. This is useful if, e.g., it’s hosted somewhere other than Zenodo, or if you need to do some post-processing (like unzipping) before running your figure script. To do that, simply don’t provide a download instruction in the showyourwork.yml file:
figure_dependencies:
  fibonacci.py:
    - fibonacci.dat
and instead create a custom rule in the Snakefile:
# Custom rule to download a dataset
rule fibonacci:
    output:
        "src/figures/fibonacci.dat"
    shell:
        "curl https://zenodo.org/record/5187276/files/fibonacci.dat --output {output[0]}"
Note that in the Snakefile, all paths are relative to the top level of your repo. Also note that this approach will not automatically add a dataset link to your figure caption.
Simulation dependencies¶
Quite often you may have a figure that is very computationally expensive to run. An example is a posterior distribution plot for an MCMC run, or a plot of an expensive fluid dynamical simulation. If the runtime is more than a few tens of minutes (on a single machine), you probably don’t want to run it on GitHub Actions, even if you rely on showyourwork caching. One way around this is to run the simulation, upload the results to Zenodo (via the workflow discussed above), and treat that as a static “dataset” on which your figure depends. The downside, however, is that your workflow is no longer fully reproducible, since it depends on the result of a black-box simulation.
To address this, showyourwork supports dynamic rules that can alternate between running the simulation and uploading to Zenodo (when running on a local machine), and downloading the simulation from Zenodo (when running on GitHub Actions). This can be achieved by specifying additional instructions in the showyourwork.yml file:
figure_dependencies:
  my_figure.py:
    - simulation.dat
zenodo:
  - simulation.dat:
      generate:
        shell: python run_simulation.py
        dependencies:
          - run_simulation.py
      sandbox: false
      token_name: ZENODO_TOKEN
      title: Simulation results
      description: >-
        This is the result of a very expensive simulation.
        Here is some text describing the simulation in detail,
        how it was generated, and how to use the dataset.
      creators:
        - Luger, Rodrigo
There’s a lot going on in this example, so let’s break it down piece by piece.
First, we’re telling showyourwork that the figure script src/figures/my_figure.py requires the result of some expensive simulation, stored in the data file src/figures/simulation.dat.
Then, under zenodo:, rather than specifying the download: id: of the dataset, we explicitly tell showyourwork how to generate it with the generate key. Specifically, we provide a shell command to run the simulation and produce the output (recalling that all paths are relative to the src/figures directory, which is also the CWD for the shell command).
We also tell showyourwork about any dependencies of the simulation. These, as before, are files that, when modified, will trigger a re-run of the simulation (but only if running locally). In this case, changes to the script src/figures/run_simulation.py will result in a re-run of the simulation the next time I execute the workflow locally.
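The idea of a dependency triggering a re-run can be illustrated with a simplified timestamp comparison. This is only a sketch of the general principle: Snakemake's real scheduling logic is more involved, and the function name needs_rerun is hypothetical.

```python
import os
from pathlib import Path
from typing import Iterable


def needs_rerun(output: Path, dependencies: Iterable[Path]) -> bool:
    """Return True if ``output`` is missing or older than any dependency.

    Simplified illustration only; not how showyourwork or Snakemake
    actually decide what to re-run.
    """
    if not output.exists():
        return True
    out_mtime = output.stat().st_mtime
    # Any dependency modified after the output was written makes it stale.
    return any(dep.stat().st_mtime > out_mtime for dep in dependencies)


# Example (paths taken from the text above):
# needs_rerun(Path("src/figures/simulation.dat"),
#             [Path("src/figures/run_simulation.py")])
```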
The next several instructions tell showyourwork how to upload the results of the simulation to Zenodo. The title, description, and creators keys should be self-explanatory: they will show up in the metadata section of the Zenodo deposit. The sandbox key is a boolean flag telling showyourwork whether to use the Zenodo Sandbox service (the default is False); this is useful for testing and debugging, and should be disabled once you release your code/paper.
Finally, since showyourwork will upload the results of the simulation to Zenodo, it needs your credentials to access the API. So, in order for this all to work, you need to do three things:
1. If you haven’t done this already, create a Zenodo account and generate a personal access token. Make sure to give it at least deposit:actions and deposit:write scopes, and store it somewhere safe.
2. To give showyourwork access to Zenodo from your local machine, assign your token to an environment variable called ZENODO_TOKEN. I export mine from within my .zshrc or .bashrc config file so that it’s always available in all terminals.
3. To give showyourwork access to Zenodo from GitHub Actions, create a repository secret in your GitHub repository called ZENODO_TOKEN and set its value equal to your Zenodo token.
Warning
Never include your personal access tokens in any files committed to GitHub!
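To confirm that the token is visible to Python on your machine without ever writing it to a file, a quick check along these lines may help. The helper name get_zenodo_token is hypothetical; only the environment variable name ZENODO_TOKEN comes from the text above.

```python
import os
from typing import Mapping, Optional


def get_zenodo_token(
    env: Optional[Mapping[str, str]] = None, name: str = "ZENODO_TOKEN"
) -> str:
    """Read the Zenodo token from the environment, failing loudly if unset.

    Hypothetical helper; ``env`` defaults to the real process environment.
    """
    env = os.environ if env is None else env
    token = env.get(name, "").strip()
    if not token:
        raise RuntimeError(f"environment variable {name} is not set")
    return token


# Report only whether the token exists; never print the token itself.
print("token found:", bool(os.environ.get("ZENODO_TOKEN")))
```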
Now you should be all set. Make sure to run your expensive simulation locally before pushing your changes to GitHub – otherwise GitHub Actions won’t find the file on Zenodo, and the build will fail.
If all goes well, you should see an icon pop up next to the corresponding figure caption with a link to the record on Zenodo for your simulation results.
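The local-versus-CI switch described in this section can be summarized in a few lines of Python. This is a sketch, not showyourwork's own detection logic, which may differ: the function name choose_action is hypothetical, and the only fact relied on is that GitHub Actions sets the environment variable GITHUB_ACTIONS to true in its runners.

```python
from typing import Mapping


def choose_action(env: Mapping[str, str]) -> str:
    """Decide what to do with an expensive simulation result.

    Hypothetical helper: returns "download" when running on GitHub
    Actions and "generate_and_upload" everywhere else.
    """
    on_github_actions = env.get("GITHUB_ACTIONS", "").lower() == "true"
    return "download" if on_github_actions else "generate_and_upload"


print(choose_action({"GITHUB_ACTIONS": "true"}))  # download
print(choose_action({}))                          # generate_and_upload
```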
Custom figure scripts¶
showyourwork allows you to specify custom scripts for figures. This is useful when showyourwork can’t automatically determine the figure script, such as when a figure is included outside of a figure environment. The easiest way is to subclass the figure rule defined in the showyourwork module:
# Subclass the `figure` rule to specify that the figure
# `src/figures/custom_figure.pdf` is generated from the script
# `src/figures/custom_script.py`
use rule figure from showyourwork as custom_figure with:
    input:
        "src/figures/custom_script.py",
        "environment.yml"
    output:
        "src/figures/custom_figure.pdf"
Alternatively, you may override the internal figure rule completely:
rule custom_figure:
    input:
        "src/figures/custom_script.py",
        "environment.yml",
    output:
        "src/figures/custom_figure.pdf"
    conda:
        "environment.yml"
    shell:
        "cd src/figures && python custom_script.py"
This can be used to execute arbitrary commands for generating figures, such as producing a figure via a language other than Python, or producing a figure from a Jupyter notebook. Note that in both cases, showyourwork expects that the first file listed under input is the main script associated with the figure, and this is what the link in the figure caption will point to on GitHub.