Custom workflows
================
.. note::

    We're constantly adding examples to this page, so please check back soon for more.
    Or, if you have a request or a workflow you'd like to share, please either open an
    issue or suggest an edit to this page by clicking the GitHub link at the top.
Default figure generation
-------------------------
By default, the workflow defined in the ``Snakefile`` looks like this:
.. code-block:: python

    # User config
    configfile: "showyourwork.yml"

    # Import the showyourwork module
    module showyourwork:
        snakefile:
            "showyourwork/workflow/Snakefile"
        config:
            config

    # Use all default rules
    use rule * from showyourwork
The default behavior in this workflow is to infer figure dependencies based on
the figure labels in the tex file.
The following block in ``ms.tex``
.. code-block:: latex

    \begin{figure}
        \begin{centering}
            \includegraphics{figures/mandelbrot.pdf}
            \caption{The Mandelbrot set.}
            % This label tells showyourwork that the script `figures/mandelbrot.py'
            % generates the PDF file included above
            \label{fig:mandelbrot}
        \end{centering}
    \end{figure}
tells ``showyourwork`` to execute a script called ``mandelbrot.py`` in the ``src/figures``
directory to generate ``figures/mandelbrot.pdf``.
To change, supplement, or override this behavior, read on!
Multi-panel figures
-------------------
It is possible to include multiple figures within a ``figure`` environment, provided
they are all generated by the same script:
.. code-block:: latex

    \begin{figure}[ht!]
        \begin{centering}
            \includegraphics[width=0.4\linewidth]{figures/koch1.pdf}
            \includegraphics[width=0.4\linewidth]{figures/koch2.pdf}
            \caption{
                Two Koch snowflakes.
            }
            % This label tells showyourwork that the script `figures/koch.py'
            % generates the two PDF files included above
            \label{fig:koch}
        \end{centering}
    \end{figure}
If you would like to include figures generated from different scripts in the
same ``figure`` environment, you'll have to provide a custom rule (see below).
One script, multiple figures
----------------------------
Conversely, we can also have different figure environments, all of which include
figure files generated from the same script.
If you follow the usual convention, this would result in duplicated labels, since
these figure environments would share the same label (determined only by the name of the script
that generated them).
To get around this, ``showyourwork`` supports adding tags to the end of figure labels
to make them unique.
.. code-block:: latex

    \begin{figure}[ht!]
        \begin{centering}
            \includegraphics[width=0.4\linewidth]{figures/mandelbrot.pdf}
            \caption{
                This figure was generated by the script \texttt{mandelbrot.py}
                and is labeled \texttt{fig:mandelbrot:original}.
            }
            \label{fig:mandelbrot:original}
        \end{centering}
    \end{figure}

    \begin{figure}[ht!]
        \begin{centering}
            \includegraphics[width=0.4\linewidth]{figures/mandelbrot_red.pdf}
            \caption{
                This figure was generated by the script \texttt{mandelbrot.py}
                and is labeled \texttt{fig:mandelbrot:red}.
            }
            \label{fig:mandelbrot:red}
        \end{centering}
    \end{figure}
In the example above, the script ``mandelbrot.py`` generates two PDFs, which
are displayed in separate figure environments. We label them ``fig:mandelbrot:original``
and ``fig:mandelbrot:red`` to make them unique; ``showyourwork`` ignores everything
after the second colon, and understands that both figure environments correspond to
the same figure script (``mandelbrot.py``).
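The labeling convention can be sketched in a few lines of Python. This is only an
illustration of the rule as described on this page, not the actual ``showyourwork``
implementation; the function name ``script_for_label`` and its ``figures_dir``
argument are hypothetical.

.. code-block:: python

    def script_for_label(label, figures_dir="src/figures"):
        """Map a figure label such as ``fig:mandelbrot:red`` to a script path."""
        parts = label.split(":")
        if len(parts) < 2 or parts[0] != "fig" or not parts[1]:
            raise ValueError(f"not a figure label: {label!r}")
        # Anything after the second colon is a tag and is ignored
        return f"{figures_dir}/{parts[1]}.py"


    print(script_for_label("fig:mandelbrot"))           # src/figures/mandelbrot.py
    print(script_for_label("fig:mandelbrot:original"))  # src/figures/mandelbrot.py

Both tagged and untagged labels resolve to the same script, which is why the two
figure environments can share ``mandelbrot.py``.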
Static figures
--------------
It is also possible to commit the figure file (PDF, PNG, SVG, etc.) directly and tell ``showyourwork`` not to
try to produce it programmatically. Simply place the figure in the ``src/static`` directory and include it from there:
.. code-block:: latex

    \begin{figure}[ht!]
        \begin{centering}
            \includegraphics[width=0.4\linewidth]{static/broccoli.pdf}
            \caption{
                A photo of some broccoli.
            }
            % The fact that the figure is in the static directory tells
            % showyourwork not to look for a script that generates this figure
            \label{fig:broccoli}
        \end{centering}
    \end{figure}
Script dependencies
-------------------
Sometimes we would like to tell ``showyourwork`` about script dependencies, such as
when our figure script imports something from a locally-hosted script or package.
We can do this by specifying a dependency in the configuration
file ``showyourwork.yml``:
.. code-block:: yaml

    # Tell showyourwork that `src/figures/my_figure.py`
    # depends on `src/figures/utils/helper_script.py`
    dependencies:
      src/figures/my_figure.py:
        - src/figures/utils/helper_script.py
Note that all paths are relative to the root of your repository.
Dataset dependencies
--------------------
If you have a dataset hosted on Zenodo, ``showyourwork`` can automatically
download it for you and link to it in the corresponding figure caption in the article PDF.
All you have to do is specify the name of the dataset and its Zenodo record ID in the config
file ``showyourwork.yml``:
.. code-block:: yaml

    # Tell showyourwork that `src/figures/fibonacci.py`
    # requires the file `src/data/fibonacci.dat` to run
    dependencies:
      src/figures/fibonacci.py:
        - src/data/fibonacci.dat

    # Tell showyourwork where to find `src/data/fibonacci.dat`
    zenodo:
      - src/data/fibonacci.dat:
          id: 5187276
The YAML snippet above tells ``showyourwork`` that the script ``src/figures/fibonacci.py``
requires the dataset ``src/data/fibonacci.dat`` in order to run.
It also tells ``showyourwork`` that this dataset can be downloaded from Zenodo, and that
it has the record ID ``5187276``. Specifically, that means the file lives at the URL
`<https://zenodo.org/record/5187276/files/fibonacci.dat>`_ and can be downloaded by running
.. code-block:: bash
curl https://zenodo.org/record/5187276/files/fibonacci.dat
Note that if this dataset is a tarball, you'll have to untar it within ``fibonacci.py``, or
specify a custom rule in the ``Snakefile`` (see below).
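The URL pattern used in the example above can be written as a small Python helper.
This is a sketch based only on the example URL shown here; the function name
``zenodo_file_url`` is hypothetical, and it assumes the file is stored in the
Zenodo record under its base name.

.. code-block:: python

    from pathlib import PurePosixPath


    def zenodo_file_url(record_id, dataset):
        """Build the download URL for a dataset stored in a Zenodo record."""
        # Assumption: only the file name (not the local directory) is used on Zenodo
        filename = PurePosixPath(dataset).name
        return f"https://zenodo.org/record/{record_id}/files/{filename}"


    print(zenodo_file_url(5187276, "src/data/fibonacci.dat"))
    # https://zenodo.org/record/5187276/files/fibonacci.dat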
Alternatively, you can manually specify how to download a dataset dependency.
This is useful if, e.g., it's hosted somewhere other than Zenodo, or if you
need to do some post-processing (like unzipping) before running your figure
script. To do that, simply don't specify the entry in the ``zenodo`` section of your
``showyourwork.yml`` file:
.. code-block:: yaml

    dependencies:
      src/figures/fibonacci.py:
        - src/data/fibonacci.dat
and instead create a custom rule in the ``Snakefile``:
.. code-block:: python

    # Custom rule to download a dataset
    rule fibonacci:
        output:
            "src/data/fibonacci.dat"
        shell:
            "curl https://zenodo.org/record/5187276/files/fibonacci.dat --output {output[0]}"
Note that this approach will not automatically add a dataset link to your figure caption.
Simulation dependencies
-----------------------
Quite often you may have a figure that is very computationally expensive to run.
An example is a posterior distribution plot for an MCMC run, or a plot of an expensive fluid dynamical simulation.
If the runtime is more than a few tens of minutes (on a single machine), you probably don’t want to run it on
GitHub Actions, even if you rely on ``showyourwork`` caching. One way around this is to run the simulation,
upload the results to Zenodo (via the workflow discussed above), and treat that as a static "dataset" on which
your figure depends. The downside, however, is that your workflow is no longer fully reproducible, since
it depends on the result of a black-box simulation.
To address this, ``showyourwork`` supports dynamic rules that can alternate between running the simulation
and uploading to Zenodo (when running on a local machine), and downloading the simulation from Zenodo (when
running on GitHub Actions). This can be achieved by specifying additional instructions in the
``showyourwork.yml`` file:
.. code-block:: yaml

    dependencies:
      src/figures/my_figure.py:
        - src/data/simulation.dat

    zenodo:
      - src/data/simulation.dat:
          script: src/figures/run_simulation.py
          sandbox: false
          token_name: ZENODO_TOKEN
          title: Simulation results
          description: >-
            This is the result of a very expensive simulation.
            Here is some text describing the simulation in detail,
            how it was generated, and how to use the dataset.
          creators:
            - Luger, Rodrigo
There's a lot going on in this example, so let's break it down piece by piece.
First, we're telling ``showyourwork`` that the figure script ``src/figures/my_figure.py``
requires the result of some expensive simulation, stored in the data file ``src/data/simulation.dat``.
Then, under ``zenodo:``, instead of specifying the ``id:`` of the dataset, we tell
``showyourwork`` how to generate it: the ``script`` key gives the path to the Python
script that runs the simulation.
.. note::

    ``showyourwork`` executes the Python ``script`` from within the directory containing it.
    In this example, the simulation script is executed from within the ``src/figures/`` directory,
    so it must save the simulation file as ``../data/simulation.dat``, since that's where
    ``showyourwork`` expects to see it based on the config file.
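One way to avoid depending on the current working directory is to build the output
path from the location of the script itself. The snippet below is a minimal sketch
of that idea; the helper name ``simulation_output`` is hypothetical and the layout
(``src/figures`` next to ``src/data``) is the one assumed in this example.

.. code-block:: python

    from pathlib import Path


    def simulation_output(script_path):
        """Return the path where the simulation script should save its results."""
        script_dir = Path(script_path).resolve().parent  # .../src/figures
        # Same location as ``../data/simulation.dat`` relative to the script
        return script_dir.parent / "data" / "simulation.dat"

Inside ``run_simulation.py`` this could be called as ``simulation_output(__file__)``.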
The next several instructions tell ``showyourwork`` how to upload the results of the simulation
to Zenodo. The ``title``, ``description``, and ``creators`` keys should be self-explanatory: they
will show up in the metadata section of the Zenodo deposit.
The ``sandbox`` key is a boolean flag telling ``showyourwork`` whether to use the Zenodo Sandbox
service (the default is ``false``); this is useful for testing and debugging, and should be disabled
once you release your code/paper.
Finally, since ``showyourwork`` will upload the results of the simulation to Zenodo, it needs your
credentials to access the API. So, in order for this all to work, you need to do three things:
1. If you haven't done this already, create a Zenodo account and
   generate a personal access token.
   Make sure to give it at least ``deposit:actions`` and ``deposit:write`` scopes, and store it somewhere
   safe.
2. To give ``showyourwork`` access to Zenodo from your local machine, assign your token to an environment variable
   called ``ZENODO_TOKEN``, matching the ``token_name`` entry in the config file above. One convenient option is to
   export it from within your ``.zshrc`` or ``.bashrc`` config file so that it's always available in all terminals.
3. To give ``showyourwork`` access to Zenodo from GitHub Actions, create a repository secret
   in your GitHub repository called ``ZENODO_TOKEN`` and set its value equal to your Zenodo token.
.. warning::

    Never include your personal access tokens in any files committed to GitHub!
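Before starting a long simulation locally, it can be useful to confirm that the token
is actually visible to Python. The check below is a hypothetical helper (``read_token``
is not part of ``showyourwork``); it only reads the environment variable and never
prints the token itself.

.. code-block:: python

    import os


    def read_token(token_name="ZENODO_TOKEN", environ=None):
        """Return the access token stored in the given environment variable."""
        environ = os.environ if environ is None else environ
        token = environ.get(token_name, "").strip()
        if not token:
            raise RuntimeError(
                f"Environment variable {token_name} is not set; "
                "export it in your shell before running the workflow."
            )
        return token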
Now you should be all set. Make sure to run your expensive simulation locally before pushing your changes
to GitHub -- otherwise GitHub Actions won't find the file on Zenodo, and the build will fail.
If all goes well, you should see an icon pop up next to the corresponding figure caption with a link
to the record on Zenodo for your simulation results.
Dependency tarballs
-------------------
``showyourwork`` also supports the upload/download of Zenodo tarballs. Consider the following ``showyourwork.yml`` file:
.. code-block:: yaml

    dependencies:
      src/figures/my_figure.py:
        - src/data/results_0.dat
        - src/data/results_1.dat
        - src/data/results_2.dat
        - src/data/results_3.dat
        - src/data/results_4.dat
        - src/data/results_5.dat
        - src/data/results_6.dat
        - src/data/results_7.dat
        - src/data/results_8.dat
        - src/data/results_9.dat

    zenodo:
      - src/data/results.tar.gz:
          script: src/data/run_simulation.py
          sandbox: false
          token_name: ZENODO_TOKEN
          title: Random numbers
          description: >-
            This is a collection of ten datasets, each containing
            ten iid zero-mean, unit-variance random numbers. These
            are used in an example of the showyourwork open source
            scientific article workflow.
          creators:
            - Luger, Rodrigo
          contents:
            - src/data/results_0.dat
            - src/data/results_1.dat
            - src/data/results_2.dat
            - src/data/results_3.dat
            - src/data/results_4.dat
            - src/data/results_5.dat
            - src/data/results_6.dat
            - src/data/results_7.dat
            - src/data/results_8.dat
            - src/data/results_9.dat
This is similar to the previous example, except this time the figure script
depends on a large number of simulation result files. By specifying a ``contents``
key under a ``zenodo`` entry, we can instruct ``showyourwork`` to generate the
tarball ``results.tar.gz`` out of those contents and upload it to Zenodo.
We then list all of those *individual* files as dependencies of the figure script.
This example works in the same way as above -- the simulation is only ever
run locally. So again, make sure to run it before pushing your changes
to GitHub -- otherwise GitHub Actions won't find the tarball on Zenodo,
and the build will fail.
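Conceptually, the tarball step amounts to bundling the listed files and unpacking them
again elsewhere. The sketch below shows that round trip with Python's standard ``tarfile``
module. It is not the ``showyourwork`` implementation: the function names are hypothetical,
and storing each file under its base name is an assumption made for this example.

.. code-block:: python

    import tarfile
    from pathlib import Path


    def make_tarball(tarball, contents):
        """Bundle the files listed in ``contents`` into a gzipped tarball."""
        with tarfile.open(tarball, "w:gz") as tar:
            for path in contents:
                # Assumption: store each file under its base name
                tar.add(path, arcname=Path(path).name)


    def unpack_tarball(tarball, outdir):
        """Extract every file in the tarball into ``outdir``."""
        Path(outdir).mkdir(parents=True, exist_ok=True)
        with tarfile.open(tarball, "r:gz") as tar:
            tar.extractall(outdir)

For instance, ``make_tarball("src/data/results.tar.gz", contents)`` followed by
``unpack_tarball("src/data/results.tar.gz", "src/data")`` would restore the individual
``results_*.dat`` files next to the tarball.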
By the way, there's a feature of the YAML syntax that can save us
some repetition: anchors and aliases, a handy way of defining and re-using
chunks of metadata. They are described in the YAML specification.
We can use the anchor/alias syntax to re-write the YAML file above as
.. code-block:: yaml

    dependencies:
      src/figures/my_figure.py: &results
        - src/data/results_0.dat
        - src/data/results_1.dat
        - src/data/results_2.dat
        - src/data/results_3.dat
        - src/data/results_4.dat
        - src/data/results_5.dat
        - src/data/results_6.dat
        - src/data/results_7.dat
        - src/data/results_8.dat
        - src/data/results_9.dat

    zenodo:
      - src/data/results.tar.gz:
          script: src/data/run_simulation.py
          sandbox: false
          token_name: ZENODO_TOKEN
          title: Random numbers
          description: >-
            This is a collection of ten datasets, each containing
            ten iid zero-mean, unit-variance random numbers. These
            are used in an example of the showyourwork open source
            scientific article workflow.
          creators:
            - Luger, Rodrigo
          contents:
            *results
The first time we listed all our results files, we added an anchor (``&results``),
which we then refer to as an alias (``*results``) the next time we need to
list those files. The anchor/alias syntax can help make YAML files shorter
and more readable.
Dependency tarballs (advanced)
------------------------------
Sometimes our procedure for generating the results of a simulation might be more involved
than simply executing a single Python script. Consider the following example: we have
a file ``src/data/run_simulation.py`` that accepts a number as input, runs a simulation,
and then generates a dataset corresponding to that unique input.
We would then like to ingest all of these datasets into a plotting script that generates
a figure in the paper, all while taking advantage of the Zenodo tarball functionality
in ``showyourwork``.
Achieving this is simple: let's re-use the ``showyourwork.yml`` file from the previous
example, but this time **omit** the ``script`` key:
.. code-block:: yaml

    dependencies:
      src/figures/my_figure.py: &results
        - src/data/results_0.dat
        - src/data/results_1.dat
        - src/data/results_2.dat
        - src/data/results_3.dat
        - src/data/results_4.dat
        - src/data/results_5.dat
        - src/data/results_6.dat
        - src/data/results_7.dat
        - src/data/results_8.dat
        - src/data/results_9.dat

    zenodo:
      - src/data/results.tar.gz:
          sandbox: false
          token_name: ZENODO_TOKEN
          title: Random numbers
          description: >-
            This is a collection of ten datasets, each containing
            ten iid zero-mean, unit-variance random numbers. These
            are used in an example of the showyourwork open source
            scientific article workflow.
          creators:
            - Luger, Rodrigo
          contents:
            *results
Instead of the ``script`` key, we include a custom ``rule`` in the ``Snakefile``
to generate all of the results files:
.. code-block:: python

    if not config["CI"]:
        rule analysis:
            input:
                "src/data/run_simulation.py"
            output:
                "src/data/results_{value}.dat"
            shell:
                "python {input[0]} {wildcards.value}"
That's it! ``showyourwork`` automatically infers that it must execute the
``analysis`` rule with all values of the (integer) wildcard ``value``
in the range ``[0, 10)`` to produce the dependencies of the figure script.
When run locally, it will tar them up and upload the tarball to Zenodo;
when running on GitHub Actions, it will download and unpack the tarball
instead of running the ``analysis`` rule.
Note that we explicitly placed the ``analysis`` rule inside a branch that gets
executed only if ``config["CI"] is False``. The ``config["CI"]`` variable
is automatically set in all ``showyourwork`` workflows: it's always
``False``, except when the workflow is being executed on a GitHub Actions
runner.
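For concreteness, here is one way ``src/data/run_simulation.py`` could look. This is a
hypothetical stand-in for the real simulation: it writes ten standard-normal random numbers
(as in the dataset description above) to ``results_<value>.dat``. The function names, the
seeding choice, and the default output directory are assumptions. Since the ``analysis`` rule
invokes the script by its full path, the sketch assumes it runs from the repository root.

.. code-block:: python

    import random
    from pathlib import Path


    def run_simulation(value, outdir="src/data", n=10):
        """Write ``n`` zero-mean, unit-variance draws to ``results_<value>.dat``."""
        rng = random.Random(value)  # seed with the input so reruns are reproducible
        outfile = Path(outdir) / f"results_{value}.dat"
        outfile.parent.mkdir(parents=True, exist_ok=True)
        outfile.write_text("".join(f"{rng.gauss(0.0, 1.0)}\n" for _ in range(n)))
        return outfile


    def main(argv, outdir="src/data"):
        """Entry point: ``argv[0]`` is the wildcard value passed by the rule."""
        return run_simulation(int(argv[0]), outdir=outdir)

In the script itself, ``main(sys.argv[1:])`` would be called at the bottom (after
``import sys``), so that ``python src/data/run_simulation.py 3`` produces
``src/data/results_3.dat``.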
Many, many dependencies
-----------------------
Sometimes, it might be a pain to explicitly list all the dependencies for a script. This could
be the case if your simulation or analysis step produces many (tens, hundreds, or even thousands of)
data files. The recommended way of dealing with this is through ``jinja`` templating.
For example, instead of explicitly listing all the dependencies in ``showyourwork.yml``:
**showyourwork.yml:**
.. code-block:: yaml

    dependencies:
      src/figures/my_figure.py:
        - src/data/results_0.dat
        - src/data/results_1.dat
        - src/data/results_2.dat
        - src/data/results_3.dat
        - src/data/results_4.dat
        - src/data/results_5.dat
        - src/data/results_6.dat
        - src/data/results_7.dat
        - src/data/results_8.dat
        - src/data/results_9.dat
we recommend **deleting the config file** ``showyourwork.yml`` and creating
a new file called ``showyourwork.yml.jinja`` with the following template:
**showyourwork.yml.jinja:**
.. code-block:: jinja

    dependencies:
      src/figures/my_figure.py:
        {% for i in range(10) -%}
        - src/data/results_{{i}}.dat
        {% endfor %}
If you're not familiar with ``jinja`` templating, check out the
Jinja documentation. The idea here
is to define a for loop over the variable ``i`` to list all the dependencies
for us. But since this file isn't a valid YAML config file, we have to add
a bit of boilerplate at the very top of our ``Snakefile``:
**Snakefile:**
.. code-block:: python

    # Render the config file
    import jinja2

    with open("showyourwork.yml", "w") as f:
        env = jinja2.Environment(loader=jinja2.FileSystemLoader("."))
        print(env.get_template("showyourwork.yml.jinja").render(), file=f)

    # User config
    configfile: "showyourwork.yml"
This code loads the ``jinja`` template, parses it, and prints it to the
actual config file ``showyourwork.yml`` that is ingested by ``Snakemake``.
This way, ``showyourwork`` only ever sees the config file with all of the
dependencies listed out explicitly, without any of the headaches associated
with listing them all yourself.
Note that if you take this approach, it's a good idea to add
.. code-block::

    showyourwork.yml
to your top-level ``.gitignore`` file so that it's never committed (since
this file is programmatically generated every time you run your workflow).
Custom figure scripts
---------------------
``showyourwork`` allows you to specify custom scripts for figures. This is useful when ``showyourwork`` can't
automatically determine the figure script, such as when a figure is
included outside of a ``figure`` environment. The easiest way is to subclass the
``figure`` rule defined in the ``showyourwork`` module:
.. code-block:: python

    # Subclass the `figure` rule to specify that the figure
    # `src/figures/custom_figure.pdf` is generated from the script
    # `src/figures/custom_script.py`
    use rule figure from showyourwork as custom_figure with:
        input:
            "src/figures/custom_script.py",
            "environment.yml"
        output:
            "src/figures/custom_figure.pdf"
Alternatively, you may override the internal ``figure`` rule completely:
.. code-block:: python

    rule custom_figure:
        input:
            "src/figures/custom_script.py",
            "environment.yml",
        output:
            "src/figures/custom_figure.pdf"
        conda:
            "environment.yml"
        shell:
            "cd src/figures && python custom_script.py"
This can be used to execute arbitrary commands for generating figures, such
as producing a figure via a language other than Python, or producing a figure
from a Jupyter notebook. Note that in both cases, ``showyourwork`` expects that
the first file listed under ``input`` is the main script associated with the
figure, and this is what the link in the figure caption will point to on GitHub.
Figures that require LaTeX
--------------------------
If you set ``matplotlib.rc("text", usetex=True)`` in your ``Python`` script, you'll likely
get an error on GitHub Actions complaining that it can't find ``latex``. That's because the
engine used to compile your TeX article -- ``tectonic`` -- is not a standard TeX
distribution. We recommend disabling the ``usetex`` option in ``matplotlib``, since
the most common math-mode commands can be rendered using the built-in ``mathtext``;
see the matplotlib documentation for details.
If, however, you really do need a TeX installation, you can request it in the
``.github/workflows/showyourwork.yml`` file as follows:
.. code-block:: yaml

    - name: Build the article PDF
      id: build
      uses: ./showyourwork/showyourwork-action
      with:
        install-tex: true
      env:
        ZENODO_TOKEN: ${{ secrets.ZENODO_TOKEN }}
This will install TinyTeX, a very lightweight
TeX distribution, on the GitHub Actions runner. Note that TeX rendering in ``matplotlib``
requires certain packages. By default, ``showyourwork`` installs ``type1cm`` and
``cm-super``. If you get an error message saying a package is not found, you can request
that it be installed by adding its name to the ``tex-packages`` list:
.. code-block:: yaml

    - name: Build the article PDF
      id: build
      uses: ./showyourwork/showyourwork-action
      with:
        install-tex: true
        tex-packages: |
          type1cm
          cm-super
      env:
        ZENODO_TOKEN: ${{ secrets.ZENODO_TOKEN }}
Using graphicspath
------------------
Users can take advantage of the ``graphicspath`` LaTeX command to specify a
path for all the figures in the workflow. The following snippet
.. code-block:: TeX

    \graphicspath{{./figures/}}

    \begin{figure}[ht!]
        \begin{centering}
            \includegraphics[width=0.4\linewidth]{foo.pdf}
            \label{fig:foo}
        \end{centering}
    \end{figure}
is therefore equivalent to
.. code-block:: TeX

    \begin{figure}[ht!]
        \begin{centering}
            \includegraphics[width=0.4\linewidth]{figures/foo.pdf}
            \label{fig:foo}
        \end{centering}
    \end{figure}
Note that only a **single** ``graphicspath`` call is supported, with only
a **single** path. Additional calls / paths will be ignored by ``showyourwork``
and can lead to errors.
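The equivalence above can be illustrated with a short Python sketch that applies a single
``graphicspath`` prefix to every included figure. This is a simplified illustration, not
the parser used by ``showyourwork``; the function name ``resolve_figures`` is hypothetical,
and, in line with the limitation just described, only the first path of the first
``graphicspath`` call is considered.

.. code-block:: python

    import re


    def resolve_figures(tex):
        """Return the included figure paths with the graphicspath prefix applied."""
        match = re.search(r"\\graphicspath\{\{([^{}]*)\}\}", tex)
        prefix = match.group(1) if match else ""
        if prefix.startswith("./"):
            prefix = prefix[2:]
        # Match \includegraphics with or without an [options] block
        figures = re.findall(r"\\includegraphics(?:\[[^\]]*\])?\{([^{}]+)\}", tex)
        return [prefix + figure for figure in figures]


    tex = r"""
    \graphicspath{{./figures/}}
    \includegraphics[width=0.4\linewidth]{foo.pdf}
    """
    print(resolve_figures(tex))  # ['figures/foo.pdf']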
Other LaTeX classes
-------------------
We are slowly adding support for LaTeX classes other than AASTeX. If you
don't see what you're looking for here, please open an issue.
Custom manuscript name
----------------------
By default, ``showyourwork`` expects the manuscript to be called ``src/ms.tex``. This
can be customized; for example, if you wish to call your article ``src/article.tex``,
specify it in the ``showyourwork.yml`` file:
.. code-block:: yaml

    ms:
      src/article.tex
Note that ``showyourwork`` still expects it to live in the ``src`` directory.
Custom manuscript dependencies
------------------------------
Sometimes it is useful to specify custom dependencies of the manuscript file, such as when
you use the ``input`` directive to ingest content from other LaTeX files. Say we have a special
rule in the ``Snakefile`` that computes the answer to some problem:
.. code-block:: python

    rule answer:
        input:
            "src/answer.py"
        output:
            "src/answer.tex"
        conda:
            "environment.yml"
        shell:
            "cd src && python answer.py"
In order to tell ``showyourwork`` that ``src/ms.tex`` depends on ``src/answer.tex``, simply
specify it as a dependency in ``showyourwork.yml``:
.. code-block:: yaml

    dependencies:
      src/ms.tex:
        - src/answer.tex
Because of the way ``Snakemake`` builds the dependency graph, changes to ``src/answer.py``
will automatically trigger re-computation of ``src/answer.tex`` and a rebuild of the PDF.
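As an illustration, ``src/answer.py`` could be as simple as the following. This is a
hypothetical sketch (the helper name ``write_answer`` and the value written are made up
for the example). Since the ``answer`` rule runs the script from within ``src``, the
default output path is just ``answer.tex``.

.. code-block:: python

    from pathlib import Path


    def write_answer(value, outfile="answer.tex"):
        """Write ``value`` to a small TeX file that the manuscript can input."""
        # The trailing % keeps LaTeX from adding a space after the inserted text
        Path(outfile).write_text(f"{value}%\n")
        return outfile

Calling ``write_answer(42)`` at the bottom of the script would create ``answer.tex``
containing ``42%``, which the manuscript can then pull in with the ``input`` directive.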
Non-Python figure scripts
-------------------------
Although ``showyourwork`` expects figures to be generated from ``Python`` scripts, it allows
users to provide instructions on how to generate figures using any programming language.
This is done in the ``showyourwork.yml`` config file under the ``scripts`` key. Each entry
should be a file extension, such as ``sh`` for shell scripts, ``jl`` for ``Julia`` scripts,
etc. Under each extension key, users should provide the shell command for generating a figure
from the corresponding script.
For example, the default configuration for ``Python`` scripts looks like this:
.. code-block:: yaml

    scripts:
      py:
        python {script}
This tells ``showyourwork`` that to generate a figure from a ``Python`` script, all it
needs to do is run the ``python`` shell command followed by the script name, which we
provide as ``{script}`` (this special variable gets automatically expanded at runtime to
the name--not the path--of the script file).
Users don't need to specify this, however, as ``Python`` is the default language.
A more realistic use case is generating a directed
acyclic graph (DAG) from a Graphviz ``.gv`` file:
.. code-block:: yaml

    scripts:
      gv:
        dot -Tpdf {script} > {figure}
Here, we tell ``showyourwork`` that we need to run the ``dot`` command to generate
the figure ``{figure}`` from the Graphviz script ``{script}``. Note that ``{figure}``
is another special variable that gets expanded to the name of the output figure file.
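The way these placeholders get filled in can be pictured with Python's ``str.format``.
This is only a sketch of the substitution described here, not the code ``showyourwork``
runs; ``build_command`` and the file names ``dag.gv`` and ``dag.pdf`` are hypothetical.

.. code-block:: python

    def build_command(template, script, figure):
        """Expand the ``{script}`` and ``{figure}`` placeholders in a command."""
        return template.format(script=script, figure=figure)


    print(build_command("dot -Tpdf {script} > {figure}", "dag.gv", "dag.pdf"))
    # dot -Tpdf dag.gv > dag.pdf
    print(build_command("python {script}", "mandelbrot.py", "mandelbrot.pdf"))
    # python mandelbrot.py

A template that does not mention ``{figure}``, like the default ``Python`` one, simply
ignores the unused value.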
Note that because ``showyourwork`` expects scripts to have the ``.py`` extension
by default, you might have to force-add (i.e., ``git add -f script.gv``) scripts
with other extensions in order to actually commit them!