Custom workflows

Note

We’re constantly adding examples to this page, so please check back soon for more. Or, if you have a request or a workflow you’d like to share, please either open an issue or suggest an edit to this page by clicking the GitHub link at the top.

Default figure generation

By default, the workflow defined in the Snakefile looks like this:

# User config
configfile: "showyourwork.yml"


# Import the showyourwork module
module showyourwork:
    snakefile:
        "showyourwork/workflow/Snakefile"
    config:
        config


# Use all default rules
use rule * from showyourwork

The default behavior in this workflow is to infer figure dependencies from the figure labels in the TeX file. The following block in ms.tex

\begin{figure}
    \begin{centering}
        \includegraphics{figures/mandelbrot.pdf}
        \caption{The Mandelbrot set.}
        % This label tells showyourwork that the script `figures/mandelbrot.py'
        % generates the PDF file included above
        \label{fig:mandelbrot}
    \end{centering}
\end{figure}

tells showyourwork to execute a script called mandelbrot.py in the src/figures directory to generate figures/mandelbrot.pdf. To change, supplement, or override this behavior, read on!
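
For reference, here is a minimal sketch of what such a figure script might look like (the actual mandelbrot.py in the example repository may differ):

# A sketch of what `src/figures/mandelbrot.py` might contain.
# showyourwork runs figure scripts from within `src/figures`, so saving
# to the relative path `mandelbrot.pdf` puts the output where expected.
import numpy as np
import matplotlib.pyplot as plt

# Escape-time computation on a grid in the complex plane
x, y = np.meshgrid(np.linspace(-2, 1, 500), np.linspace(-1.5, 1.5, 500))
c = x + 1j * y
z = np.zeros_like(c)
counts = np.zeros(c.shape)
for _ in range(100):
    mask = np.abs(z) < 2
    z[mask] = z[mask] ** 2 + c[mask]
    counts += mask

fig, ax = plt.subplots()
ax.imshow(counts, extent=(-2, 1, -1.5, 1.5), origin="lower", cmap="magma")
ax.axis("off")
fig.savefig("mandelbrot.pdf", bbox_inches="tight")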

Multi-panel figures

It is possible to include multiple figures within a figure environment, provided they are all generated by the same script:

\begin{figure}[ht!]
    \begin{centering}
        \includegraphics[width=0.4\linewidth]{figures/koch1.pdf}
        \includegraphics[width=0.4\linewidth]{figures/koch2.pdf}
        \caption{
            Two Koch snowflakes.
        }
        % This label tells showyourwork that the script `figures/koch.py'
        % generates the two PDF files included above
        \label{fig:koch}
    \end{centering}
\end{figure}
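
For reference, a koch.py along these lines could produce both panels (a hypothetical sketch, not necessarily the exact script from the example repository):

# Sketch of `src/figures/koch.py`: a single script that generates both
# panels of the multi-panel figure. showyourwork maps both PDFs back to
# this one script via the figure label.
import numpy as np
import matplotlib.pyplot as plt

def koch_snowflake(order):
    """Vertices of a Koch snowflake (as complex numbers) of a given order."""
    pts = np.array([0, 1, 0.5 + 0.5j * np.sqrt(3), 0])
    for _ in range(order):
        new = []
        for p, q in zip(pts[:-1], pts[1:]):
            s = (q - p) / 3
            new += [p, p + s, p + s + s * np.exp(-1j * np.pi / 3), p + 2 * s]
        new.append(pts[-1])
        pts = np.array(new)
    return pts

# One output file per panel; both are included in the same figure environment
for fname, order in [("koch1.pdf", 2), ("koch2.pdf", 4)]:
    fig, ax = plt.subplots()
    pts = koch_snowflake(order)
    ax.plot(pts.real, pts.imag, lw=1)
    ax.set_aspect("equal")
    ax.axis("off")
    fig.savefig(fname, bbox_inches="tight")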

If you would like to include figures generated from different scripts in the same figure environment, you’ll have to provide a custom rule (see below).

One script, multiple figures

Conversely, we can also have different figure environments, all of which include figure files generated from the same script. If you follow the usual convention, this would result in duplicated labels, since these figure environments would share the same label (determined only by the name of the script that generated them). To get around this, showyourwork supports adding tags to the end of figure labels to make them unique.

\begin{figure}[ht!]
    \begin{centering}
        \includegraphics[width=0.4\linewidth]{figures/mandelbrot.pdf}
        \caption{
            This figure was generated by the script \texttt{mandelbrot.py}
            and is labeled \texttt{fig:mandelbrot:original}.
        }
        \label{fig:mandelbrot:original}
    \end{centering}
\end{figure}

\begin{figure}[ht!]
    \begin{centering}
        \includegraphics[width=0.4\linewidth]{figures/mandelbrot_red.pdf}
        \caption{
            This figure was generated by the script \texttt{mandelbrot.py}
            and is labeled \texttt{fig:mandelbrot:red}.
        }
        \label{fig:mandelbrot:red}
    \end{centering}
\end{figure}

In the example above, the script mandelbrot.py generates two PDFs, which are displayed in separate figure environments. We label them fig:mandelbrot:original and fig:mandelbrot:red to make them unique; showyourwork ignores everything after the second colon, and understands that both figure environments correspond to the same figure script (mandelbrot.py).

Static figures

It is also possible to commit the figure file (PDF, PNG, SVG, etc.) directly and tell showyourwork not to try to produce it programmatically. Simply place the figure in the src/static directory:

\begin{figure}[ht!]
    \begin{centering}
        \includegraphics[width=0.4\linewidth]{static/broccoli.pdf}
        \caption{
            A photo of some broccoli.
        }
        % The fact that the figure is in the static directory tells
        % showyourwork not to look for a script that generates this figure
        \label{fig:broccoli}
    \end{centering}
\end{figure}

Script dependencies

Sometimes we would like to tell showyourwork about script dependencies, such as when our figure script imports something from a locally-hosted script or package. We can do this by specifying a dependency in the configuration file showyourwork.yml:

# Tell showyourwork that `src/figures/my_figure.py`
# depends on `src/figures/utils/helper_script.py`
dependencies:
    src/figures/my_figure.py:
        - src/figures/utils/helper_script.py

Note that all paths are relative to the root of your repository.
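
As a concrete (hypothetical) example, the dependency above would describe a figure script that does something like this:

# Sketch of `src/figures/my_figure.py`. The import below is why
# `utils/helper_script.py` must be declared as a dependency: editing
# the helper should trigger a re-run of this figure script.
from utils.helper_script import make_plot  # hypothetical helper function

make_plot("my_figure.pdf")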

Dataset dependencies

If you have a dataset hosted on Zenodo, showyourwork can automatically download it for you and link to it in the corresponding figure caption in the article PDF. All you have to do is specify the name of the dataset and its Zenodo record ID in the config file showyourwork.yml:

# Tell showyourwork that `src/figures/fibonacci.py`
# requires the file `src/data/fibonacci.dat` to run
dependencies:
    src/figures/fibonacci.py:
        - src/data/fibonacci.dat

# Tell showyourwork where to find `src/data/fibonacci.dat`
zenodo:
    - src/data/fibonacci.dat:
        id: 5187276

The YAML snippet above tells showyourwork that the script src/figures/fibonacci.py requires the dataset src/data/fibonacci.dat in order to run. It also tells showyourwork that this dataset can be downloaded from Zenodo, and that it has the record ID 5187276. Specifically, that means the record lives at https://zenodo.org/record/5187276, and the file can be downloaded by running

curl https://zenodo.org/record/5187276/files/fibonacci.dat --output fibonacci.dat

Note that if this dataset is a tarball, you’ll have to untar it within fibonacci.py, or specify a custom rule in the Snakefile (see below).
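
In the tarball case, the relevant part of fibonacci.py could look like this (a sketch, assuming a hypothetical src/data/fibonacci.tar.gz dependency):

# Excerpt from a hypothetical `src/figures/fibonacci.py`. The script is
# executed from within `src/figures`, so the data lives at `../data/`.
import tarfile

with tarfile.open("../data/fibonacci.tar.gz", "r:gz") as tar:
    tar.extractall(path="../data")
# ...then read the extracted file(s) and plot as usual.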

Alternatively, you can manually specify how to download a dataset dependency. This is useful if, e.g., it’s hosted somewhere other than Zenodo, or if you need to do some post-processing (like unzipping) before running your figure script. To do that, simply omit the corresponding entry from the zenodo section of your showyourwork.yml file:

dependencies:
    src/figures/fibonacci.py:
        - src/data/fibonacci.dat

and instead create a custom rule in the Snakefile:

# Custom rule to download a dataset
rule fibonacci:
    output:
        "src/figures/fibonacci.dat"
    shell:
        "curl https://zenodo.org/record/5187276/files/fibonacci.dat --output {output[0]}"

Note that this approach will not automatically add a dataset link to your figure caption.

Simulation dependencies

Quite often you may have a figure that is very computationally expensive to produce: for example, a posterior distribution plot for an MCMC run, or a plot of an expensive fluid dynamics simulation. If the runtime is more than a few tens of minutes (on a single machine), you probably don’t want to run it on GitHub Actions, even if you rely on showyourwork caching. One way around this is to run the simulation, upload the results to Zenodo (via the workflow discussed above), and treat that as a static “dataset” on which your figure depends. The downside, however, is that your workflow is no longer fully reproducible, since it depends on the result of a black-box simulation.

To address this, showyourwork supports dynamic rules that run the simulation and upload the results to Zenodo when executing on a local machine, and download the results from Zenodo when executing on GitHub Actions. This behavior is enabled by specifying additional instructions in the showyourwork.yml file:

dependencies:
    src/figures/my_figure.py:
        - src/data/simulation.dat

zenodo:
    - src/data/simulation.dat:
        script: src/figures/run_simulation.py
        sandbox: false
        token_name: ZENODO_TOKEN
        title: Simulation results
        description: >-
            This is the result of a very expensive simulation.
            Here is some text describing the simulation in detail,
            how it was generated, and how to use the dataset.
        creators:
            - Luger, Rodrigo

There’s a lot going on in this example, so let’s break it down piece by piece. First, we’re telling showyourwork that the figure script src/figures/my_figure.py requires the result of some expensive simulation, stored in the data file src/data/simulation.dat. Then, under zenodo:, instead of specifying the id: of the dataset, we explicitly tell showyourwork how to generate it via the script key, which points to the Python script that runs the simulation.

Note

showyourwork executes the Python script from within the directory containing it. In this example, the simulation script is executed from within the src/figures/ directory, so it must save the simulation file as ../data/simulation.dat, since that’s where showyourwork expects to see it based on the config file.
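
Concretely, the tail end of run_simulation.py might therefore look like this (a minimal sketch; run_expensive_simulation stands in for the real work):

# Excerpt from `src/figures/run_simulation.py`. Since the script runs
# from within `src/figures`, the output goes one directory up.
import numpy as np

results = run_expensive_simulation()  # hypothetical; the real computation
np.savetxt("../data/simulation.dat", results)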

The next several instructions tell showyourwork how to upload the results of the simulation to Zenodo. The title, description, and creators keys should be self-explanatory: they will show up in the metadata section of the Zenodo deposit. The sandbox key is a boolean flag telling showyourwork whether to use the Zenodo Sandbox service (the default is False); this is useful for testing and debugging, and should be disabled once you release your code/paper.

Finally, since showyourwork will upload the results of the simulation to Zenodo, it needs your credentials to access the API. So, in order for this all to work, you need to do three things:

  1. If you haven’t done this already, create a Zenodo account and generate a personal access token. Make sure to give it at least deposit:actions and deposit:write scopes, and store it somewhere safe.

  2. To give showyourwork access to Zenodo from your local machine, assign your token to an environment variable called ZENODO_TOKEN. I export mine from within my .zshrc or .bashrc config file so that it’s always available in all terminals; see the snippet after this list.

  3. To give showyourwork access to Zenodo from GitHub Actions, create a repository secret in your GitHub repository called ZENODO_TOKEN and set its value equal to your Zenodo token.
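
For step 2, that boils down to a single line in your shell configuration file (with the placeholder replaced by your actual token):

# In ~/.zshrc or ~/.bashrc
export ZENODO_TOKEN="<your-zenodo-token>"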

Warning

Never include your personal access tokens in any files committed to GitHub!

Now you should be all set. Make sure to run your expensive simulation locally before pushing your changes to GitHub – otherwise GitHub Actions won’t find the file on Zenodo, and the build will fail.

If all goes well, you should see an icon pop up next to the corresponding figure caption with a link to the record on Zenodo for your simulation results.

Dependency tarballs

showyourwork also supports the upload/download of Zenodo tarballs. Consider the following showyourwork.yml file:

dependencies:
    src/figures/my_figure.py:
        - src/data/results_0.dat
        - src/data/results_1.dat
        - src/data/results_2.dat
        - src/data/results_3.dat
        - src/data/results_4.dat
        - src/data/results_5.dat
        - src/data/results_6.dat
        - src/data/results_7.dat
        - src/data/results_8.dat
        - src/data/results_9.dat

zenodo:
    - src/data/results.tar.gz:
        script: src/data/run_simulation.py
        sandbox: false
        token_name: ZENODO_TOKEN
        title: Random numbers
        description: >-
            This is a collection of ten datasets, each containing
            ten iid zero-mean, unit-variance random numbers. These
            are used in an example of the showyourwork open source
            scientific article workflow.
        creators:
            - Luger, Rodrigo
        contents:
            - src/data/results_0.dat
            - src/data/results_1.dat
            - src/data/results_2.dat
            - src/data/results_3.dat
            - src/data/results_4.dat
            - src/data/results_5.dat
            - src/data/results_6.dat
            - src/data/results_7.dat
            - src/data/results_8.dat
            - src/data/results_9.dat

This is similar to the previous example, except this time the figure script depends on a large number of simulation result files. By specifying a contents key under a zenodo entry, we can instruct showyourwork to generate the tarball results.tar.gz out of those contents and upload it to Zenodo. We then list all of those individual files as dependencies of the figure script. This example works in the same way as above – the simulation is only ever run locally. So again, make sure to run it before pushing your changes to GitHub – otherwise GitHub Actions won’t find the tarball on Zenodo, and the build will fail.

By the way, there’s a feature of the YAML syntax that can save us some repetition: anchors and aliases. They’re a handy way of defining and re-using chunks of metadata. You can read more about them here. We can use the anchor/alias syntax to re-write the YAML file above as

dependencies:
    src/figures/my_figure.py: &results
        - src/data/results_0.dat
        - src/data/results_1.dat
        - src/data/results_2.dat
        - src/data/results_3.dat
        - src/data/results_4.dat
        - src/data/results_5.dat
        - src/data/results_6.dat
        - src/data/results_7.dat
        - src/data/results_8.dat
        - src/data/results_9.dat

zenodo:
    - src/data/results.tar.gz:
        script: src/data/run_simulation.py
        sandbox: false
        token_name: ZENODO_TOKEN
        title: Random numbers
        description: >-
            This is a collection of ten datasets, each containing
            ten iid zero-mean, unit-variance random numbers. These
            are used in an example of the showyourwork open source
            scientific article workflow.
        creators:
            - Luger, Rodrigo
        contents:
            *results

The first time we listed all our results files, we added an anchor (&results), which we then refer to as an alias (*results) the next time we need to list those files. The anchor/alias syntax can help make YAML files shorter and more readable.

Dependency tarballs (advanced)

Sometimes the procedure for generating the results of a simulation is more involved than executing a single Python script. Consider the following example: we have a file src/data/run_simulation.py that accepts a number as input, runs a simulation, and generates a dataset corresponding to that unique input. We would then like to ingest all of these datasets into a plotting script that generates a figure in the paper, all while taking advantage of the Zenodo tarball functionality in showyourwork.
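
Such a script might look something like this (a hypothetical sketch that just generates random numbers, in keeping with the dataset description below):

# Sketch of `src/data/run_simulation.py`: take an integer on the
# command line and save the corresponding dataset next to this script,
# so the output lands in `src/data/` regardless of the working directory.
import sys
from pathlib import Path
import numpy as np

value = int(sys.argv[1])
rng = np.random.default_rng(value)
results = rng.standard_normal(10)  # stand-in for an expensive simulation
np.savetxt(Path(__file__).parent / f"results_{value}.dat", results)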

Achieving this is simple: let’s re-use the showyourwork.yml file from the previous example, but this time omit the script key:

dependencies:
    src/figures/my_figure.py: &results
        - src/data/results_0.dat
        - src/data/results_1.dat
        - src/data/results_2.dat
        - src/data/results_3.dat
        - src/data/results_4.dat
        - src/data/results_5.dat
        - src/data/results_6.dat
        - src/data/results_7.dat
        - src/data/results_8.dat
        - src/data/results_9.dat

zenodo:
    - src/data/results.tar.gz:
        sandbox: false
        token_name: ZENODO_TOKEN
        title: Random numbers
        description: >-
            This is a collection of ten datasets, each containing
            ten iid zero-mean, unit-variance random numbers. These
            are used in an example of the showyourwork open source
            scientific article workflow.
        creators:
            - Luger, Rodrigo
        contents:
            *results

Instead of the script key, we include a custom rule in the Snakefile to generate all of the results files:

if not config["CI"]:
    rule analysis:
        input:
            "src/data/run_simulation.py"
        output:
            "src/data/results_{value}.dat"
        shell:
            "python {input[0]} {wildcards.value}"

That’s it! showyourwork automatically infers that it must execute the analysis rule with all values of the (integer) wildcard value in the range [0, 10) to produce the dependencies of the figure script. When run locally, it will tar them up and upload the tarball to Zenodo; when running on GitHub Actions, it will download and unpack the tarball instead of running the analysis rule.

Note that we explicitly placed the analysis rule inside a branch that gets executed only if config["CI"] is False. The config["CI"] variable is automatically set in all showyourwork workflows: it’s always False, except when the workflow is being executed on a GitHub Actions runner.

Many, many dependencies

Sometimes, it might be a pain to explicitly list all the dependencies for a script. This could be the case if your simulation or analysis step produces many (tens, hundreds, or even thousands of) data files. The recommended way of dealing with this is through jinja templating. For example, instead of explicitly listing all the dependencies in showyourwork.yml:

showyourwork.yml:

dependencies:
    src/figures/my_figure.py:
        - src/data/results_0.dat
        - src/data/results_1.dat
        - src/data/results_2.dat
        - src/data/results_3.dat
        - src/data/results_4.dat
        - src/data/results_5.dat
        - src/data/results_6.dat
        - src/data/results_7.dat
        - src/data/results_8.dat
        - src/data/results_9.dat

we recommend deleting the config file showyourwork.yml and creating a new file called showyourwork.yml.jinja with the following template:

showyourwork.yml.jinja:

dependencies:
    src/figures/my_figure.py:
        {% for i in range(10) -%}
        - src/data/results_{{i}}.dat
        {% endfor %}

If you’re not familiar with jinja templating, check out the documentation. The idea here is to define a for loop over the variable i to list all the dependencies for us. But since this file isn’t a valid YAML config file, we have to add a bit of boilerplate at the very top of our Snakefile:

Snakefile:

# Render the config file
import jinja2
with open("showyourwork.yml", "w") as f:
    env = jinja2.Environment(loader=jinja2.FileSystemLoader("."))
    print(env.get_template("showyourwork.yml.jinja").render(), file=f)


# User config
configfile: "showyourwork.yml"

This code loads the jinja template, parses it, and prints it to the actual config file showyourwork.yml that is ingested by Snakemake. This way, showyourwork only ever sees the config file with all of the dependencies listed out explicitly, without any of the headaches associated with listing them all yourself.

Note that if you take this approach, it’s a good idea to add

showyourwork.yml

to your top-level .gitignore file so that it’s never committed (since this file is programmatically generated every time you run your workflow).

Custom figure scripts

showyourwork allows you to specify custom scripts for figures. This is useful when showyourwork can’t automatically determine the figure script, such as when a figure is included outside of a figure environment. The easiest way is to subclass the figure rule defined in the showyourwork module:

# Subclass the `figure` rule to specify that the figure
# `src/figures/custom_figure.pdf` is generated from the script
# `src/figures/custom_script.py`
use rule figure from showyourwork as custom_figure with:
    input:
        "src/figures/custom_script.py",
        "environment.yml"
    output:
        "src/figures/custom_figure.pdf"

Alternatively, you may override the internal figure rule completely:

rule custom_figure:
    input:
        "src/figures/custom_script.py",
        "environment.yml",
    output:
        "src/figures/custom_figure.pdf"
    conda:
        "environment.yml"
    shell:
        "cd src/figures && python custom_script.py"

This can be used to execute arbitrary commands for generating figures, such as producing a figure via a language other than Python, or producing a figure from a Jupyter notebook. Note that in both cases, showyourwork expects that the first file listed under input is the main script associated with the figure, and this is what the link in the figure caption will point to on GitHub.

Figures that require LaTeX

If you set matplotlib.rc("text", usetex=True) in your Python script, you’ll likely get an error on GitHub Actions complaining that it can’t find latex. That’s because the engine used to compile your TeX article – tectonic – is not a standard TeX distribution. We recommend disabling the usetex option in matplotlib, since the most common math-mode commands can be rendered using the built-in mathtext; see the matplotlib docs. If, however, you really do need a TeX installation, you can request it in the .github/workflows/showyourwork.yml file as follows:

- name: Build the article PDF
  id: build
  uses: ./showyourwork/showyourwork-action
  with:
    install-tex: true
  env:
    ZENODO_TOKEN: ${{ secrets.ZENODO_TOKEN }}

This will install TinyTeX, a very lightweight TeX distribution, on the GitHub Actions runner. Note that TeX rendering in matplotlib requires certain packages; by default, showyourwork installs type1cm and cm-super. If you get an error message saying some package is not found, you can request that additional packages (denoted <package> below) be installed as follows:

- name: Build the article PDF
  id: build
  uses: ./showyourwork/showyourwork-action
  with:
    install-tex: true
    tex-packages: |
      type1cm
      cm-super
      <package>
  env:
    ZENODO_TOKEN: ${{ secrets.ZENODO_TOKEN }}

Using graphicspath

Users can take advantage of the \graphicspath LaTeX command to specify a path for all the figures in the workflow. The following snippet

\graphicspath{{./figures/}}

\begin{figure}[ht!]
    \begin{centering}
        \includegraphics[width=0.4\linewidth]{foo.pdf}
        \label{fig:foo}
    \end{centering}
\end{figure}

is therefore equivalent to

\begin{figure}[ht!]
    \begin{centering}
        \includegraphics[width=0.4\linewidth]{figures/foo.pdf}
        \label{fig:foo}
    \end{centering}
\end{figure}

Note that only a single \graphicspath call, with only a single path, is supported. Additional calls or paths will be ignored by showyourwork, which can lead to errors.

Other LaTeX classes

We are slowly adding support for LaTeX classes other than AASTeX. If you don’t see what you’re looking for here, please open an issue.

Examples are currently available for MNRAS and A&A.

Custom manuscript name

By default, showyourwork expects the manuscript to be called src/ms.tex. This can be customized; for example, if you wish to call your article src/article.tex, specify it in the showyourwork.yml file:

ms:
    src/article.tex

Note that showyourwork still expects it to live in the src directory.

Custom manuscript dependencies

Sometimes it is useful to specify custom dependencies of the manuscript file, such as when you use the \input command to ingest content from other LaTeX files. Say we have a special rule in the Snakefile that computes the answer to some problem:

rule answer:
    input:
        "src/answer.py"
    output:
        "src/answer.tex"
    conda:
        "environment.yml"
    shell:
        "cd src && python answer.py"

In order to tell showyourwork that src/ms.tex depends on src/answer.tex, simply specify it as a dependency in showyourwork.yml:

dependencies:
    src/ms.tex:
        - src/answer.tex

Because of the way Snakemake builds the dependency graph, changes to src/answer.py will automatically trigger re-computation of src/answer.tex and a rebuild of the PDF.

Non-Python figure scripts

Although showyourwork expects figures to be generated from Python scripts, it allows users to provide instructions on how to generate figures using any programming language. This is done in the showyourwork.yml config file under the scripts key. Each entry should be a file extension, such as sh for shell scripts, jl for Julia scripts, etc. Under each extension key, users should provide the shell command for generating a figure from the corresponding script.

For example, the default configuration for Python scripts looks like this:

scripts:
    py:
        python {script}

This tells showyourwork that to generate a figure from a Python script, all it needs to do is run the python shell command followed by the script name, which we provide as {script} (this special variable gets automatically expanded at runtime to the name, not the path, of the script file).

Users don’t need to specify this, however, as Python is the default language. A more realistic use case is generating a directed acyclic graph (DAG) from a Graphviz .gv file:

scripts:
    gv:
        dot -Tpdf {script} > {figure}

Here, we tell showyourwork that we need to run the dot command to generate the figure {figure} from the Graphviz script {script}. Note that {figure} is another special variable, which expands to the name of the output figure file. Also, because showyourwork expects scripts to have the .py extension by default, you might have to force-add scripts with other extensions (e.g., git add -f script.gv) in order to actually commit them!