The showyourwork.yml file

This is the configuration file for showyourwork, allowing you to customize several aspects of the workflow. Below is a list of all available options.

arxiv_tarball_exclude

Type: list

Description: List of files/paths to exclude from the arXiv tarball. By default, showyourwork never includes python scripts, matplotlibrc config files, python and showyourwork temporary files, or .gitignore files in the tarball. It also automatically excludes datasets that are uploaded to or downloaded from Zenodo. This option is useful if your repository contains other files (such as static datasets or other kinds of scripts) that don’t need to be included in the tarball. Glob syntax is allowed, and all paths should be relative to the root of your repository.

Required: no

Default: []

Example:

arxiv_tarball_exclude:
  - src/data/dataset.dat
  - src/ms.bib
  - src/**/*.sh
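To check which files a set of patterns would pick up, the standard-library sketch below applies them as the description says: as globs relative to the repository root. It assumes Python's glob semantics with recursive ** (matching zero or more directories). matched_excludes is a hypothetical helper, not part of showyourwork, whose own matching may differ in edge cases.

```python
import glob
import os


def matched_excludes(repo_root, patterns):
    """Return the repo-relative files matched by any exclusion pattern.

    Hypothetical helper that assumes Python glob semantics; it is not
    showyourwork's own implementation.
    """
    matched = set()
    for pattern in patterns:
        # Patterns are relative to the repository root. Escaping the root
        # keeps any special characters in its path from acting as globs.
        full_pattern = os.path.join(glob.escape(repo_root), pattern)
        # recursive=True lets "**" match zero or more nested directories.
        for path in glob.glob(full_pattern, recursive=True):
            if os.path.isfile(path):
                rel = os.path.relpath(path, repo_root)
                matched.add(rel.replace(os.sep, "/"))
    return matched
```

In a repository that contains the hypothetical files src/build.sh and src/scripts/run.sh, the pattern src/**/*.sh from the example matches both, because ** can stand for zero directories.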

CI

Type: bool

Description: Flag indicating whether or not this is a GitHub Actions continuous integration (CI) build. This is set automatically, but can be overridden here for debugging purposes.

Required: no

Default: (inferred automatically)

Example:

CI: false
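The sketch below illustrates the "inferred automatically" default combined with the override described above. It assumes detection relies on the GITHUB_ACTIONS environment variable, which GitHub Actions sets to "true" on its runners. showyourwork's actual mechanism may differ, and infer_ci is a hypothetical name.

```python
import os


def infer_ci(config, environ=None):
    """Decide whether this is a CI build (hypothetical sketch).

    An explicit CI entry in showyourwork.yml takes precedence; otherwise
    we assume detection via GITHUB_ACTIONS, which GitHub Actions sets to "true".
    """
    if environ is None:
        environ = os.environ
    if "CI" in config:
        # Manual override, e.g. "CI: false" for debugging.
        return bool(config["CI"])
    return environ.get("GITHUB_ACTIONS") == "true"
```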

dag

Type: bool

Description: Set to true to turn on DAG (directed acyclic graph) generation. The DAG is a graph showing the dependencies among the inputs and outputs of your project. While Snakemake generates a complete DAG of the build process (see Local builds), setting this flag to true will generate a custom, compact graph showing the dependencies among showyourwork-managed figures and datasets.

Required: no

Default: false

Example:

dag: true
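To make the idea of a directed acyclic graph concrete, here is a small sketch using the standard-library graphlib module. It maps each file to the files it depends on, for a hypothetical project. All file names are illustrative assumptions (src/analysis/generate_fibonacci.py in particular is invented). A topological order lists every file after its dependencies, which is only possible because the graph has no cycles. This is not how showyourwork draws its graph.

```python
from graphlib import TopologicalSorter

# Hypothetical project: each key depends on the files in its set.
graph = {
    "src/figures/fibonacci.pdf": {"src/figures/fibonacci.py", "src/data/fibonacci.dat"},
    "src/data/fibonacci.dat": {"src/analysis/generate_fibonacci.py"},
    "src/ms.pdf": {"src/ms.tex", "src/figures/fibonacci.pdf"},
}


def build_order(graph):
    """Return the files so that each one comes after its dependencies.

    Raises graphlib.CycleError if the graph is not acyclic.
    """
    return list(TopologicalSorter(graph).static_order())
```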

dependencies

Type: list

Description: List of dependencies for each script. Each entry should be the path to a script (either a figure script or the TeX manuscript itself) relative to the repository root, followed by a list of all files on which the script depends. These dependencies may be either static (such as helper scripts) or programmatically generated (such as datasets downloaded from Zenodo). In the latter case, instructions on how to generate them must be provided elsewhere (either via the zenodo key below or via a custom rule in the Snakefile). In both cases, a change to any dependency will cause the part of the workflow that executes the script to re-run.

Required: no

Default: []

Example: Tell showyourwork that the figure script my_figure.py depends on a helper script called helper_script.py:

dependencies:
  src/figures/my_figure.py:
    - src/figures/utils/helper_script.py

See Script dependencies. You can also specify a dependency on a programmatically generated file:

dependencies:
  src/figures/fibonacci.py:
    - src/data/fibonacci.dat

See Dataset dependencies. Finally, dependencies of the manuscript file are also allowed:

dependencies:
  src/ms.tex:
    - src/answer.tex

See Custom manuscript dependencies.
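The three examples above load into an ordinary mapping from each script to its dependency list. The sketch below uses that mapping to show which scripts a change would affect. It only follows direct dependencies: the real change tracking is done by the underlying Snakemake workflow. scripts_to_rerun is a hypothetical helper, not part of showyourwork.

```python
# The three examples above, as the Python mapping that loading the YAML yields.
dependencies = {
    "src/figures/my_figure.py": ["src/figures/utils/helper_script.py"],
    "src/figures/fibonacci.py": ["src/data/fibonacci.dat"],
    "src/ms.tex": ["src/answer.tex"],
}


def scripts_to_rerun(dependencies, changed_files):
    """Return (sorted) the scripts directly affected by changed_files."""
    changed = set(changed_files)
    return sorted(
        script
        for script, deps in dependencies.items()
        # A script re-runs if it changed itself or if any dependency changed.
        if script in changed or changed.intersection(deps)
    )
```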

download_only

Type: bool

Description: If set to true, showyourwork will never attempt to generate figure dependencies that are hosted on Zenodo; instead, it downloads them. This behavior is similar to setting CI to true. It is especially useful for third-party users who have cloned the repository and either don’t want to re-run expensive simulation steps or lack the authorization to upload files to the Zenodo deposit.

Required: no

Default: false

Example:

download_only: true
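The rule stated here and in the Note under the zenodo key can be summarized as a small predicate. On CI, or with download_only set, a Zenodo-hosted dataset is downloaded rather than regenerated. Locally, its generating script runs only when its dependencies changed. runs_dataset_script is a hypothetical name, and the sketch ignores details such as whether the deposit exists yet.

```python
def runs_dataset_script(ci, download_only, dependencies_changed):
    """Sketch: will the script for a Zenodo-hosted dataset be executed?

    Hypothetical helper summarizing the documented behavior; it is not
    showyourwork's actual code.
    """
    if ci or download_only:
        # The dataset is always downloaded from Zenodo instead.
        return False
    # Locally, the script runs only when the dataset's inputs changed.
    return dependencies_changed
```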

figexts

Type: list

Description: List of recognized figure extensions.

Required: no

Default: [pdf, png, eps, jpg, jpeg, gif, svg, tiff]

Example:

figexts:
  - pdf
  - png

ms

Type: str

Description: Path to the main TeX manuscript. Change this if you’d prefer to name your manuscript something other than src/ms.tex. You should still keep it in the src/ directory, and the compiled PDF will still be named ms.pdf regardless of this setting.

Required: no

Default: src/ms.tex

Example:

ms: src/article.tex

See Custom manuscript name.

scripts

Type: list

Description: List of script extensions and instructions on how to execute them to generate output. By default, showyourwork expects output files (e.g., figures or datasets) to be generated by executing the corresponding scripts with python. You can add custom rules here to produce output from scripts with other extensions, or to change how python scripts are executed (for instance, by adding command line options). Each entry under scripts should be a file extension, and its value should be a string specifying how to generate the output file from the input script. The following placeholders are recognized by showyourwork and expand as follows at runtime:

  • {script}: The full path to the input script.

  • {script.path}: The full path to the directory containing the input script.

  • {script.name}: The name of the input script (without the path).

  • {output}: The full path to the output file.

  • {output.path}: The full path to the directory containing the output file.

  • {output.name}: The name of the output file (without the path).
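The placeholder syntax matches Python's str.format attribute access. The sketch below expands a command template using a string subclass that also exposes .path and .name. This is one plausible way to implement the expansion, not necessarily how showyourwork does it. PathString and expand are hypothetical names. Paths are not shell-quoted, so paths containing spaces would need extra care.

```python
import os


class PathString(str):
    """A path string that also exposes .path and .name attributes."""

    @property
    def path(self):
        # Full path to the directory containing the file.
        return os.path.dirname(os.path.abspath(self))

    @property
    def name(self):
        # File name without the directory.
        return os.path.basename(self)


def expand(command, script, output):
    """Fill in {script}, {script.path}, {output.name}, etc. in a command."""
    return command.format(
        script=PathString(os.path.abspath(script)),
        output=PathString(os.path.abspath(output)),
    )
```

For example, expand("cd {script.path} && python {script.name}", "/repo/src/figures/my_figure.py", "/repo/src/figures/my_figure.pdf") gives "cd /repo/src/figures && python my_figure.py".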

Required: no

Default: The default behavior for python scripts corresponds to the following specification in the yaml file:

scripts:
  py:
    cd {script.path} && python {script.name}

That is, python is used to execute all scripts that end in .py.

Important

By default, showyourwork changes (cd) into the directory containing the script and executes it from there; therefore, any relative paths within python scripts are interpreted relative to the directory containing the script.

Example: We can tell showyourwork how to generate figures from Graphviz .gv files as follows:

scripts:
  gv:
    dot -Tpdf {script} > {output}

or, to run it from the directory containing the script (as discussed above),

scripts:
  gv:
    cd {script.path} && dot -Tpdf {script.name} > {output}

See Non-Python figure scripts.

tectonic_latest

Type: bool

Description: Use the latest version of tectonic (built from source) instead of the most recent stable version? You shouldn’t normally have to edit this entry.

Required: no

Default: false

Example:

tectonic_latest: true

tectonic_os

Type: str

Description: Operating system used for choosing which tectonic binary to install (only if tectonic_latest is true). This is usually determined automatically, but can be overridden. Options are x86_64-unknown-linux-gnu, x86_64-apple-darwin, or x86_64-pc-windows-msvc.

Required: no

Default: (inferred automatically)

Example:

tectonic_os: x86_64-apple-darwin
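The sketch below shows one way the automatic default could be guessed from platform.system(). It maps the value to the three targets listed above. It is an assumption, not showyourwork's actual logic. All three listed targets are x86_64, so the sketch ignores the machine architecture entirely. guess_tectonic_os is a hypothetical name.

```python
import platform

# The three targets listed above, keyed by platform.system() values.
TECTONIC_TARGETS = {
    "Linux": "x86_64-unknown-linux-gnu",
    "Darwin": "x86_64-apple-darwin",
    "Windows": "x86_64-pc-windows-msvc",
}


def guess_tectonic_os(system=None):
    """Guess a tectonic_os value for the current OS (hypothetical sketch)."""
    if system is None:
        system = platform.system()
    try:
        return TECTONIC_TARGETS[system]
    except KeyError:
        raise ValueError(
            f"no tectonic target listed for {system!r}; set tectonic_os manually"
        ) from None
```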

verbose

Type: bool

Description: Enable verbose output? Useful for debugging runs.

Required: no

Default: false

Example:

verbose: true

zenodo

Type: list

Description: A list of datasets to be downloaded from and/or uploaded to Zenodo. Each entry should be the path to a dataset, followed by keys specifying information about the Zenodo deposit. These keys depend on the use case. If the deposit already exists (i.e., it was uploaded manually), users should specify only the deposit version id. If the deposit does not exist and users would like showyourwork to upload it to and download it from Zenodo, they should specify the deposit concept id instead (see id below for more details). Additionally, users can specify the keys script, title, description, and creators; all of these are optional except script, which is required unless a custom rule is provided in the Snakefile. Finally, if the deposit is a tarball consisting of many datasets, users should also specify the tarball contents. In both cases (manually uploaded and showyourwork-managed datasets), a token_name key is also accepted.

Note

For showyourwork-managed datasets, the script that generates the dataset will be executed when running the workflow locally (but only if there are changes to the dataset’s dependencies). When running on GitHub Actions, on the other hand, the script will never be executed; instead, showyourwork will always download the dataset from Zenodo. The idea here is to prevent the workflow from executing expensive operations on the cloud. In order for this to work, however, a deposit must exist, so you must run your workflow at least once locally before pushing the changes to GitHub.

Required: no

Default: []

Example: See Dataset dependencies, Simulation dependencies, Dependency tarballs, and Dependency tarballs (advanced).
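The zenodo key loads as a list of one-key mappings, one per dataset path. The hypothetical checker below flags the two requirements stated under Required in the subsections below: every entry needs an id, and .tar.gz tarballs also need contents. It does not check script, because a custom Snakefile rule can stand in for it. showyourwork performs its own validation, which may differ.

```python
def check_zenodo_entry(dataset, settings):
    """Return a list of problems for one dataset entry (hypothetical)."""
    problems = []
    if "id" not in settings:
        problems.append(f"{dataset}: 'id' is required")
    elif not isinstance(settings["id"], int):
        problems.append(f"{dataset}: 'id' should be an integer")
    if dataset.endswith(".tar.gz") and not settings.get("contents"):
        problems.append(f"{dataset}: 'contents' is required for .tar.gz tarballs")
    return problems


def check_zenodo(entries):
    """entries is the list loaded from the zenodo key of showyourwork.yml."""
    problems = []
    for entry in entries:
        # Each list item is a one-key mapping: {dataset path: settings}.
        for dataset, settings in entry.items():
            problems.extend(check_zenodo_entry(dataset, settings or {}))
    return problems
```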

zenodo.<dataset>.contents

Type: list

Description: If <dataset> is a .tar.gz file, users should provide a list of the contents of the tarball. If this is a static tarball that was manually uploaded to Zenodo (i.e., the provided id is a version id), this should be a list of full paths to the files to be created when the tarball is extracted. See below for details. If, on the other hand, this tarball is managed by showyourwork (i.e., the provided id is a concept id), this should be a list of the full paths of all the files to include in the tarball. These should be located in the src/data folder (or nested within it). Note that instructions for generating these individual files should be provided separately, either via the script key or via a custom rule in the Snakefile.

For static tarballs, users need to be careful when providing file paths. showyourwork will extract the tarball from the top-level directory of your repository and attempt to generate all of the files listed in contents, either by respecting the file path within the tarball or by treating it as a path relative to the src/data directory.

For example, consider the Zenodo-hosted file results.tar.gz, whose contents are

src/data/results.tar.gz
  ├── src/data/results_00.dat
  └── src/data/results_01.dat

We can specify the following settings for it in showyourwork.yml:

zenodo:
  - src/data/results.tar.gz:
      contents:
          - src/data/results_00.dat
          - src/data/results_01.dat

which will unpack the files results_00.dat and results_01.dat into the src/data folder. In this case, the source and destination paths are the same (i.e., the path inside the tarball is the path we extract the files to). But things will also work if we have a tarball with purely relative paths:

src/data/other_results.tar.gz
  ├── other_results_00.dat
  └── other_results_01.dat

and we specify the following in showyourwork.yml:

zenodo:
  - src/data/other_results.tar.gz:
      contents:
          - src/data/other_results_00.dat
          - src/data/other_results_01.dat

In this case, the source and destination paths are different, but showyourwork knows how to handle it.

If your tarball contains files within nested folders, things should still work as long as they are extracted into the src/data directory. There is no need to specify the nested directories in contents, only the full path to each file; intermediate directories will be created as needed.

Required: yes, but only if <dataset> is a .tar.gz tarball.

Default:

Example: See Dependency tarballs.
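The path rule for static tarballs can be sketched with the standard tarfile module. A member already under src/data keeps its path, and any other member path is treated as relative to src/data. The sketch then reports which contents entries no member would produce. This restates the documented behavior and is not showyourwork's code. destination and missing_contents are hypothetical helpers, and nothing is extracted here.

```python
import posixpath
import tarfile

DATA_DIR = "src/data"


def destination(member_name, data_dir=DATA_DIR):
    """Repo-relative path where a tarball member ends up."""
    name = posixpath.normpath(member_name)
    if name == data_dir or name.startswith(data_dir + "/"):
        # Path inside the tarball already points into src/data.
        return name
    # Otherwise treat the member path as relative to src/data.
    return posixpath.join(data_dir, name)


def missing_contents(tarball_path, contents):
    """Return entries of contents that no file in the tarball produces."""
    with tarfile.open(tarball_path, "r:gz") as tar:
        produced = {destination(m.name) for m in tar.getmembers() if m.isfile()}
    return [path for path in contents if path not in produced]
```

Both example tarballs above pass this check, and a nested member such as nested/dir/x.dat maps to src/data/nested/dir/x.dat.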

zenodo.<dataset>.creators

Type: list

Description: A list of creators to be listed on the Zenodo record and associated with the record DOI.

Required: no

Default: The GitHub username of the current user

Example: See Simulation dependencies.

zenodo.<dataset>.description

Type: str

Description: A detailed description of the file, how it was generated, and how it should be used, to be displayed on the Zenodo record page.

Required: no

Default: "File uploaded from <repository-name>"

Example: See Simulation dependencies.

zenodo.<dataset>.id

Type: int

Description: The Zenodo id of a deposit is the last part of its DOI. For example, a deposit with DOI 10.5281/zenodo.5749987 has id 5749987. This is also the last part of the URL for the corresponding record (https://zenodo.org/record/5749987). Importantly, Zenodo distinguishes between version DOIs and concept DOIs. Version DOIs are static and tied to a specific version of a deposit (the way you’d expect a DOI to behave). This is the type of id you should provide if you manually uploaded a dataset to Zenodo and only ever want showyourwork to download it. Concept DOIs, on the other hand, point to all versions of a given record and always resolve to the latest version. If you want showyourwork to manage the dataset for you by generating, uploading, and downloading it, this is the kind of id you should provide. The sidebar on the record page for the example deposit above lists both kinds of id.

You can see that the id 5749987 corresponds to a specific version (19) of the deposit, while the id 5662426 corresponds to all versions of the deposit (it’s listed under “Cite all versions?”). The former is a “version” id, while the latter is a “concept” id. You can read more about that in the Zenodo docs.
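Since the id is just the trailing number, it can be pulled out of either a DOI or a record URL. The sketch below handles the forms shown above and, as an assumption, also /records/ URLs. It cannot tell whether the number is a version id or a concept id; that distinction only appears on the record page. zenodo_id is a hypothetical helper.

```python
import re


def zenodo_id(doi_or_url):
    """Extract the numeric Zenodo id from a DOI or record URL (sketch).

    Handles "10.5281/zenodo.5749987", "https://doi.org/10.5281/zenodo.5749987",
    and "https://zenodo.org/record/5749987" style inputs.
    """
    match = re.search(r"(?:zenodo\.|/records?/)(\d+)/?$", doi_or_url.strip())
    if match is None:
        raise ValueError(f"no Zenodo id found in {doi_or_url!r}")
    return int(match.group(1))
```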

Note

If you’re just getting started and want a concept id for a fresh draft of a new Zenodo deposit, run

make reserve

from the top level of your repo. This will pre-reserve a concept id for you (assuming you’re properly authenticated) and print it to the terminal.

Required: yes

Default:

Example: The following snippet

zenodo:
  - src/data/results.tar.gz:
      id: 5749987

tells showyourwork to download the file results.tar.gz from the static Zenodo deposit at https://zenodo.org/record/5749987 (version 19 of the deposit, as mentioned above). This file must already exist, and showyourwork won’t ever attempt to re-generate it or re-upload it to Zenodo because it recognizes 5749987 as a version id.

Alternatively, we could specify the following:

zenodo:
  - src/data/results.tar.gz:
      id: 5662426
      script: src/analysis/generate_results.py

In this case, the id is a concept id, corresponding to all versions of the deposit, and showyourwork will take over management of the deposit. Note that we also provided a script instructing showyourwork how to generate new versions of the deposit. Whenever generate_results.py or any of its dependencies are modified, showyourwork will re-generate results.tar.gz and re-upload it to Zenodo under the same concept id when running the workflow locally. This will create a new version DOI under the same concept DOI. Note that in order for this to work, you must be properly authenticated; see token_name below. For a more detailed example, see Dataset dependencies.

zenodo.<dataset>.script

Type: str

Description: The path to the python script that generates <dataset> (or, if <dataset> is a tarball, the script that generates its contents). Note that this must be a python script, even if custom script instructions are provided via the scripts key. To define custom rules for generating the dataset, see the Dependency tarballs (advanced) example.

Required: yes, unless a custom rule is provided in the Snakefile

Default:

Example: See Simulation dependencies.

zenodo.<dataset>.title

Type: str

Description: The title of the Zenodo deposit.

Required: no

Default: "<repository-name>:<dataset>"

Example: See Simulation dependencies.

zenodo.<dataset>.token_name

Type: str

Description: The name of the environment variable containing the Zenodo access token. To obtain this token, create a Zenodo account (if you don’t have one already) and generate a personal access token. Make sure to give it at least deposit:actions and deposit:write scopes, and store it somewhere safe. Then, assign your token to an environment variable called ZENODO_TOKEN (or whatever you set token_name to). I export mine from within my .zshrc or .bashrc config file so that it’s always available in all terminals.

Warning

Never include your personal access tokens in any files committed to GitHub!

Required: no

Default: ZENODO_TOKEN

Example: See Simulation dependencies.
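The token is read from the environment, never from a committed file. The sketch below looks up the variable named by token_name, defaulting to ZENODO_TOKEN, and fails clearly if it is missing. zenodo_token is a hypothetical helper illustrating the convention, not showyourwork's API.

```python
import os


def zenodo_token(token_name="ZENODO_TOKEN", environ=None):
    """Return the Zenodo access token stored in the variable token_name."""
    if environ is None:
        environ = os.environ
    token = environ.get(token_name)
    if not token:
        # Never fall back to a token stored in the repository.
        raise RuntimeError(
            f"environment variable {token_name} is not set; "
            "export it from your shell config instead of committing it"
        )
    return token
```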

zenodo_sandbox

Type: list

Description: A list of datasets to be downloaded from and/or uploaded to Zenodo Sandbox. This key behaves in the same way and accepts all the same arguments as the zenodo key above, but it interfaces with sandbox.zenodo.org (instead of zenodo.org). Zenodo Sandbox works in the same way as Zenodo, but is meant for testing purposes only: deposits hosted in the Sandbox may be deleted at any time. Hosting datasets here is useful during development of your project; just make sure to switch over to the zenodo key when you’re ready to publish your paper!

Required: no

Default: []

Example: See Dataset dependencies, Simulation dependencies, Dependency tarballs, and Dependency tarballs (advanced).
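The only difference in addressing is the host. The sketch below builds a record page URL in the https://zenodo.org/record/<id> form shown under id, swapping in sandbox.zenodo.org when requested. Sandbox is a separate service, so the same number need not refer to the same deposit on both hosts. record_url is a hypothetical helper.

```python
def record_url(record_id, sandbox=False):
    """Record page URL for an id on zenodo.org or sandbox.zenodo.org."""
    host = "sandbox.zenodo.org" if sandbox else "zenodo.org"
    return f"https://{host}/record/{int(record_id)}"
```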