preprocess

Does a first pass through the manuscript to infer the workflow graph.

This step runs a fast compile of the article in which commands such as \label, \script, \variable, and \includegraphics are overridden to print their arguments to an XML log file instead of performing their usual function. The log is parsed to infer the relationships between input and output files, and that information is then used to build the workflow graph for the main article build step.
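As a rough illustration of the idea, the sketch below parses a small XML log and maps each graphics file to the scripts declared alongside it. The log layout, the element names (log, figure, label, script, graphics), the file paths, and the helper build_dependencies are all assumptions made for this example; they are not taken from the actual log format.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML log. The element names and paths are assumptions
# for illustration only, not the real log schema.
XML_LOG = """
<log>
  <figure>
    <label>fig:one</label>
    <script>scripts/one.py</script>
    <graphics>figures/one.pdf</graphics>
  </figure>
  <figure>
    <label>fig:two</label>
    <script>scripts/two.py</script>
    <graphics>figures/two_a.pdf</graphics>
    <graphics>figures/two_b.pdf</graphics>
  </figure>
</log>
"""


def build_dependencies(xml_text):
    """Map each graphics file (output) to the scripts (inputs) of its figure."""
    root = ET.fromstring(xml_text)
    dependencies = {}
    for figure in root.iter("figure"):
        # Collect every script declared inside this figure element.
        scripts = [el.text.strip() for el in figure.findall("script") if el.text]
        for graphic in figure.findall("graphics"):
            if graphic.text:
                # Copy the list so each output owns its own dependency list.
                dependencies[graphic.text.strip()] = list(scripts)
    return dependencies


dependencies = build_dependencies(XML_LOG)
```

A mapping of this shape (output file to list of input files) is the kind of information a workflow graph can be built from.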

preprocess.check_figure_format(figure)

Check that all figures are declared correctly in tex/ms.tex so we can parse their corresponding XML tree.

Parameters:

figure – A figure XML element.

preprocess.flatten_dataset_contents(d, parent_key='', default_path=None)

Flatten the contents dictionary of a dataset entry, filling in default mappings and removing zipfile extensions from the target path. Adapted from https://stackoverflow.com/a/6027615.

Parameters:
  • d (dict) – The dataset contents dictionary.

  • parent_key (str) – The parent key of the current dictionary.

  • default_path (str, optional) – The default path to use if the target is not specified.
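The flattening described above can be sketched as follows. This is a simplified stand-in, not the library's implementation: the function name flatten_contents, the ZIP_EXTS tuple, the "src/data" default, and the example file names are all assumptions chosen for illustration.

```python
from collections.abc import Mapping
from pathlib import PurePosixPath

# Assumed set of archive extensions; the real list may differ.
ZIP_EXTS = ("zip", "tar.gz", "tar")


def flatten_contents(d, parent_key="", default_path="src/data"):
    """Flatten a nested contents dict into {source path: target path}."""
    items = {}
    for key, value in d.items():
        # Join nested keys with "/" to form the full source path.
        new_key = str(PurePosixPath(parent_key) / key) if parent_key else key
        if isinstance(value, Mapping):
            # Recurse into nested dictionaries (e.g. files inside an archive).
            items.update(flatten_contents(value, new_key, default_path))
            continue
        if value is None:
            # No explicit target: fall back to the default path.
            parts = PurePosixPath(new_key).parts
            target = PurePosixPath(new_key)
            if len(parts) > 1:
                for ext in ZIP_EXTS:
                    if parts[0].endswith("." + ext):
                        # Drop the archive extension from the top-level folder.
                        stem = parts[0][: -len(ext) - 1]
                        target = PurePosixPath(stem, *parts[1:])
                        break
            value = str(PurePosixPath(default_path) / target)
        items[new_key] = value
    return items


contents = {
    "catalog.json": None,
    "archive.zip": {
        "table.csv": None,
        "fits": {"image.fits": "src/data/custom/image.fits"},
    },
}
flat = flatten_contents(contents)
```

Here catalog.json and archive.zip/table.csv receive default targets (the latter without the .zip extension in its target path), while the entry with an explicit target is kept unchanged.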

preprocess.get_json_tree(xmlfile)

Builds a dictionary containing mappings between input and output files.

Returns:

The JSON dependency tree for the article.

Return type:

dict

preprocess.get_xml_tree(xmlfile)

Compiles the TeX file to generate the XML tree.

Returns:

The XML tree.

Return type:

xml.etree.ElementTree.ElementTree
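Because the return type is the standard library's ElementTree, the result can be inspected with the usual xml.etree.ElementTree methods. The short sketch below builds a stand-in tree by hand to show this; the element names (log, figure, label) are hypothetical and do not reflect the actual log schema.

```python
import xml.etree.ElementTree as ET

# Build a stand-in tree by hand; the element names are hypothetical.
root_element = ET.fromstring(
    "<log>"
    "<figure><label>fig:one</label></figure>"
    "<figure><label>fig:two</label></figure>"
    "</log>"
)
tree = ET.ElementTree(root_element)

# An ElementTree wraps the whole document; getroot() returns the top Element.
root = tree.getroot()

# iter() walks the tree recursively, findall() searches direct children.
labels = [el.text for el in tree.iter("label")]
figures = root.findall("figure")
```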

preprocess.parse_datasets()

Parse the datasets keys in the config file and populate entries with custom metadata.