# preprocess

Does a first pass through the manuscript to infer the workflow graph.

This step performs a fast compile of the article in which commands such as \label, \script, \variable, and \includegraphics are overridden so that they print their arguments to an XML log file. The log is used to infer the relationships between input and output files, and that information is then used to build the workflow graph for the main article build step.

preprocess.check_figure_format(figure)

Check that all figures are declared correctly in tex/ms.tex so that we can parse the corresponding XML tree.

Parameters

figure – A figure XML element.
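
As an illustration of the kind of check involved, the sketch below validates a single figure element with `xml.etree.ElementTree`. The tag names (`FIGURE`, `LABEL`, `GRAPHICS`), the helper `check_figure_sketch`, and the rule it enforces (exactly one label, at least one graphic) are assumptions made for this example, not the schema or logic actually used by `preprocess.check_figure_format`.

```python
import xml.etree.ElementTree as ET


def check_figure_sketch(figure):
    """Raise ValueError if a (hypothetical) figure element looks malformed.

    Returns the label text and the list of graphics paths otherwise.
    """
    labels = figure.findall("LABEL")
    graphics = figure.findall("GRAPHICS")
    if len(labels) != 1:
        raise ValueError(f"expected exactly one label, found {len(labels)}")
    if not graphics:
        raise ValueError("figure has no graphics")
    return labels[0].text, [g.text for g in graphics]


# A well-formed figure under the assumed schema.
good = ET.fromstring(
    "<FIGURE><GRAPHICS>figures/plot.pdf</GRAPHICS><LABEL>fig:plot</LABEL></FIGURE>"
)
label, files = check_figure_sketch(good)
```

Failing early with a descriptive error, rather than skipping the figure, makes a badly declared figure visible before the workflow graph is built.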

preprocess.flatten_dataset_contents(d, parent_key='', default_path=None)

Flatten the contents dictionary of a dataset entry, filling in default mappings and removing zipfile extensions from the target path. Adapted from https://stackoverflow.com/a/6027615.

Parameters
• d (dict) – The dataset contents dictionary.

• parent_key (str) – The parent key of the current dictionary.

• default_path (str, optional) – The default path to use if the target is not specified.
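
The recursive flattening idea can be sketched as follows. This is a minimal sketch, not the function's actual implementation: the name `flatten_sketch`, the `/` separator, the convention that a value of `None` means "target not specified", and the way the default target is built are all assumptions, and the removal of zipfile extensions is omitted.

```python
from collections.abc import Mapping


def flatten_sketch(d, parent_key="", default_path=None, sep="/"):
    """Flatten a nested mapping into a single-level dict of joined keys."""
    items = {}
    for key, value in d.items():
        # Join the current key onto the path accumulated so far.
        new_key = f"{parent_key}{sep}{key}" if parent_key else key
        if isinstance(value, Mapping):
            # Recurse into nested mappings, carrying the joined key along.
            items.update(flatten_sketch(value, new_key, default_path, sep))
        else:
            # Assumption: None marks an unspecified target, filled from default_path.
            if value is None and default_path is not None:
                value = f"{default_path}{sep}{new_key}"
            items[new_key] = value
    return items


# Hypothetical dataset contents: one file with no target, one with an explicit target.
contents = {"data.zip": {"a.csv": None, "sub": {"b.csv": "src/data/custom_b.csv"}}}
flat = flatten_sketch(contents, default_path="src/data")
```

Here `flat` has one entry per leaf file, keyed by its full path inside the nested structure, with explicit targets preserved and missing ones filled from `default_path`.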

preprocess.get_json_tree()

Builds a dictionary containing mappings between input and output files.

Returns

The JSON dependency tree for the article.

Return type

dict
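
To make the idea of a dependency dictionary concrete, here is a hedged sketch that maps each generating script to the files it produces. The XML layout, the tag names, the file paths, and the helper `build_tree_sketch` are hypothetical; the structure of the dictionary actually returned by `preprocess.get_json_tree` may differ.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical XML log contents; the real log format is not shown here.
LOG = """
<ARTICLE>
  <FIGURE>
    <SCRIPT>src/scripts/plot.py</SCRIPT>
    <GRAPHICS>src/tex/figures/plot.pdf</GRAPHICS>
  </FIGURE>
  <FIGURE>
    <SCRIPT>src/scripts/maps.py</SCRIPT>
    <GRAPHICS>src/tex/figures/map_a.pdf</GRAPHICS>
    <GRAPHICS>src/tex/figures/map_b.pdf</GRAPHICS>
  </FIGURE>
</ARTICLE>
"""


def build_tree_sketch(xml_text):
    """Map each script (input) to the list of graphics (outputs) it generates."""
    root = ET.fromstring(xml_text.strip())
    tree = {}
    for figure in root.iter("FIGURE"):
        script = figure.findtext("SCRIPT")
        outputs = [g.text for g in figure.findall("GRAPHICS")]
        tree.setdefault(script, []).extend(outputs)
    return tree


tree = build_tree_sketch(LOG)
print(json.dumps(tree, indent=2))
```

Because the result is a plain dict of strings and lists, it serializes directly with `json.dumps`, which is convenient if the mapping needs to be handed to a later build step.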

preprocess.get_xml_tree()

Compiles the TeX file to generate the XML tree.

Returns

The XML tree.

Return type

xml.etree.ElementTree.ElementTree
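
Since the return value is a standard `xml.etree.ElementTree.ElementTree`, it can be inspected with the usual standard-library calls. The sketch below does not compile any TeX: it writes a small stand-in log to a temporary file (the file name, tag names, and labels are hypothetical) and parses it to show how such a tree might be consumed.

```python
import tempfile
import xml.etree.ElementTree as ET
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    # Stand-in for the XML log produced by the fast compile (hypothetical content).
    log = Path(tmp) / "example.xml"
    log.write_text("<ARTICLE><LABEL>fig:plot</LABEL><LABEL>eq:main</LABEL></ARTICLE>")
    xml_tree = ET.parse(log)  # returns an ElementTree, fully read into memory

root = xml_tree.getroot()
# Walk the tree and collect the text of every LABEL element.
labels = [el.text for el in root.iter("LABEL")]
```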

preprocess.parse_datasets()

Parse the datasets keys in the config file and populate entries with custom metadata.