This section details the interface-caching mechanism, exposed in the nipype.caching module.
Pipelines (also called workflows) specify processing by an execution graph. This is useful because it opens the door to dependency checking, which makes it possible to minimize recomputation and to let the execution engine handle intermediate files transparently.
They do not, however, blend in well with arbitrary Python code, as they must rely on their own execution engine.
Interfaces give fine control over the execution of each step through a thin wrapper on the underlying software. As a result, they can easily be inserted in Python code.
However, they force the user to specify explicit input and output file names and cannot do any caching.
This is why nipype exposes an intermediate mechanism, caching, that provides transparent output file management and caching within imperative Python code rather than a workflow.
from nipype.caching import Memory
mem = Memory(base_dir='.')
Note that the caching directory is a subdirectory called nipype_mem of the given base_dir. This is done to avoid polluting the base directory.
In the corresponding execution context, nipype interfaces can be turned into callables, usable as ordinary functions, with the Memory.cache method. For instance, if we want to run the fslmerge command on a set of files:
from nipype.interfaces import fsl
fsl_merge = mem.cache(fsl.Merge)
Note that the Memory.cache method takes interface classes, not instances.
The resulting fsl_merge object can be applied as a function to parameters, which form the inputs of the fslmerge command. Those inputs are given as keyword arguments bearing the same names as in the input spec of the interface. In IPython, you can also get the argument list by using the fsl_merge? syntax to inspect the docs:
In [3]: fsl_merge?
String Form:PipeFunc(nipype.interfaces.fsl.utils.Merge,
base_dir=/home/varoquau/dev/nipype/nipype/caching/nipype_mem)
Namespace: Interactive
File: /home/varoquau/dev/nipype/nipype/caching/memory.py
Definition: fsl_merge(self, **kwargs)
Docstring: Use fslmerge to concatenate images
Inputs
------
Mandatory:
dimension: dimension along which the file will be merged
in_files: None
Optional:
args: Additional parameters to the command
environ: Environment variables (default={})
ignore_exception: Print an error message instead of throwing an exception in case the interface fails to run (default=False)
merged_file: None
output_type: FSL output type
Outputs
-------
merged_file: None
Class Docstring:
...
Thus fsl_merge is applied to parameters as follows:
filepath = '/data/ds000114/sub-01/ses-test/func/sub-01_ses-test_task-fingerfootlips_bold.nii.gz'
results = fsl_merge(dimension='t', in_files=[filepath, filepath])
The results are standard nipype node results. In particular, they expose an outputs attribute that carries all the outputs of the process, as specified by the docs.
results.outputs.merged_file
Finally, and most importantly, if the node is applied again to the same input parameters, it is not recomputed, and the results are reloaded from disk:
results = fsl_merge(dimension='t', in_files=[filepath, filepath])
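The reload-on-identical-inputs behaviour can be pictured with a small standalone sketch. This is not nipype's implementation: the helper cached_call, the stand-in function slow_merge, and the on-disk layout are all hypothetical and use only the standard library. The idea it illustrates is that the keyword arguments are hashed, and the hash decides where the result is stored and whether it can be reloaded.

```python
import hashlib
import json
import pickle
import tempfile
from pathlib import Path


def cached_call(func, cache_dir, **kwargs):
    """Run func(**kwargs) once per distinct set of keyword arguments."""
    # Hash the sorted keyword arguments so equal inputs map to the same key.
    key = hashlib.sha1(json.dumps(kwargs, sort_keys=True).encode()).hexdigest()
    result_file = Path(cache_dir) / func.__name__ / key / "result.pkl"
    if result_file.exists():
        # Same inputs as an earlier call: reload instead of recomputing.
        return pickle.loads(result_file.read_bytes())
    result = func(**kwargs)
    result_file.parent.mkdir(parents=True, exist_ok=True)
    result_file.write_bytes(pickle.dumps(result))
    return result


calls = []  # records every real execution of slow_merge


def slow_merge(dimension, in_files):
    # Hypothetical stand-in for an expensive processing step.
    calls.append(dimension)
    return {"merged_file": "+".join(in_files) + "_" + dimension}


cache_dir = tempfile.mkdtemp()
first = cached_call(slow_merge, cache_dir, dimension="t", in_files=["a.nii", "b.nii"])
second = cached_call(slow_merge, cache_dir, dimension="t", in_files=["a.nii", "b.nii"])
```

In this sketch the second call returns the stored result without running slow_merge again, and a call with a different dimension would land in a different directory and be computed afresh.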
Once the Memory is set up and you are applying it to data, keep in mind that the cache uses up disk space. It may be useful to clean it using the methods that Memory provides for this: Memory.clear_previous_runs and Memory.clear_runs_since.
A full-blown example showing how to stage multiple operations can be found in the caching_example.py file.
The goal of the caching module is to enable writing plain Python code rather than workflows. Use it that way: instead of data grabber nodes, use for instance the glob module. To vary parameters, use for loops. To make reusable code, write Python functions.
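As a rough sketch of this style, the function below collects files with glob, loops over parameters with for, and wraps the whole thing in a reusable Python function. The function name merge_by_subject, the sub-* directory layout, and the *.nii.gz pattern are assumptions made for illustration; the cached callable is passed in as an argument so that the sketch itself depends only on the standard library.

```python
import glob
import os
import tempfile


def merge_by_subject(cached_merge, data_dir, dimensions=("t",)):
    """Apply a cached merge callable to the images of every subject."""
    merged = {}
    # glob replaces a data grabber node: one directory per subject (assumed layout).
    for subject_dir in sorted(glob.glob(os.path.join(data_dir, "sub-*"))):
        in_files = sorted(glob.glob(os.path.join(subject_dir, "*.nii.gz")))
        # A plain for loop replaces parameter iteration in a workflow.
        for dimension in dimensions:
            results = cached_merge(dimension=dimension, in_files=in_files)
            merged[(os.path.basename(subject_dir), dimension)] = results
    return merged
```

With nipype, cached_merge would be the fsl_merge callable obtained from mem.cache(fsl.Merge), and each stored value would expose outputs.merged_file. Calling the function a second time with the same files and dimensions would then reuse the cached results.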
One good rule of thumb is to avoid explicit filenames apart from the outermost inputs and outputs of your processing. The reason is that the caching mechanism of nipype.caching takes care of generating unique hashes, ensuring that, when you vary parameters, files are not overwritten by the output of different computations.
Finally, the more you explore different parameters, the more you risk creating cached results that will never be reused. Keep in mind that it may be useful to flush the cache using Memory.clear_previous_runs or Memory.clear_runs_since.
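To make the idea of flushing stale results concrete, here is a minimal standalone sketch. It is not how Memory.clear_runs_since works internally: the helper clear_entries_older_than is hypothetical, and it assumes that each cached result lives in its own subdirectory and that the directory modification time is an acceptable proxy for when the entry was last written.

```python
import os
import shutil
import tempfile
import time


def clear_entries_older_than(cache_dir, max_age_seconds, now=None):
    """Delete cache subdirectories not modified within max_age_seconds."""
    now = time.time() if now is None else now
    removed = []
    for name in sorted(os.listdir(cache_dir)):
        entry = os.path.join(cache_dir, name)
        # Only directories count as cache entries; stray files are left alone.
        if os.path.isdir(entry) and now - os.path.getmtime(entry) > max_age_seconds:
            shutil.rmtree(entry)
            removed.append(name)
    return removed
```

The function returns the names of the entries it removed, which makes it easy to log what was flushed after a round of parameter exploration.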
For more info about the API, go to caching.memory.