Throughout Nipype we try to provide meaningful error messages. If you run into an error that does not have a meaningful error message, please let us know so that we can improve error reporting.
Here are some notes that may help with debugging workflows or understanding performance issues.
Always run your workflow first on a single iterable (e.g. subject) and gradually increase the execution distribution complexity (Linear -> MultiProc -> SGE).
Use the debug config mode. This can be done by setting:
from nipype import config
config.enable_debug_mode()
as the first import of your nipype script.
In debug mode the utils loggers will all be set to level DEBUG.
There are several configuration options that can help with debugging. See Configuration File for more details:
keep_inputs
remove_unnecessary_outputs
stop_on_first_crash
stop_on_first_rerun
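As a rough illustration, these options could be placed in the execution section of a Nipype configuration file. The sketch below uses Python's standard configparser module to render such a fragment. The values are assumptions chosen for a debugging session, not recommended defaults, and the helper name render_debug_config is hypothetical.

```python
import configparser
import io


def render_debug_config():
    """Render an illustrative [execution] section as INI text."""
    parser = configparser.ConfigParser()
    # Assumed values for a debugging session; adjust to your own needs.
    parser["execution"] = {
        "keep_inputs": "true",
        "remove_unnecessary_outputs": "false",
        "stop_on_first_crash": "true",
        "stop_on_first_rerun": "false",
    }
    buffer = io.StringIO()
    parser.write(buffer)
    return buffer.getvalue()
```

Calling print(render_debug_config()) shows the fragment, which can then be compared against the options described in the Configuration File documentation.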
When running in distributed mode on cluster engines, it is possible for a
node to fail without generating a crash file in the crashdump directory. In
such cases, it will store a crash file in the
All Nipype crashfiles can be inspected with the
The nipypecli search command allows you to search for regular expressions in the tracebacks of the Nipype crashfiles within a log folder.
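To make the idea of a regular-expression search over tracebacks concrete, here is a minimal standalone sketch. It is not how nipypecli is implemented. It assumes the crashfiles were written as plain text and are named like crash-*.txt, and the function name search_text_crashfiles is hypothetical.

```python
import re
from pathlib import Path


def search_text_crashfiles(log_dir, pattern, glob="crash-*.txt"):
    """Return {filename: [matching lines]} for plain-text crashfiles.

    The glob pattern is an assumption about how the files are named.
    """
    regex = re.compile(pattern)
    matches = {}
    for path in sorted(Path(log_dir).glob(glob)):
        text = path.read_text(errors="replace")
        # Keep only the lines of this file that match the expression.
        hits = [line for line in text.splitlines() if regex.search(line)]
        if hits:
            matches[path.name] = hits
    return matches
```

For example, search_text_crashfiles("logs", r"MemoryError") would list every text crashfile under a hypothetical logs folder whose traceback mentions a MemoryError.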
Nipype determines the hash of the input state of a node. If any input contains strings that represent files on the system path, the hash evaluation mechanism will determine the timestamp or content hash of each of those files. Thus any node with an input containing huge dictionaries (or lists) of file names can cause serious performance penalties.
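The cost described above can be pictured with a small standalone sketch. It is not Nipype's hashing code. The two helper functions are hypothetical and only illustrate that a timestamp-style fingerprint needs one stat call per file, while a content-style fingerprint must read every byte, so both grow with the number of file names in the input.

```python
import hashlib
import os


def timestamp_fingerprint(paths):
    """Hash the size and modification time of each file (one stat per file)."""
    digest = hashlib.sha256()
    for path in paths:
        info = os.stat(path)
        digest.update(f"{path}:{info.st_size}:{info.st_mtime_ns}".encode())
    return digest.hexdigest()


def content_fingerprint(paths, chunk_size=1 << 20):
    """Hash the full content of each file (reads every byte)."""
    digest = hashlib.sha256()
    for path in paths:
        with open(path, "rb") as handle:
            # Read in chunks so large files do not have to fit in memory.
            for chunk in iter(lambda: handle.read(chunk_size), b""):
                digest.update(chunk)
    return digest.hexdigest()
```

With a list of tens of thousands of paths, either function performs that many filesystem operations each time the input state is evaluated, which is why very large file lists on a single input can be slow.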
For HUGE data processing, stop_on_first_crash: False is needed to get the bulk of processing done, and then stop_on_first_crash: True is needed for debugging and finding failing cases. Setting stop_on_first_crash: False is a reasonable option when you would expect 90% of the data to execute properly.
Sometimes nipype will hang as if nothing is going on, and if you interrupt it (e.g. with Ctrl+C) you will get a ConcurrentLogHandler error. Simply remove the pypeline.lock file in your home directory and continue.
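Removing the lock file can also be scripted. The helper below is a minimal sketch using only the standard library. The function name remove_stale_lock and its parameters are hypothetical, and it defaults to the home directory and the pypeline.lock file name mentioned above.

```python
from pathlib import Path


def remove_stale_lock(directory=None, name="pypeline.lock"):
    """Delete the lock file if present; return True if a file was removed."""
    base = Path(directory) if directory is not None else Path.home()
    lock_path = base / name
    if lock_path.is_file():
        lock_path.unlink()
        return True
    return False
```

Calling remove_stale_lock() with no arguments targets the home directory. Passing a directory is mainly useful for trying the helper out on a scratch folder first.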
On many clusters with shared NFS mounts, synchronization of files across clusters may not happen before the typical NFS cache timeouts. When using PBS/LSF/SGE/Condor plugins in such cases, the workflow may crash because it cannot retrieve the node result. Setting the job_finished_timeout can help:
workflow.config['execution']['job_finished_timeout'] = 65
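The reasoning behind a longer timeout can be sketched without Nipype: once a job is reported as finished, the submitting side keeps checking for the result file for up to a fixed number of seconds, so a value longer than the NFS cache delay gives the file time to become visible. The function below is a hypothetical illustration of that polling pattern, not the plugins' actual code; wait_for_result, poll_interval, and the path argument are assumed names.

```python
import os
import time


def wait_for_result(path, timeout=65.0, poll_interval=1.0):
    """Poll until path exists or timeout seconds elapse; return True if found."""
    deadline = time.monotonic() + timeout
    while True:
        if os.path.exists(path):
            return True
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            return False
        # Never sleep past the deadline.
        time.sleep(min(poll_interval, remaining))
```

With a short timeout the check can give up before a delayed file shows up on the shared mount, which matches the crash described above. A larger value trades a longer wait on real failures for fewer false ones.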