
Bug 384 - unpredictable crashes during peer computing

Reported 2011-01-12 09:02:00 +0100
Modified 2011-09-09 15:47:53 +0200
Product: FieldTrip
Component: peer
Version: unspecified
Hardware: PC
Operating System: Mac OS
Importance: P1 normal
Assigned to: Robert Oostenveld
Depends on:
See also:

Jan-Mathijs Schoffelen - 2011-01-12 09:02:15 +0100

I get 'failure to execute the job (argout)' at unpredictable moments after submitting a batch of jobs to the peer network. Each job executes the same function on a different file, so the memory requirement varies from job to job. Running an individual (crashed) job locally does not reproduce the problem. The only explanation I can think of is that the machine on which the slave is running runs out of memory, causing the job to crash. Can this be alleviated by setting minmemreq?

Robert Oostenveld - 2011-01-25 23:51:15 +0100

changed multiple bugs to ASSIGNED to roboos

Marcel Zwiers - 2011-08-02 10:48:03 +0200

It is not unpredictable at all: it is caused by the watchdog killing your job (without appropriately reporting its intervention back to the master). Try this at home:

peercellfun(@pause,{10}, 'TimReq',1)
submitted 1/1, collected 0/1, busy 1, speedup 0.0
submitted 1/1, collected 0/1, busy 0, speedup 0.0
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% an error was detected, the diary output of the remote execution follows
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
an error was detected during the execution of job 1
??? Error using ==> peerget at 135
failed to execute the job (eval)

Error in ==> peerget at 135
error(err);

Error in ==> peercellfun at 330
[argout, options] = peerget(joblist(i).jobid, 'timeout', inf, 'output', 'cell', 'diary', diary, 'StopOnError', StopOnError);
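
If the watchdog is indeed what kills the jobs, a possible workaround is to request more time and memory per job than the largest file is expected to need. The following is only a sketch: it assumes that peercellfun accepts 'timreq' (seconds) and 'memreq' (bytes) as key-value options, as its help text suggests, and that the watchdog enforces these requests. The function name process_file, the file names, and the chosen limits are hypothetical placeholders, and the call needs the FieldTrip peer toolbox with running slaves.

```matlab
% Hypothetical input files; each job processes one file.
files = {'subject01.mat', 'subject02.mat', 'subject03.mat'};

% Generous per-job estimates (assumed values, tune to your own data).
timreq = 2 * 3600;     % seconds requested per job
memreq = 4 * 1024^3;   % bytes requested per job

% process_file is a placeholder for the user's own function, which
% takes a file name and returns a single output.
% 'StopOnError' appears in the traceback above; setting it to false
% is assumed to let the remaining jobs continue when one job fails.
results = peercellfun(@process_file, files, ...
    'timreq', timreq, ...
    'memreq', memreq, ...
    'StopOnError', false);
```

Requesting more resources than needed may reduce the number of slaves that are willing to accept the job, so the estimates should not be inflated more than necessary.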

Robert Oostenveld - 2011-08-31 17:27:40 +0200

I am closing this bug because development of the fieldtrip/peer toolbox will be put on hold in favor of the fieldtrip/qsub toolbox. The qsub toolbox is more promising for the DCCN as a whole and hence requires attention. The peer toolbox will remain available within fieldtrip, and external contributions to the code will be considered for inclusion. In the future, development of fieldtrip/peer may be started up again, and the bugs that I hereby close as "wontfix" can be revisited.

Robert Oostenveld - 2011-09-09 15:47:53 +0200

closed all of my bugs that were resolved