Bug 384 - unpredictable crashes during peer computing
Status: | CLOSED WONTFIX |
Reported: | 2011-01-12 09:02:00 +0100 |
Modified: | 2011-09-09 15:47:53 +0200 |
Product: | FieldTrip |
Component: | peer |
Version: | unspecified |
Hardware: | PC |
Operating System: | Mac OS |
Importance: | P1 normal |
Assigned to: | Robert Oostenveld |
URL: | |
Tags: | |
Depends on: | |
Blocks: | |
See also: |
Jan-Mathijs Schoffelen - 2011-01-12 09:02:15 +0100
I get 'failure to execute the job (argout)' at unpredictable moments after submitting a batch of jobs to the peer network. Each job executes the same function on a different file, so the memory requirement varies from job to job. Running an individual crashed job locally does not reproduce the problem. The only explanation I can think of is that the machine on which the slave is running runs out of memory, causing the job to crash. Can this be alleviated by setting minmemreq?
Robert Oostenveld - 2011-01-25 23:51:15 +0100
changed multiple bugs to ASSIGNED to roboos
Marcel Zwiers - 2011-08-02 10:48:03 +0200
It is not unpredictable at all but it is caused by the watchdog killing your job (but not appropriately reporting its intervention back to the master). Try this at home:

peercellfun(@pause,{10}, 'TimReq',1)
submitted 1/1, collected 0/1, busy 1, speedup 0.0
submitted 1/1, collected 0/1, busy 0, speedup 0.0
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% an error was detected, the diary output of the remote execution follows
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
an error was detected during the execution of job 1
??? Error using ==> peerget at 135
failed to execute the job (eval)

Error in ==> peerget at 135
  error(err);

Error in ==> peercellfun at 330
  [argout, options] = peerget(joblist(i).jobid, 'timeout', inf, 'output', 'cell', 'diary', diary, 'StopOnError', StopOnError);
Robert Oostenveld - 2011-08-31 17:27:40 +0200
I am closing this bug because development of the fieldtrip/peer toolbox will be put on hold in favor of the fieldtrip/qsub toolbox. The qsub toolbox is more promising for the DCCN as a whole and hence requires attention. The peer toolbox will remain available within fieldtrip, and external contributions to the code will be considered for inclusion. In the future, development of fieldtrip/peer may be resumed, and the bugs that I hereby close as "wontfix" can be revisited.