Bug 2295 - peerlist receives no job-information, causing all jobs to be resubmitted after the set 60 seconds
Status | ASSIGNED |
Reported | 2013-09-20 18:05:00 +0200 |
Modified | 2013-09-23 11:13:01 +0200 |
Product: | FieldTrip |
Component: | peer |
Version: | unspecified |
Hardware: | PC |
Operating System: | Windows |
Importance: | P3 normal |
Assigned to: | Robert Oostenveld |
Roemer van der Meij - 2013-09-20 18:05:27 +0200
I came across this while using the p2p toolbox on our new DCC cluster. Everything runs within a single machine. I start one master and a set of slaves, each with specific groups, groupallows and userallows. The jobs get submitted using peercellfun and are executed nicely. However, after the 60s delay at line 471, the lastseen variable is inf for all submitted jobs. The jobs are correctly seen as submitted (all ones).

On closer inspection, the problem looks to be in peerlist. When calling peerlist as list = peerlist('busy'), I get a mostly correct structure array for all running peers, except that list(i).current (containing the job info) is largely 'empty'.

An example. On some matlab terminals:

    peerslave('allowuser','roevdmei','allowgroup','arch2','memavail',4294967296,'timavail',1209600)

On my main matlab terminal:

    peermaster('group','arch2','allowgroup','arch2','allowuser','roevdmei')

**submitting some jobs, which get executed on the peers (and retrieved later on)**

    list = peerlist('busy')

    list(i) =
        hostid: 2.1561e+09
        hostname: 'archimedes'
        user: 'roevdmei'
        group: 'arch2'
        socket: ''
        port: 1701
        status: 3
        timavail: 1209600
        memavail: 4.2950e+09
        cpuavail: 0
        current: [1x1 struct]

These are the correct settings I gave the peer. However, while the peer is actually executing a job:

    list(i).current
        hostid: 0
        jobid: 0
        hostname: ''
        user: ''
        group: ''
        timreq: 0
        memreq: 0
        cpureq: 0

This likely means the master never learns that the slaves are busy with its jobs, and so it keeps resubmitting them. When the originals finish, it correctly reverts to the original results and finishes/quits nicely. The resubmitted jobs keep the peers busy for a much longer time after this, though.

Sorry for being inactive/not very active the past weeks/months. I'm a bit swamped by thesis work and have been sick for a long time over the past months. Will catch up very soon, hopefully next Monday.

Cheers,
Roemer
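
For anyone trying to reproduce this, a minimal sketch of how the symptom could be checked from the master's MATLAB session. To stay runnable without a live peer network it builds a mock entry from the values quoted above (the mock is an assumption, and only some of the reported fields are copied). On a real setup the mock would be replaced by list = peerlist('busy'). The variable name noinfo is hypothetical, and treating "jobid and hostid both 0" as "no job information" is an assumption based on the output in this report.

```matlab
% Mock of one entry as shown in this report (assumption, subset of fields).
% On a live system, replace this block with: list = peerlist('busy');
current = struct('hostid', 0, 'jobid', 0, 'hostname', '', 'user', '', ...
                 'group', '', 'timreq', 0, 'memreq', 0, 'cpureq', 0);
list = struct('hostname', 'archimedes', 'user', 'roevdmei', 'group', 'arch2', ...
              'port', 1701, 'status', 3, 'current', current);

% Flag busy peers whose job information is empty.
% Assumption: jobid == 0 and hostid == 0 together mean "no job info".
noinfo = false(size(list));
for i = 1:numel(list)
  c = list(i).current;
  noinfo(i) = isstruct(c) && c.jobid == 0 && c.hostid == 0;
end

fprintf('%d of %d busy peers report no job information\n', sum(noinfo), numel(list));
```

If every busy peer is flagged while jobs are known to be running, that matches the behaviour described above.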
Robert Oostenveld - 2013-09-20 19:20:12 +0200
Oh dear, I was actually planning to retire p2p as deprecated...