Bug 781 - Incorrect memavail reading peerslave command line executable
Status | CLOSED WONTFIX |
Reported | 2011-06-28 15:48:00 +0200 |
Modified | 2011-09-09 15:47:55 +0200 |
Product: | FieldTrip |
Component: | peer |
Version: | unspecified |
Hardware: | Other |
Operating System: | Linux |
Importance: | P1 normal |
Assigned to: | Robert Oostenveld |
URL: | |
Tags: | |
Depends on: | |
Blocks: | |
See also: |
Niels Kloosterman - 2011-06-28 15:48:31 +0200
Hi,

When I use the following conf file for setting up multiple slaves with peerslave.glnxa64, the memavail is misrepresented:

    [peer]
    # allow jobs of up to 1 hour, mem 3GB
    matlab=/sara/sw/matlab/64/2011a -nodisplay -singleCompThread
    timavail=3600
    memavail=3GB

(and this times 7 in my conf file)

When I run this in the terminal:

    niels@gb-r32n26:~/matlab$ ./fieldtrip/peer/bin/peerslave.glnxa64 peerslave_conf
    peerslave[18867]: peerinit: niels@gb-r32n26.irc.sara.nl, id = 497286917

and then do in matlab:

    >> peerlist
    there are 8 peers running in total (1 hosts, 1 users)
    there are 1 peers running on 1 hosts as master
    there are 7 peers running on 1 hosts as idle slave with 21 bytes memory available
    there are 0 peers running on 0 hosts as busy slave with 0 bytes and 0 seconds required
    there are 0 peers running on 0 hosts as zombie
    idle slave at niels@gb-r32n26.irc.sara.nl:1701, memavail = 3 bytes, timavail = 1.0 hours
    idle slave at niels@gb-r32n26.irc.sara.nl:1702, memavail = 3 bytes, timavail = 1.0 hours
    idle slave at niels@gb-r32n26.irc.sara.nl:1703, memavail = 3 bytes, timavail = 1.0 hours
    idle slave at niels@gb-r32n26.irc.sara.nl:1704, memavail = 3 bytes, timavail = 1.0 hours
    idle slave at niels@gb-r32n26.irc.sara.nl:1705, memavail = 3 bytes, timavail = 1.0 hours
    idle slave at niels@gb-r32n26.irc.sara.nl:1706, memavail = 3 bytes, timavail = 1.0 hours
    idle slave at niels@gb-r32n26.irc.sara.nl:1707, memavail = 3 bytes, timavail = 1.0 hours
    master at niels@gb-r32n26.irc.sara.nl:1708

There are only 3 bytes listed per slave, so it seems the GB from the conf file is not treated correctly.

When I set memavail in bytes instead (3000000000 = 2.8 GB) and try to run peercellfun, the program does nothing. When I abort the script, it turns out to hang at

    ??? Operation terminated by user during ==> peerfeval at 207

Line 207 of peerfeval is in this block:

    if isempty(jobid)
      % the job was not submitted succesfully and another attempt is needed
      % give the peer network some time to recover
      pause(sleep);
      continue;
    end

When I manually open 7 peerslaves in separate matlab sessions this problem does not occur. Let me know if you need my scripts to solve these problems!

Best,
Niels
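The "3 bytes" reading is what one would see if the string `3GB` were converted with a plain `atol`-style call, which stops at the first non-digit. Whether peerslave actually parses memavail this way is an assumption; the report does not confirm it. The sketch below shows a unit-aware conversion. The helper name `parse_mem_size` is hypothetical and not part of the peer toolbox, and it assumes binary units (1 GB = 2^30 bytes), in line with the "3000000000 = 2.8 GB" figure above.

```c
#include <ctype.h>
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/*
 * Hypothetical helper (not part of peerslave): convert a string such as
 * "3GB", "512 MB" or "3000000000" into a byte count.
 * Assumes binary units: K = 2^10, M = 2^20, G = 2^30.
 * Returns 0 on success and -1 on malformed input or overflow.
 */
int parse_mem_size(const char *text, unsigned long long *bytes)
{
    char *end = NULL;
    unsigned long long value;
    unsigned long long scale = 1ULL;

    /* require a leading digit, so signs and empty strings are rejected */
    if (text == NULL || bytes == NULL || !isdigit((unsigned char)text[0]))
        return -1;

    errno = 0;
    value = strtoull(text, &end, 10);
    if (errno == ERANGE)
        return -1;

    /* allow whitespace between the number and the unit */
    while (isspace((unsigned char)*end))
        end++;

    switch (toupper((unsigned char)*end)) {
    case 'K': scale = 1ULL << 10; end++; break;
    case 'M': scale = 1ULL << 20; end++; break;
    case 'G': scale = 1ULL << 30; end++; break;
    default: break; /* no prefix: the value is already in bytes */
    }

    /* optional trailing 'B', then nothing else may follow */
    if (toupper((unsigned char)*end) == 'B')
        end++;
    if (*end != '\0')
        return -1;

    if (value > ULLONG_MAX / scale)
        return -1; /* the product would overflow */

    *bytes = value * scale;
    return 0;
}
```

For comparison, `atol("3GB")` returns 3, which matches the value that peerlist reports, while `parse_mem_size("3GB", &n)` stores 3221225472 in `n`.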
Niels Kloosterman - 2011-06-28 17:17:08 +0200
Also, if I run 7 peerslaves manually (starting 7 matlabs and typing peerslave in each of them) and run peercellfun with a number of jobs, the following error sometimes occurs:

    ??? Error using ==> peerfeval at 161
    there are no slave peers available that meet the memory requirements

    Error in ==> peercellfun at 248
    [curjobid curputtime] = peerfeval(fname, argin{:}, 'timeout', 5, 'memreq', memreq, 'timreq', timreq, 'diary', diary);

    Error in ==> MIBexp_peer at 71
    peercellfun(@MIBexp_TFR_analysis, input);

If I look at how much memory ft wants to reserve for the job, I get this:

    K>> memreq
    memreq =
      1.9517e+17

So it seems that here, too, something is wrong with the units used (bytes, gigabytes, ...). Until now I have worked around this problem by doing the following for every matlab slave:

    peerslave('memavail', 1.9517e+17)

but this is of course not nice.
Robert Oostenveld - 2011-06-28 17:43:57 +0200
Thanks for your report. The problem is also known on the ESI-Frankfurt cluster (see CC) and was reported once upon a time from a London Linux computer. However, on the DCCN-Nijmegen cluster it never happens.

The problem can be tracked down to peer/private/memprofile, which reads /proc/self/statm once per second to determine how much memory MATLAB uses. It compiles a list of memory use over time, and upon completion of peerexec.m the memory used (i.e. max-min) is returned.

In peercellfun, upon submission of the next job, the memory use from the previous jobs is used to pick a suitable idle slave.

What are the details of your operating system?
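As a rough illustration of the "max-min" bookkeeping described above, the following C sketch reduces a series of memory samples to a single "memory used" figure. The function name `memory_used` and the sample layout are assumptions made for this example; the actual memprofile code is not shown in this report and may differ.

```c
#include <stddef.h>

/*
 * Hypothetical sketch: given memory samples (in bytes) taken at regular
 * intervals during a job, return the peak minus the baseline.
 * Returns 0 when there are no samples.
 */
unsigned long long memory_used(const unsigned long long *samples, size_t count)
{
    unsigned long long lowest, highest;
    size_t i;

    if (samples == NULL || count == 0)
        return 0;

    lowest = highest = samples[0];
    for (i = 1; i < count; i++) {
        if (samples[i] < lowest)
            lowest = samples[i];
        if (samples[i] > highest)
            highest = samples[i];
    }
    return highest - lowest;
}
```

If a single sample is bogus, for instance because the /proc file was misread, the difference is bogus as well, which would be consistent with an implausible memreq such as 1.9517e+17.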
Niels Kloosterman - 2011-06-28 17:53:22 +0200
(In reply to comment #2)
> Thanks for your report. The problem is also known on the ESI-Frankfurt
> cluster (see CC) and was reported once upon a time from a London Linux
> computer. However, on the DCCN-Nijmegen cluster it never happens.
>
> The problem can be tracked down to peer/private/memprofile, which reads
> /proc/self/statm once per second to determine how much memory MATLAB uses.
> It compiles a list of memory use over time, and upon completion of
> peerexec.m the memory used (i.e. max-min) is returned.
>
> In peercellfun, upon submission of the next job, the memory use from the
> previous jobs is used to pick a suitable idle slave.
>
> What are the details of your operating system?

Hey Robert,

I am working on the LISA cluster within the SARA network in Amsterdam. For a description, see https://subtrac.sara.nl/userdoc/wiki/lisa64/description

The OS is Debian Squeeze: https://subtrac.sara.nl/l/lisa/squeeze-2011-02-24

Let me know if you need more specific information. (Is there a unix command which nicely lists this?)

Niels
Craig Richter - 2011-06-29 23:32:19 +0200
Interesting. Seems to be a Debian issue? On our system, when I specify the memory for the peerslave it is correct, but I specify it in the conf file in bytes, whereas you specified it in GB and ended up with kilobytes. Your workaround could be helpful to me. I had thought of trying that but worried it would get us into more trouble. Robert, how do you think we should proceed?

Best,
Craig
Niels Kloosterman - 2011-07-06 10:20:01 +0200
(In reply to comment #4)
> Interesting. Seems to be a Debian issue? On our system, when I specify the
> memory for the peerslave it is correct, but I specify it in the conf file in
> bytes, whereas you specified it in GB and ended up with kilobytes. Your
> workaround could be helpful to me. I had thought of trying that but worried it
> would get us into more trouble. Robert, how do you think we should proceed?
>
> Best,
>
> Craig

Just specifying a huge memavail (peerslave('memavail', 3e18)) works for me, because I know from experience that my TFR analysis jobs never take more than 3GB, so when I start up 7 slaves I stay well under my 24GB of RAM. So far it hasn't caused any problems. Still, for future (more memory-intensive) analyses it would be nice if this feature worked...

Best,
Niels
Robert Oostenveld - 2011-07-06 15:59:07 +0200
Thanks for suggesting this workaround! The bug remains open and will be fixed (when I can find the time for it).
Craig Richter - 2011-07-06 16:03:10 +0200
Sounds good. We have a new technical person here who is interested in dealing with the bug as well. He is looking at it, so if he finds a solution this could save you some time. He will be registering on bugzilla. I will add him to the CC list when he has his account.
Robert Oostenveld - 2011-07-12 14:09:37 +0200
On 8 Jul 2011, at 16:06, Roennburg, Kai wrote:

Hi Robert,

we had some time to dive into the whole issue with the peerslaves not running on the new ESI nodes on Debian Squeeze. We found several issues, some caused by the operating system and some by the code:

1) With the Debian Squeeze release the error handling was changed from Syslogd to ksyslogd. As a result we got a seg. fault when we compiled the code, from the line:
   syslog_level = atol(pconf->verbose);
   The quick and dirty workaround was to hardcode the log level into the code.

2) With kernel 2.6 there was a change in the way statm works, and as a result the output gives bogus values. You already had the right idea, but it was not that obvious. We found a nice link for that issue: http://search.cpan.org/dist/mod_perl/docs/api/Apache2/SizeLimit.pod
   Suggested ways to handle the issue are:
   1) parse the 'VmSize' and 'VmRSS' variables from /proc/self/status, or
   2) parse the first lines of /proc/self/smaps with the variables 'Size' and 'Rss'.
   We did the second and got correct values back, at least for small jobs, but it looks like we found another problem which might be code related.

3) When we started bigger jobs requiring several GB of RAM, we found the mem-requirements changing from correct to wrong values. We guess it's a result of the getmem() from util.c, which needs to be changed to something like 'unsigned long' or 'long long' to be able to handle the bigger numbers (sometimes 16 Bit are not enough ;-) As of now it looks like the variable consequently gets overwritten and thus gives nonsense values.

We are currently working on issue number 3, but I guess you and your team might be faster, and you have much more expertise in fixing it.

Best,
Kai
Robert Oostenveld - 2011-07-12 14:10:55 +0200
(In reply to comment #8)
> 1) With the Debian Squeeze release the error handling was changed from Syslogd
> to ksyslogd. As a result we got a seg. fault when we compiled the code, from
> the line:
> syslog_level = atol(pconf->verbose);
> The quick and dirty workaround was to hardcode the log level into the code.

I suspect that this might have been due to a bug in the code, which caused pconf->verbose to be accessed while it was actually unassigned (i.e. still NULL). That was fixed last week or so.
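Passing a NULL pointer to `atol` is undefined behaviour and typically ends in a segmentation fault, which fits the explanation above. A minimal guard could look like the sketch below. The `config_t` struct is a hypothetical stand-in that only contains the field discussed here, and `get_syslog_level` is an illustrative name; the actual fix in the peer code is not shown in this report.

```c
#include <stdlib.h>

/* Hypothetical stand-in for the configuration record used by peerslave. */
typedef struct {
    const char *verbose; /* NULL when the option is absent from the conf file */
} config_t;

/*
 * Return the configured log level, or 'fallback' when the option was
 * never assigned. This avoids calling atol() on a NULL pointer.
 */
long get_syslog_level(const config_t *pconf, long fallback)
{
    if (pconf == NULL || pconf->verbose == NULL)
        return fallback;
    return atol(pconf->verbose);
}
```

With this guard a conf file without a verbose entry yields the fallback level instead of a crash.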
Robert Oostenveld - 2011-07-12 14:16:42 +0200
(In reply to comment #8)
> 2) With kernel 2.6 there was a change in the way statm works, and as a
> result the output gives bogus values.
...
> Suggested ways to handle the issue are:
> 1) parse the 'VmSize' and 'VmRSS' variables from /proc/self/status, or
> 2) parse the first lines of /proc/self/smaps with the variables 'Size' and
> 'Rss'.

/proc/self/smaps looks very complex on the quad-CPU 48-core AMD Opteron machine on which I just checked. /proc/self/status looks simple, so I suggest going for that.

If you have a diff/patch or just an updated util.c, please send it to me and I'll integrate it in the svn trunk.
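A minimal sketch of the /proc/self/status approach is given below. It looks up one "Key:  value kB" line, such as VmSize or VmRSS, and returns the number in kB. The function name `read_status_kb` is an assumption made for this example and is not taken from util.c; the path is a parameter so that the parser can also be tried on a saved copy of the file.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: find the line "<key>:   <number> kB" in a
 * /proc/<pid>/status style file and store the number (in kB) in *kb.
 * Returns 0 on success, -1 if the file or the key cannot be read.
 */
int read_status_kb(const char *path, const char *key, unsigned long long *kb)
{
    char line[256];
    size_t keylen;
    FILE *fp;
    int result = -1;

    if (path == NULL || key == NULL || kb == NULL)
        return -1;

    fp = fopen(path, "r");
    if (fp == NULL)
        return -1;

    keylen = strlen(key);
    while (fgets(line, sizeof line, fp) != NULL) {
        /* match the whole key, followed directly by a colon */
        if (strncmp(line, key, keylen) == 0 && line[keylen] == ':') {
            if (sscanf(line + keylen + 1, "%llu", kb) == 1)
                result = 0;
            break;
        }
    }
    fclose(fp);
    return result;
}
```

On Linux, `read_status_kb("/proc/self/status", "VmRSS", &kb)` would give the resident set size of the calling process in kB; multiply by 1024 to obtain bytes.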
Robert Oostenveld - 2011-07-12 14:22:29 +0200
> 3) When we started bigger jobs requiring several GB of RAM, we found the
> mem-requirements changing from correct to wrong values. We guess it's a
> result of the getmem() from util.c, which needs to be changed to something
> like 'unsigned long' or 'long long' to be able to handle the bigger numbers
> (sometimes 16 Bit are not enough ;-)

Indeed, the numbers can be large. I have just changed the code for getmem in util.c from fscanf(fp, "%u%u"...) into fscanf(fp, "%llu%llu"...).

At this moment I don't have much time to work on it in full detail, and I hope that my comments help you further in solving it.
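The format change only helps if the receiving variables are `unsigned long long` as well; reading `%llu` into an `unsigned int` would overwrite neighbouring memory. The sketch below shows a matching pair of format and type. It is not the actual getmem from util.c: the name `read_statm_bytes` is hypothetical, and the conversion from pages to bytes is an addition based on the fact that statm reports its fields in pages.

```c
#include <stdio.h>
#include <unistd.h>

/*
 * Hypothetical sketch: read the first two fields of a /proc/<pid>/statm
 * style file (total program size and resident set size, both in pages)
 * and convert them to bytes using the system page size.
 * Returns 0 on success, -1 on failure.
 */
int read_statm_bytes(const char *path,
                     unsigned long long *size_bytes,
                     unsigned long long *resident_bytes)
{
    unsigned long long size_pages = 0, resident_pages = 0;
    long page_size = sysconf(_SC_PAGESIZE);
    FILE *fp;
    int fields;

    if (path == NULL || size_bytes == NULL || resident_bytes == NULL || page_size <= 0)
        return -1;

    fp = fopen(path, "r");
    if (fp == NULL)
        return -1;

    /* %llu must be paired with unsigned long long variables */
    fields = fscanf(fp, "%llu%llu", &size_pages, &resident_pages);
    fclose(fp);
    if (fields != 2)
        return -1;

    *size_bytes = size_pages * (unsigned long long)page_size;
    *resident_bytes = resident_pages * (unsigned long long)page_size;
    return 0;
}
```

With a 4096-byte page, a process of 1048576 pages already amounts to 4 GiB, which no longer fits in a 32-bit unsigned integer.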
Robert Oostenveld - 2011-08-31 17:27:41 +0200
I am closing this bug because the development of the fieldtrip/peer toolbox will be put on hold in favor of the fieldtrip/qsub toolbox. The qsub toolbox is more promising for the DCCN as a whole and hence requires attention.

The peer toolbox will remain available within fieldtrip, and external contributions to the code will be considered for inclusion. In the future, the development of fieldtrip/peer may be started up again, and the bugs that I hereby close as "wontfix" can be revisited.