5:47:25 PM PDT - Tue, May 29th 2012
Hi,
I've been using ECCE to manage NWChem and Gaussian jobs on a couple of clusters, and it's been a good experience. However, I recently got access to a new university cluster running SGE and found that I was being locked out quite frequently. It turns out that the number of concurrent processes allowed per user on the head node is limited to 32.
Just to make myself clear: the true underlying problem is obviously an overly strict ulimit, and this isn't something I'm faulting ECCE for.
Looking at what happens when ECCE submits a job, five processes are created and put to sleep on the head node:
sshd: user [priv]
sshd: user@notty
bash -c echo +hi+ && csh -i
-sh -i
perl eccejobmonitor -configFile eccejobmonitor.conf -jobId X -bookmark Y
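To put numbers on this: if every submitted job leaves five processes sleeping on the head node, a 32-process cap is used up after only six jobs. The sketch below does that arithmetic. It assumes the cap is the per-user process limit that `ulimit -u` reports (`RLIMIT_NPROC`), and the `other_procs` parameter (e.g. an interactive login session) is a hypothetical allowance, not something I measured.

```python
import resource


def max_monitored_jobs(proc_limit: int, procs_per_job: int = 5, other_procs: int = 0) -> int:
    """How many jobs' head-node processes fit under a per-user process cap.

    proc_limit: per-user process cap (32 on this SGE head node).
    procs_per_job: processes left sleeping per submitted job (5 as observed).
    other_procs: processes already in use for other things (assumed, e.g. a login shell).
    """
    if procs_per_job <= 0:
        raise ValueError("procs_per_job must be positive")
    available = max(proc_limit - other_procs, 0)
    return available // procs_per_job


def current_proc_limit():
    """Soft per-user process limit (what `ulimit -u` shows), or None if unlimited.

    Assumes the cluster enforces its cap via RLIMIT_NPROC; run this on the head node.
    """
    soft, _hard = resource.getrlimit(resource.RLIMIT_NPROC)
    return None if soft == resource.RLIM_INFINITY else soft


print(max_monitored_jobs(32))                   # 6 jobs (30 processes)
print(max_monitored_jobs(32, other_procs=3))    # 5 jobs
print(max_monitored_jobs(32, procs_per_job=1))  # 32 jobs
```

With one head-node process per job instead of five, the same cap would allow roughly five times as many jobs to be monitored at once.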
I don't have a computing background and so may be a bit naïve, but here are my two questions:
Are all these processes necessary? If they aren't, is there an easy 'fix' by editing e.g. a submit script?
If not all of these processes are required, it may be worth taking the (overly strict) limits on the number of per-user processes that some computational clusters impose into account in future versions of ECCE. One head-node process per launched job would be ideal.
Again, thank you for all the great work!
EDIT: I'm using ECCE v6.3.