Solved: ECCE using non-standard ssh ports -- port redirection.


Wow, you made a lot of good progress in a short time. From checking the man pages for ssh and scp, it looks like the rationale for the inconsistent -p/-P is that scp stays compatible with the "cp" command's use of "-p" to preserve modification times of copied files, so scp takes the port as uppercase -P while ssh takes it as lowercase -p. I would have (wrongly) guessed that they would then have gone with uppercase -P for the ssh command as well and regained consistency on both fronts. Or they could have supported either -p or -P for the port with ssh and not have lost anything. Anyway, we have never had to specify the port with ssh/scp for a custom remote shell, or we would have hit the same problem you did. I think your approach (separate the ssh and scp commands) is definitely the right one.
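
For anyone else landing on this thread, here is a minimal sketch of the -p/-P difference. The port number, user, host, and file name are made up for illustration, and the commands are only echoed rather than run:

```sh
#!/bin/sh
# Hypothetical values, for illustration only.
port=2222
remote=user@remotehost

# ssh takes the port with lowercase -p.
ssh_cmd="ssh -p $port $remote"

# scp takes the port with uppercase -P; its lowercase -p
# preserves modification times, as with cp.
scp_cmd="scp -P $port -p"

# Print the commands instead of running them (dry run).
echo "$ssh_cmd hostname"
echo "$scp_cmd input.nw $remote:input.nw"
```

This is why a single shared option string for both commands cannot carry the port, and why the two commands need to be specified separately.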

Regarding the "singleConnect" directive, you'd actually want "singleConnect: true" rather than "singleConnect: check". The latter is a special case where ECCE sometimes does file transfer via ssh and sometimes issues a separate scp command. To decide which, it "checks" (hence the cryptic value for the variable) whether the machine you are going to is on the same domain as your ECCE client host. If it is on a different domain, it uses a shared/single ssh connection for file transfer. If it is on the same domain, it issues a separate command. So I'm guessing in your case the machine is on the same domain, and therefore you would never see anything different than if you had never used the singleConnect directive. Setting the value to "true" will force it to share a single ssh connection, and you should see this in the $ECCE_RCOM_LOGMODE output: no scp command is issued, and instead a "dd" command is used that echoes out the files to perform the transfer.

The "check" value is useful where users from outside the local domain are treated differently from those inside, which is the case for the EMSL chinook cluster. Outside users need a "one-time login" credential in addition to a password, whereas inside only the password is needed. We really wanted to avoid prompting the user more than once for this one-time credential during a job launch (technically it would work just fine, but it seems a little strange and annoying, which is something we strive to avoid with ECCE), and therefore came up with this strategy. We've also found that for bigger files the ssh/dd based file transfer isn't as reliable as an scp command. Otherwise we would just switch ECCE over to never issue scp commands, since then users would never have to be concerned with scp command syntax.
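
The decision described above can be sketched roughly as follows. This is only an illustration and not ECCE's actual code: the function names, the host names, and the definition of "domain" as everything after the first dot of the host name are all my assumptions here.

```sh
#!/bin/sh
# domain_of HOST: everything after the first dot
# (assumed definition of "domain", not taken from ECCE).
domain_of() {
    case "$1" in
        *.*) echo "${1#*.}" ;;
        *)   echo "" ;;
    esac
}

# transfer_mode SETTING CLIENT_HOST REMOTE_HOST
# Prints "single" (share one ssh connection) or "scp" (separate scp command).
transfer_mode() {
    case "$1" in
        true)
            echo single ;;
        check)
            if [ "$(domain_of "$2")" = "$(domain_of "$3")" ]; then
                echo scp        # same domain: separate command
            else
                echo single     # different domain: shared ssh connection
            fi ;;
        *)
            echo scp ;;         # directive not used
    esac
}

# Hypothetical host names.
transfer_mode check client.site-a.example cluster.site-b.example
transfer_mode check client.site-a.example node1.site-a.example
transfer_mode true  client.site-a.example node1.site-a.example
```

The three calls print "single", "scp", and "single". A dd-based transfer over an ssh connection generally has the shape `ssh host 'dd of=remote_file' < local_file`; the exact command ECCE issues is what you would see in the $ECCE_RCOM_LOGMODE output.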

That's a lot of output you included for your remaining xterm issue (I know of course that's due to how verbose ECCE is when doing remote operations). However, I see one potential part of that output that I think could be related to your problem. Do you see the line like:

if (-x channel 1/xterm) echo TRUE

and then the next line says it is an "if expression" syntax error? That to me indicates it is bailing just before trying to invoke the xterm. It doesn't think the xterm command exists, and the reason for that is this failed "if" command. The question then becomes where the "channel 1" part of the path is coming from. Do you have anything in your CONFIG.<machine> file that looks like that? Clearly the space between "channel" and "1" is not what would be expected in a valid expression, and if we can figure that out, I bet we can get remote xterms and tail commands working for you.
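
To make the failure mode concrete, here is a small sketch in plain sh (not the csh that ECCE runs remotely) of the same executable test with a guard against a space in the directory value. The function name is hypothetical, and "channel 1" is simply the value copied from your output:

```sh
#!/bin/sh
# check_xterm_dir DIR: report why an executable test on DIR/xterm might fail.
check_xterm_dir() {
    case "$1" in
        *" "*)
            # A space splits the path into two words in an unquoted test.
            echo "space in path: '$1'"
            return 1 ;;
    esac
    if [ -x "$1/xterm" ]; then
        echo TRUE
    else
        echo "no executable xterm in '$1'"
        return 1
    fi
}

# The value seen in the log output.
check_xterm_dir "channel 1" || true
```

With a real directory in place of "channel 1", the same test would be well-formed, which is why tracking down where that value comes from matters.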

Thanks for linking in your blog on making NWChem/ECCE work for you. I'm very impressed with your resourcefulness and the lengths you've gone to. You have persevered through adversity where others would have bailed. I just scanned through most of it rather than picking up the details. One thing that I did notice is your issues with OpenGL, where you suggested moving the shared libraries to another directory. While that's perfectly workable, this would be another instance where consulting the $ECCE_HOME/siteconfig/site_runtime file would be useful. There you would learn about the $ECCE_MESA_OPENGL and $ECCE_MESA_EXCEPT variables that control whether to use the ECCE-supplied GL libraries or native ones (e.g. hardware OpenGL card drivers) on your machine.
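
A quick way to find those settings is to search the file for them. The helper name below is made up; the file path and variable prefix are the ones mentioned above, and the function assumes $ECCE_HOME is set in your environment unless you pass a file explicitly:

```sh
#!/bin/sh
# show_mesa_settings [FILE]: print, with line numbers, every line that
# mentions an ECCE_MESA_* variable. Defaults to the site_runtime file.
show_mesa_settings() {
    file=${1:-$ECCE_HOME/siteconfig/site_runtime}
    grep -n 'ECCE_MESA_' "$file"
}
```

Running `show_mesa_settings` with no argument searches $ECCE_HOME/siteconfig/site_runtime; the comments next to the matching lines in that file are the place to read what each variable does.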

Gary