Solved: ECCE using non-standard ssh ports -- port redirection.


Since you seem to be running into several remote-communication problems, I think you would benefit greatly from enabling logging of all the underlying commands that ECCE issues to launch a job. This is pretty much the first step whenever something related to remote communications isn't working right in ECCE. The $ECCE_HOME/siteconfig/site_runtime file lists all the variables ECCE provides for customizing its behavior, including some for debugging, such as logging remote communication. There is also documentation on how to use these variables. So editing the site_runtime file is one way to enable this logging.
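If you want a quick look at which logging switches site_runtime offers, a search for "log" is a reasonable start. This is a minimal sketch that assumes a POSIX shell, that $ECCE_HOME is already set, and that the debugging switches have "log" somewhere in their names (as ECCE_RCOM_LOGMODE does); list_log_settings is a hypothetical helper name, not part of ECCE.

```shell
#!/bin/sh
# Sketch: show the lines in ECCE's site_runtime file that mention logging,
# e.g. ECCE_RCOM_LOGMODE. list_log_settings is a hypothetical helper name;
# it assumes ECCE_HOME is set and that logging switches contain "log".
list_log_settings() {
  # Default to the site_runtime file under $ECCE_HOME unless a path is given.
  runtime="${1:-$ECCE_HOME/siteconfig/site_runtime}"
  if [ ! -r "$runtime" ]; then
    echo "cannot read $runtime" >&2
    return 1
  fi
  # -n prints line numbers (handy when opening the file in an editor);
  # -i makes the match case-insensitive (LOGMODE, LogMode, ...).
  grep -n -i 'log' "$runtime"
}
```

Running list_log_settings with no argument searches the default location; pass a path to search a different copy of the file.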

In this case, though, it's probably easier just to set the variable manually and then run ECCE rather than changing the site_runtime file. The environment variable you want is $ECCE_RCOM_LOGMODE, set to "true": in csh that's "setenv ECCE_RCOM_LOGMODE true" and in sh/bash it's "export ECCE_RCOM_LOGMODE=true". Then start ECCE from that same shell so it picks up the new definition. From then on, whenever you do something in ECCE that requires remote communication, what ECCE does behind the scenes is printed to the terminal window where ECCE was started, which can be a lot of output. If you can't scroll back to the start of it, start ECCE inside a "script" session so that all the remote-communications output is saved to a file. Hopefully this will help in figuring out what is going wrong. I'd work on "the node" issue first, since it seems like you are having more success there.
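Putting those steps together, here is a minimal sketch for sh/bash. It assumes the ECCE launcher is the command "ecce" on your PATH; the log file name ecce_rcom.log is a hypothetical choice.

```shell
#!/bin/sh
# Sketch: start ECCE with remote-communication logging turned on and keep a
# copy of everything printed to the terminal in a file, using script(1).
# "ecce_rcom.log" is a hypothetical file name; pick any name you like.

# sh/bash form; the csh equivalent is: setenv ECCE_RCOM_LOGMODE true
export ECCE_RCOM_LOGMODE=true
LOGFILE=ecce_rcom.log

if command -v ecce >/dev/null 2>&1; then
  echo "Recording to $LOGFILE. Run 'ecce' in the new shell, then type 'exit' when done."
  # script starts a recorded subshell, which inherits ECCE_RCOM_LOGMODE
  script "$LOGFILE"
else
  echo "ecce is not on PATH; set up your ECCE environment first." >&2
fi
```

The export has to come before script: script starts a child shell, and only exported variables carry into it, so ECCE launched there sees the setting. Everything ECCE prints ends up in the log file once you exit that shell.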

There is also a way to have file transfers use the existing ssh connection instead of a separate scp command, which may prove useful. In the $ECCE_HOME/siteconfig/CONFIG.chinook file you'll see the "singleConnect" variable set to "check". In your case you'd edit your CONFIG.<machine> file and set this variable to "true", which removes the separate scp file-transfer step. But you may be able to fix this issue without resorting to that, because the $ECCE_RCOM_LOGMODE setting should give you more information on what is happening now.
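If you do decide to turn singleConnect on, the change is a one-line edit. This is a hedged sketch: set_single_connect is a hypothetical helper, and because the exact CONFIG file syntax isn't shown here, it only assumes the setting sits on its own line starting with "singleConnect", followed by a separator and a value. When your file has no such line, it copies the one from CONFIG.chinook so the syntax matches. It keeps a .bak copy of your original file.

```shell
#!/bin/sh
# Sketch: set singleConnect to "true" in a CONFIG.<machine> file so ECCE
# reuses the ssh connection for file transfer instead of a separate scp.
# Assumption: the setting is one line starting with "singleConnect",
# then a non-alphanumeric separator, then the value (e.g. "check").
# set_single_connect is a hypothetical helper, not part of ECCE.

# Group 1 matches the name plus separator; the rest matches the current value.
SC_PATTERN='^\(singleConnect[^[:alnum:]][^[:alnum:]]*\)[[:alnum:]]*'

set_single_connect() {
  config="$1"     # your CONFIG.<machine> file
  template="$2"   # a file that already has the line, e.g. CONFIG.chinook
  cp "$config" "$config.bak" || return 1   # keep the original
  if grep -q '^singleConnect[^[:alnum:]]' "$config"; then
    # Rewrite the existing value to "true", leaving the separator intact.
    sed "s/$SC_PATTERN/\1true/" "$config.bak" > "$config"
  else
    # No entry yet: copy the template's line so the syntax matches,
    # change its value to "true", and append it.
    grep '^singleConnect[^[:alnum:]]' "$template" |
      sed "s/$SC_PATTERN/\1true/" >> "$config"
  fi
}
```

For example, with a made-up machine name: set_single_connect /path/to/CONFIG.mymachine "$ECCE_HOME/siteconfig/CONFIG.chinook", using whatever path your CONFIG.<machine> file actually has. Editing the line by hand in a text editor works just as well. The helper only saves you from guessing the separator.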

Gary