Problem running a job in NWChem


Found it
In src/tools/ga-5-4/gaf2c/gaf2c.c, the initialization code that sets up argc and argv does not respect the rule that argv[argc] is a NULL pointer (the C standard guarantees this for the argv passed to main). OpenMPI relies on this rule: at least one of its functions ignores argc and instead scans argv for the terminating NULL pointer.
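The failure mode is easy to see in a minimal sketch of a consumer that, like the OpenMPI function described above, ignores argc and walks argv until it reaches the terminating NULL. The function name count_args_until_null is hypothetical and is not taken from OpenMPI's source; it simply assumes the argv[argc] == NULL convention. If that slot were left uninitialized, the loop would read past the end of the array.

```c
#include <stddef.h>

/* Hypothetical helper (not OpenMPI code): counts arguments by scanning
 * for the NULL sentinel and never looks at argc. It is only safe when
 * argv[argc] == NULL; otherwise it reads past the end of the array. */
static int count_args_until_null(char **argv)
{
    int n = 0;
    while (argv[n] != NULL) {
        n++;
    }
    return n;
}
```

For example, with char *args[] = { "nwchem", "input.nw", "-v", NULL }, count_args_until_null(args) returns 3.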

Because the array is freshly allocated with malloc(), its contents are indeterminate. Some platforms may get lucky and hand back memory that happens to be zeroed, which hides the bug, but in general it is not. Here is the fix (against version 6.6, which still has the bug):

void ga_f2c_get_cmd_args(int *argc, char ***argv)
{
    Integer i=0;
    int iargc=F2C_IARGC();
    char **iargv=NULL;

    if (iargc >= F2C_GETARG_ARGV_MAX) {    /* CHANGED > TO >= TO MAKE SURE INDEX iargc IS ALSO VALID */
        printf("ga_f2c_get_cmd_args: too many cmd line args\n");
        armci_msg_abort(1);
    }
    iargv = (char**)malloc(sizeof(char*)*F2C_GETARG_ARGV_MAX);
    if (!iargv) {
        printf("ga_f2c_get_cmd_args: malloc iargv failed\n");
        armci_msg_abort(1);
    }
    for (i=0; i<iargc; i++) {
        char fstring[F2C_GETARG_ARGLEN_MAX];
        char cstring[F2C_GETARG_ARGLEN_MAX];
        F2C_GETARG(&i, fstring, F2C_GETARG_ARGLEN_MAX);
        ga_f2cstring(fstring, F2C_GETARG_ARGLEN_MAX,
                cstring, F2C_GETARG_ARGLEN_MAX);
        iargv[i] = strdup(cstring);
    }
    iargv[iargc] = NULL;  /* ADDED THIS LINE: terminate the vector so argv[argc] == NULL */
    *argc = iargc;
    *argv = iargv;
}
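Outside of GA, the same pattern can be written as a small standalone helper. This is a sketch under stated assumptions: make_null_terminated_argv, copy_string and free_argv are hypothetical names, not GA functions. It allocates n + 1 slots so the terminator index is always in bounds, and sets argv[n] explicitly instead of relying on malloc() returning zeroed memory.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: portable equivalent of POSIX strdup(). */
static char *copy_string(const char *s)
{
    size_t len = strlen(s) + 1;
    char *d = malloc(len);
    if (d != NULL) {
        memcpy(d, s, len);
    }
    return d;
}

/* Hypothetical helper (not part of GA): copies n strings into a fresh
 * vector of n + 1 pointers and stores the NULL terminator in argv[n]. */
static char **make_null_terminated_argv(const char *const *src, int n)
{
    char **argv = malloc(sizeof(char *) * ((size_t)n + 1));
    if (argv == NULL) {
        return NULL;
    }
    for (int i = 0; i < n; i++) {
        argv[i] = copy_string(src[i]);
        if (argv[i] == NULL) {
            /* Roll back the copies made so far so nothing leaks. */
            while (i-- > 0) {
                free(argv[i]);
            }
            free(argv);
            return NULL;
        }
    }
    argv[n] = NULL;  /* malloc() does not zero memory, so set it explicitly */
    return argv;
}

/* Hypothetical helper: frees the vector by walking to the NULL sentinel. */
static void free_argv(char **argv)
{
    if (argv == NULL) {
        return;
    }
    for (char **p = argv; *p != NULL; p++) {
        free(*p);
    }
    free(argv);
}
```

Note that free_argv takes no count and stops at the sentinel, so it is itself a consumer that would misbehave on a vector whose last slot was never set.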


There is a similar bug in _PBEGINF_() in src/tools/ga-5-4/tcgmsg/fapi.c: extra space is allocated for argv[argc], but that slot is never set to NULL.
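Until both spots are patched, a debug-build consistency check can help catch regressions. The sketch below is hypothetical (argv_is_well_formed is not part of GA or TCGMSG); one might call it as assert(argv_is_well_formed(*argc, *argv)) at the end of ga_f2c_get_cmd_args(). It can only catch the bug when the uninitialized slot happens to be non-zero, so it complements the fix rather than replacing it.

```c
#include <stddef.h>

/* Hypothetical debug check (not part of GA or TCGMSG): returns 1 when
 * argv[0..argc-1] are all non-NULL and argv[argc] is the NULL terminator,
 * i.e. when argc-based and sentinel-based consumers agree on the count.
 * Returns 0 otherwise. */
static int argv_is_well_formed(int argc, char **argv)
{
    if (argv == NULL || argc < 0) {
        return 0;
    }
    for (int i = 0; i < argc; i++) {
        if (argv[i] == NULL) {
            return 0;  /* premature terminator inside the declared range */
        }
    }
    return argv[argc] == NULL;
}
```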