Problem running a job in NWChem


Clicked A Few Times
Hello, I am a new user of NWChem. I installed the software using the following steps:

export NWCHEM_TOP=/home/sat/nwchem/nwchem-6.1
export NWCHEM_TARGET=LINUX
export ARMCI_NETWORK=MPI-TS
export USE_MPI=y
export USE_MPIF=y
export USE_MPIF4=y
export MPI_LOC=/home/sat/libs/mpich2/mpich2-install/
export MPI_LIB=/home/sat/libs/mpich2/mpich2-install/lib
export MPI_INCLUDE=/home/sat/libs/mpich2/mpich2-install/include
export LIBMPI="-lmpi_f90 -lmpi_f77 -lmpi -ldl -lhwloc" 
export NWCHEM_MODULES="all python"
export LARGE_FILES=TRUE
export USE_NOFSCHECK=TRUE 
export MRCC_THEORY=TRUE
export PYTHONHOME=/usr/lib/python2.7
export PYTHONVERSION=2.7
export USE_PYTHON64=y 
export BLASOPT="-L/home/sat/libs/sca/scalapack-2.0.2 -lscalapack"
cd $NWCHEM_TOP/src
make nwchem_config
make clean
make CC=gcc FC=gfortran

The binary builds, but when I try to run an example from the manual (the geometry optimization of nitrogen), it fails with this error:
Program received signal SIGSEGV: Segmentation fault - invalid memory reference.

Backtrace for this error:
#0  0xB747F163
#1  0xB747F800
#2  0xB772A3FF
#3  0xB75C6790
#4  0xB75C8398
#5  0xA3FC7AF in tcgi_alt_pbegin
#6  0xA3FC821 in tcgi_pbegin
#7  0xA3FE053 in pbeginf_
#8  0x8106B5C in nwchem at nwchem.F:66
Segmentation fault (core dumped)

Please help me solve this problem.
Thank you in advance.

Forum Regular
Hi,

The code fails in tcgi_alt_pbegin, the routine that initializes MPI. You will find it in src/tools/ga-5-1/armci/tcgmsg/tcgmsg-mpi/misc.c. It contains only plain, straightforward MPI calls: it checks whether MPI is already initialized, initializes it if not, and then determines how many processes there are and the rank of the current process. I cannot see anything wrong with that.

So the only tricky part is argc and argv. The C code needs these command-line arguments (MPI_Init takes them by address), but a normal Fortran program does not provide them, so a special function has to be called to pick them up. My guess is that something went wrong there and that argc or argv is actually invalid.

To check this, you can insert the following code into tcgi_alt_pbegin:

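    int i;
    /* argc and argv arrive by address (as for MPI_Init), so the i-th
       string is (*argv)[i]; *argv[i] would mean *(argv[i]) */
    for (i = 0; i < *argc; i++) {
        printf("argument %d = %s\n", i, (*argv)[i]);
        fflush(NULL);
    }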

If I haven't made any typos, this should print the command-line arguments (the first one should be the name of the executable). If the code did not pick up the command-line arguments correctly, it should segfault in one of the print statements. Could you give this a try, please?

Huub

Clicked A Few Times
Found it
In src/tools/ga-5-4/gaf2c/gaf2c.c, the initialization code that sets up argc and argv does not respect the rule that argv[argc] is a null pointer. OpenMPI relies on this: at least one of its functions ignores argc and instead looks for the terminating NULL pointer.

Because the array is freshly allocated with malloc, some platforms may get lucky and return memory that is already zeroed, but in general malloc gives no such guarantee. Here is the fix (against version 6.6, which still has the bug):

void ga_f2c_get_cmd_args(int *argc, char ***argv)
{
    Integer i=0;
    int iargc=F2C_IARGC();
    char **iargv=NULL;

    if (iargc >= F2C_GETARG_ARGV_MAX) {    /* CHANGED > TO >= TO MAKE SURE INDEX iargc IS ALSO VALID */
        printf("ga_f2c_get_cmd_args: too many cmd line args");
        armci_msg_abort(1);
    }
    iargv = (char**)malloc(sizeof(char*)*F2C_GETARG_ARGV_MAX);
    if (!iargv) {
        printf("ga_f2c_get_cmd_args: malloc iargv failed");
        armci_msg_abort(1);
    }
    for (i=0; i<iargc; i++) {
        char fstring[F2C_GETARG_ARGLEN_MAX];
        char cstring[F2C_GETARG_ARGLEN_MAX];
        F2C_GETARG(&i, fstring, F2C_GETARG_ARGLEN_MAX);
        ga_f2cstring(fstring, F2C_GETARG_ARGLEN_MAX,
                cstring, F2C_GETARG_ARGLEN_MAX);
        iargv[i] = strdup(cstring);
    }
    iargv[iargc] = 0;  /* ADDED THIS LINE */
    *argc = iargc;
    *argv = iargv;
}


There is a similar bug in _PBEGINF_() in src/tools/ga-5-4/tcgmsg/fapi.c: extra space is allocated for argv[argc], but that slot is never set to NULL.

