CUDA compiling


Short results (maybe this helps):
1) touch `egrep -l TCE_CUDA *`
 make
......
ccsd_t_gpu.F:2: note: get vectype with 2 units of type integer(kind=8)
ccsd_t_gpu.F:2: note: vectype: vector(2) integer(kind=8)
ccsd_t_gpu.F:2: note: not vectorized: no vectype for stmt: energy_l ={v} {CLOBBER};
scalar_type: real(kind=8)[2]
ccsd_t_gpu.F:2: note: === vect_pattern_recog ===
ccsd_t_gpu.F:2: note: === vect_analyze_dependences ===
ccsd_t_gpu.F:2: note: === vect_analyze_data_refs_alignment ===
ccsd_t_gpu.F:2: note: vect_compute_data_ref_alignment:
ccsd_t_gpu.F:2: note: misalign = 0 bytes of ref cuda_device_number
ccsd_t_gpu.F:2: note: === vect_analyze_data_ref_accesses ===
ccsd_t_gpu.F:2: note: not consecutive access cuda_device_number ={v} {CLOBBER};
ccsd_t_gpu.F:2: note: === vect_analyze_slp ===
ccsd_t_gpu.F:2: note: Failed to SLP the basic block.
ccsd_t_gpu.F:2: note: not vectorized: failed to find SLP opportunities in basic block.
make[1]: Leaving directory `/MD/azat/nwchem-6.6/src/tce/ccsd_t'
make[1]: Entering directory `/MD/azat/nwchem-6.6/src/tce/ccsd_t'
Got lock on /MD/azat/nwchem-6.6/lib/LINUX64/libtce.lock
ar r /MD/azat/nwchem-6.6/lib/LINUX64/libtce.a ccsd_t_gpu.o
echo /MD/azat/nwchem-6.6/lib/LINUX64/libtce.a
/MD/azat/nwchem-6.6/lib/LINUX64/libtce.a
make[1]: Leaving directory `/MD/azat/nwchem-6.6/src/tce/ccsd_t'

2) cd ..
  linux432:/MD/azat/nwchem-6.6/src/tce # touch `egrep -l TCE_CUDA *`

  ...
egrep: gradients: Is a directory
egrep: include: Is a directory
egrep: ipccsd: Is a directory
egrep: lccd: Is a directory
egrep: lccsd: Is a directory
egrep: lrh: Is a directory
egrep: mbpt1: Is a directory
egrep: mbpt2: Is a directory
egrep: mbpt3: Is a directory
egrep: mbpt4: Is a directory
egrep: mrcc: Is a directory
egrep: qcisd: Is a directory
egrep: response: Is a directory
egrep: sort: Is a directory
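
The "Is a directory" messages in step 2 appear because the `*` glob also matches the subdirectories of src/tce, which egrep cannot read as files. Regular files that contain TCE_CUDA are still listed and touched. A minimal sketch of one way to skip the directories, assuming a `find` that supports `-maxdepth` (GNU and BSD versions do); the function name `touch_cuda_sources` is hypothetical and not part of NWChem:

```shell
#!/bin/sh
# Hypothetical helper (not part of NWChem): touch only the regular files
# directly inside DIR that mention TCE_CUDA, so that make rebuilds them.
touch_cuda_sources() {
    dir="${1:-.}"
    # -maxdepth 1 -type f keeps subdirectories out of grep's argument list,
    # so no "Is a directory" messages are produced.
    find "$dir" -maxdepth 1 -type f -exec grep -l TCE_CUDA {} + |
    while IFS= read -r f; do
        touch "$f"
        printf '%s\n' "$f"   # report each touched file
    done
}
```

Calling `touch_cuda_sources .` in src/tce and then running `make` should have the same effect as step 2 without the error lines. It does not descend into subdirectories, which matches the behaviour of the original command.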

3) linux432:/MD/azat/nwchem-6.6/src/tce # make

 ...
Making all in sort
make[1]: Entering directory `/MD/azat/nwchem-6.6/src/tce/sort'
make[2]: Entering directory `/MD/azat/nwchem-6.6/src/tce/sort'
make[2]: `/MD/azat/nwchem-6.6/lib/LINUX64/libtce.a' is up to date.
make[2]: Leaving directory `/MD/azat/nwchem-6.6/src/tce/sort'
make[2]: Entering directory `/MD/azat/nwchem-6.6/src/tce/sort'
make[2]: Leaving directory `/MD/azat/nwchem-6.6/src/tce/sort'
make[1]: Leaving directory `/MD/azat/nwchem-6.6/src/tce/sort'
make[1]: Entering directory `/MD/azat/nwchem-6.6/src/tce'
Got lock on /MD/azat/nwchem-6.6/lib/LINUX64/libtce.lock
ar r /MD/azat/nwchem-6.6/lib/LINUX64/libtce.a tce_input.o tce_energy.o
echo /MD/azat/nwchem-6.6/lib/LINUX64/libtce.a
/MD/azat/nwchem-6.6/lib/LINUX64/libtce.a
make[1]: Leaving directory `/MD/azat/nwchem-6.6/src/tce'
4) linux432:/MD/azat/nwchem-6.6/src # make link
 make nwchem.o stubs.o
make[1]: Entering directory `/MD/azat/nwchem-6.6/src'
gfortran -m64 -ffast-math -Warray-bounds -fdefault-integer-8 -march=native -mtune=native -finline-functions -O2 -g -fno-aggressive-loop-optimizations -g -O -I. -I/MD/azat/nwchem-6.6/src/include -I/MD/azat/nwchem-6.6/src/tools/install/include -DGFORTRAN -DCHKUNDFLW -DGCC4 -DGCC46 -DEXT_INT -DLINUX -DLINUX64 -DPARALLEL_DIAG -DCOMPILATION_DATE="'`date +%a_%b_%d_%H:%M:%S_%Y`'" -DCOMPILATION_DIR="'/MD/azat/nwchem-6.6'" -DNWCHEM_BRANCH="'6.6'" -c -o nwchem.o nwchem.F
gfortran -m64 -ffast-math -Warray-bounds -fdefault-integer-8 -march=native -mtune=native -finline-functions -O2 -g -fno-aggressive-loop-optimizations -g -O -I. -I/MD/azat/nwchem-6.6/src/include -I/MD/azat/nwchem-6.6/src/tools/install/include -DGFORTRAN -DCHKUNDFLW -DGCC4 -DGCC46 -DEXT_INT -DLINUX -DLINUX64 -DPARALLEL_DIAG -DCOMPILATION_DATE="'`date +%a_%b_%d_%H:%M:%S_%Y`'" -DCOMPILATION_DIR="'/MD/azat/nwchem-6.6'" -DNWCHEM_BRANCH="'6.6'" -c -o stubs.o stubs.F
make[1]: Leaving directory `/MD/azat/nwchem-6.6/src'
gfortran -L/MD/azat/nwchem-6.6/lib/LINUX64 -L/MD/azat/nwchem-6.6/src/tools/install/lib -o /MD/azat/nwchem-6.6/bin/LINUX64/nwchem nwchem.o stubs.o -lnwctask -lccsd -lmcscf -lselci -lmp2 -lmoints -lstepper -ldriver -loptim -lnwdft -lgradients -lcphf -lesp -lddscf -ldangchang -lguess -lhessian -lvib -lnwcutil -lrimp2 -lproperty -lsolvation -lnwints -lprepar -lnwmd -lnwpw -lofpw -lpaw -lpspw -lband -lnwpwlib -lcafe -lspace -lanalyze -lqhop -lpfft -ldplot -ldrdy -lvscf -lqmmm -lqmd -letrans -lpspw -ltce -lbq -lmm -lcons -lperfm -ldntmc -lccca -lnwcutil -lga -larmci -lpeigs -lperfm -lcons -lbq -lnwcutil -l64to32 -L/MD/azat/Openblas/lib -lopenblas -lpthread -lrt -lnwclapack -lnwcblas -L/usr/lib64/mpi/gcc/openmpi/lib64 -lmpi_usempi -lmpi_mpifh -lmpi -L/usr/local/cuda-6.5/lib64 -L/usr/local/cuda-6.5/lib -lcudart -lstdc++ -lrt -lm -lpthread

Test result again
Iterations converged
CCSD correlation energy / hartree =        -0.213269954064232
CCSD total energy / hartree = -76.240077811241036

Singles contributions

Doubles contributions
CCSD(T)
Using plain CCSD(T) code
total no. of tasks 230
total no. of tasks / no. procs 230
wl_min 12 1.5
wl_max 13824 4.9
thresh for no. of tasks 230

CCSD[T]  correction energy / hartree =        -0.003139909173626
CCSD[T] correlation energy / hartree = -0.216409863237859
CCSD[T] total energy / hartree = -76.243217720414663
CCSD(T) correction energy / hartree = -0.003054718622066
CCSD(T) correlation energy / hartree = -0.216324672686299
CCSD(T) total energy / hartree = -76.243132529863104
Cpu & wall time / sec 0.1 0.1

Parallel integral file used       1 records with       0 large values


Task  times  cpu:        5.4s     wall:        7.1s
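
The test output above contains the line "Using plain CCSD(T) code". A minimal sketch for checking a saved output file for that line after each rebuild; it assumes the NWChem output was redirected to a file, and both the function name `ccsd_t_code_path` and the reading of this line as a sign that the non-GPU (T) routine ran are assumptions, not something confirmed in this thread:

```shell
#!/bin/sh
# Hypothetical helper (not part of NWChem): print "plain" if the output file
# contains the banner line seen in the test result, otherwise print "other".
ccsd_t_code_path() {
    out_file="$1"
    # -F treats the pattern as a fixed string, so the parentheses are literal.
    if grep -qF 'Using plain CCSD(T) code' "$out_file"; then
        echo plain
    else
        echo other
    fi
}
```

For example, `ccsd_t_code_path test.out` (with `test.out` as a placeholder file name) would print `plain` for the run shown above.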