
Compilation, Execution and IMSL

Compilation (while using IMSL)

The cluster is equipped with IMSL (International Mathematics and Statistics Library) for C/C++ and Fortran. Note that, due to license restrictions, only the Statistics Cluster nodes (labeled stat0 through stat31, as opposed to the physics and geophysics computers) have access to these libraries. The libraries are installed under /usr/local/vni/imsl, organized by language, version (where applicable) and architecture.

C/C++ libraries are labeled with the prefix CNL; those for Fortran are labeled FNL. Under the normal setup, all library paths are recorded in environment variables of the form LINK_CNL… or LINK_FNL…, and convenient compiler names and compilation arguments are likewise stored in environment variables on stat31, your login and compilation node. Thus, compilation of a Fortran program may look as simple as:

$F90 srcname.f $F90FLAGS $LINK_FNL -o binname

Fortran compiler flags are also available under $FFLAGS (for “fixed-format” source files). The corresponding variables for C/C++ compilation are $CC (the compiler) and $CFLAGS (its flags). You can review all the available environment variables with the set command (setenv for c-shell relatives). This long list can be filtered as follows:

set |grep CNL

(Replace set with setenv for c-shell relatives, and CNL with FNL or any other filter string as necessary.) As always, custom environment variables may be defined and made a permanent part of the login shell by adjusting the “rc” file appropriate to the login shell used.
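
For example, a custom variable (the name and value here are purely illustrative) can be made permanent by adding a line of the following form to the startup file of the login shell:


export MY_LIBS="-L$HOME/mylibs -lmylib"      # in ~/.bashrc, for bash and relatives

setenv MY_LIBS "-L$HOME/mylibs -lmylib"      # in ~/.cshrc, for c-shell relatives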

Detailed Examples – Monte Carlo Calculation of π in Fortran, C and R

Consider the problem of a Monte Carlo calculation of π, using IMSL’s random number generator (or runif() in R). The following program implements random sampling of points within a square bounding a circle. Since the quarter circle of area π/4 fits inside the unit square of area 1, the probability of a uniformly sampled point landing inside the circle is π/4.

  • Fortran example
  • C example
  • R example


Example with Fortran and IMSL:

! Request use of the IMSL rand_gen_int module
use rand_gen_int
implicit none

integer i
integer, parameter :: n=5000000
real(kind(1e0)), parameter :: one=1e0, zero=0e0
real(kind(1e0)) x(n),y(n),count,pi

! Obtain random x,y coordinates in the range [0,1]
call rand_gen(x)
call rand_gen(y)

! Count those that fell within a radius of 1
count=0
do i=1,n
   if (x(i)*x(i)+y(i)*y(i) < 1.0) count=count+1
end do

pi=4.0*count/(n+1.0)
write (*,'(F15.10)') pi

end

Compilation and Execution

Save this program as calcpi_imsl.f90 and compile it with the following command
(based on the standard environment variables configured on the cluster):


$F90 calcpi_imsl.f90 $F90FLAGS $LINK_FNL -o calcpi_imsl

where pgf90 stands in for $F90 and, depending on the version of the libraries, the flags may look like:


$F90FLAGS:   -w -mp -tp k8-64 -Kieee -module /usr/local/vni/imsl/fnl600/rdhpg715x64/include

$LINK_FNL:   -R/usr/local/vni/imsl/fnl600/rdhpg715x64/lib -L/usr/local/vni/imsl/fnl600/rdhpg715x64/lib -Bdynamic -limsl -limslsuperlu -limslscalar -limslblas -ldl -R/usr/local/pgi/linux86-64/7.1/libso

Of course, the flags may be tailored to the actual dependencies of the code, but having such standard sets of flags saved in environment variables is convenient. The compiled binary can now be executed with ./calcpi_imsl. Independent runs should yield values like:


3.1415688992
3.1418640614
3.1421930790
3.1415233612

Submission to the Cluster

To prepare the job execution space and inform Condor of the appropriate run environment,
create a job description file (e.g. calcpi_imsl.condor)


Executable = calcpi_imsl
Requirements = ParallelSchedulingGroup == "stats group"
Universe = vanilla
output = calcpi$(Process).out
error = calcpi$(Process).err
Log = calcpi.log
should_transfer_files = YES
when_to_transfer_output = ON_EXIT
Queue 10

Most of the fields are self-explanatory. The executable file is pointed to in the first line.
Next comes the requirement that the job stay on the Statistics Cluster (i.e. select
machines carrying the ClassAd label “stats group”). This is especially important
in this case, as IMSL is only licensed for use on this cluster.
“Queue” specifies how many instances to run on the cluster.
Fields like output, error and log specify the naming conventions for the various information
files the job will generate. In their respective order, these receive the standard output,
standard error (the two streams you would ordinarily receive on the screen when using
the shell interactively) and the Condor log for this job. The latter is instructive in case
of any problems scheduling your job instances on the cluster. Notice the use of the
placeholder $(Process). This will be replaced by a unique process number
within your submitted job and will allow you to track each job instance separately.
Also, without this placeholder, the job instances would all overwrite the same files.
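
For illustration, once all 10 instances queued above have completed, the submit directory should contain output files labeled by process number, along the lines of:

calcpi0.out   calcpi0.err
calcpi1.out   calcpi1.err
...
calcpi9.out   calcpi9.err

together with the shared log file calcpi.log.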

The universe variable specifies the Condor runtime environment.
For the purposes of these independent jobs, the simplest “vanilla” universe suffices.
In a more complicated parallel task, with checkpointing and migration, MPI calls etc.,
more advanced run-time environments are employed, often requiring specialized
linking of the binaries.
The lines specifying transfer settings are important to avoid any assumptions about accessibility
over NFS. They should be included whether or not any output files
(aside from standard output and error) are necessary.

Job Submission and Management

The job is submitted with the command:


condor_submit calcpi_imsl.condor

The cluster can be queried before or after submission to check its availability.
Two very versatile commands exist for this purpose:
condor_status and condor_q.
The former returns the status of the nodes
(broken down by virtual machines or “slots” that can each handle a job instance.)
The latter command shows the job queue including the individual instances of every job
and the submission status (e.g. idling, busy etc.)
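
For example, the availability of the Statistics Cluster nodes in particular can be checked by passing condor_status the same ClassAd expression used in the job description file (the exact listing of slots will, of course, vary):


condor_status -constraint 'ParallelSchedulingGroup == "stats group"'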

Using condor_q a few seconds after submission shows:

-- Submitter: stat31.phys.uconn.edu :  : stat31.phys.uconn.edu
 ID      OWNER            SUBMITTED     RUN_TIME ST PRI SIZE CMD
  33.3   prod            1/30 15:37   0+00:00:02 R  0   9.8  calcpi_imsl
  33.4   prod            1/30 15:37   0+00:00:00 R  0   9.8  calcpi_imsl 
  33.5   prod            1/30 15:37   0+00:00:00 R  0   9.8  calcpi_imsl 
  33.6   prod            1/30 15:37   0+00:00:00 R  0   9.8  calcpi_imsl 
  33.7   prod            1/30 15:37   0+00:00:00 R  0   9.8  calcpi_imsl 
  33.8   prod            1/30 15:37   0+00:00:00 R  0   9.8  calcpi_imsl 

6 jobs; 0 idle, 6 running, 0 held

This implies (and can be verified via the output files) that 4 of the 10 jobs have finished
by this point and the remainder have the status ‘R’ – running. The job ID number (first column)
is a handle on the job and on the individual job instances, with the format
[job number].[process number]. The process number is the same as that referenced by the
$(Process) variable.
For instance, to remove the
instance listed in the last line, one can issue the command:


condor_rm 33.8

To remove the entire job, the handle would be just 33.
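
Correspondingly, the entire job with all of its remaining instances can be removed with:


condor_rm 33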


Example with C

#include <stdio.h>
#include <stdlib.h>
#include <time.h>

int main(int argc, char *argv[])
{
   int i, N, incirc = 0;
   double x, y, circrad2;

   sscanf(argv[1], "%d", &N); // get iteration count from the command line
   srand(time(NULL));         // seed the random number generator

   circrad2 = 1.0*RAND_MAX;
   circrad2 *= circrad2;      // circle radius squared, in rand() units

   for (i = 0; i < N; i++) {
      x = 1.0*rand(); y = 1.0*rand();    // get a random point and
      incirc += (x*x + y*y) < circrad2;  // check if it lies inside the circle
   }

   printf("pi=%.12f\n", 4.0*incirc/N);   // display the estimated probability
   return 0;
}

Compiling this program (which we may save as calcpi.c) with


gcc calcpi.c -o calcpi

yields an executable calcpi that is ready for submission.
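
Before submitting to the cluster, the binary can be tested locally with a modest iteration count; the single argument is the number of points to sample:


./calcpi 1000000

This should print an estimate reasonably close to 3.14.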

Preparation for Job Submission

To prepare the job execution space and inform Condor of the appropriate run environment, create a job description file (e.g. calcpi.condor)


Executable  = calcpi
Requirements = ParallelSchedulingGroup == "stats group"
Universe   = vanilla
output     = calcpi$(Process).out
error      = calcpi$(Process).err
Log        = calcpi.log
Arguments  = 100000000
should_transfer_files = YES
when_to_transfer_output = ON_EXIT
Queue 50

The last line specifies that 50 instances should be scheduled on the cluster.
The description file specifies the executable and the arguments
passed to it during execution. (In this case we are requesting
that all instances iterate 100,000,000 times in the program’s sampling loop.)
The requirements field insists that the job stay on the Statistics Cluster.
(All statistics nodes are labeled with "stats group"
in their Condor ClassAds.)
Output and error files are targets for the standard output and standard error
streams respectively.
The log file is used by Condor to record in real time the progress
in job processing. Note that this setup labels output files
by process number to prevent a job instance from overwriting
files belonging to another.
The current values imply that all files are to be found
in the same directory as the description file.

The universe variable specifies the Condor runtime environment.
For the purposes of these independent jobs, the simplest "vanilla" universe suffices.
In a more complicated parallel task, with checkpointing and migration, MPI calls etc.,
more advanced run-time environments are employed, often requiring specialized
linking of the binaries.
The lines specifying transfer settings are important to avoid any assumptions about accessibility
over NFS. They should be included whether or not any output files
(aside from standard output and error) are necessary.

Job Submission and Management

The job is submitted with:


condor_submit calcpi.condor

The cluster can be queried before or after submission to check its availability. Two very versatile commands exist for this purpose: condor_status and condor_q. The former returns the status of the nodes (broken down by virtual machines that can each handle a job instance.) The latter command shows the job queue including the individual instances of every job and the submission status (e.g. idling, busy etc.)

Using condor_q a few seconds after submission shows:

-- Submitter: stat31.phys.uconn.edu :  : stat31.phys.uconn.edu
 ID      OWNER            SUBMITTED     RUN_TIME ST PRI SIZE CMD
  33.3   prod            1/30 15:37   0+00:00:02 R  0   9.8  calcpi 100000000
  33.4   prod            1/30 15:37   0+00:00:00 R  0   9.8  calcpi 100000000
  33.5   prod            1/30 15:37   0+00:00:00 R  0   9.8  calcpi 100000000
  33.6   prod            1/30 15:37   0+00:00:00 R  0   9.8  calcpi 100000000
  33.7   prod            1/30 15:37   0+00:00:00 R  0   9.8  calcpi 100000000
  33.8   prod            1/30 15:37   0+00:00:00 R  0   9.8  calcpi 100000000

6 jobs; 0 idle, 6 running, 0 held

By this time, only 6 of the 50 instances are left on the cluster, all with status ‘R’ – running.
Various statistics are given, including the job ID number.
This handle is useful if intervention is required, such as manual removal of
frozen job instances from the cluster.

Now, comparing the results (e.g. with the command cat calcpi*.out) shows


...
pi=3.141215440000
pi=3.141447360000
pi=3.141418120000
pi=3.141797520000
...


Example with R

#!/usr/local/bin/Rscript

# Prepare: collect command line arguments,
# set iteration number and a unique seed
args <- commandArgs()
set.seed(Sys.time())
n <- as.numeric(args[length(args)-1])

# Collect n samples
x <- runif(n)
y <- runif(n)

# Compute and output the value of pi
pihat <- sum(x * x + y * y < 1) / n * 4
pihat
write(pihat, args[length(args)])
proc.time()

Let us save this script as calcpi.R.
Note the very important first line of this script.
Without it, executing the script would require a command like Rscript calcpi.R.
Specifying the location of the interpreter after ‘#!’ on the first line
and adding permission to execute the script with the command:


chmod a+x calcpi.R

greatly simplify the handling of this program, which is especially useful for
submission to the cluster.
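
As with the C example, the script can be exercised locally before submission; the last two arguments are the number of samples and the name of the output file (the file name below is only an example):


./calcpi.R 1000000 pihat-test.dat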

Preparation for Job Submission

To prepare the job execution space and inform Condor of the appropriate run environment, create a job description file (e.g. Rcalcpi.condor)


executable = calcpi.R
universe = vanilla
Requirements = ParallelSchedulingGroup == "stats group"
should_transfer_files = YES
when_to_transfer_output = ON_EXIT
arguments = 10000000 pihat-$(Process).dat
output    = pi-$(Process).Rout
error     = pi-$(Process).err
log       = pi.log
Queue 50

The last line specifies that 50 instances should be scheduled on the cluster.
The description file specifies the executable, an independent process universe
called “vanilla” and a requirement that the job be confined to the
Statistics Cluster. Next, the important “transfer files” parameters
specify that any necessary input files (not relevant here) should be transferred to the
execution nodes and that all files generated by the program should be transferred
back to the launch directory.
(These avoid any assumptions about directory accessibility over NFS.)

The arguments to be passed to the executable are just what the script expects:
the iteration number and the output file name. The output, error and log file
parameters name the files that receive standard output, standard error and the Condor job log, respectively.
Note the unique labeling of these files according to the associated process via
the $(Process) placeholder.

Job Submission and Management

The job is submitted with:


condor_submit Rcalcpi.condor

The cluster can be queried before or after submission to check its availability.
Two very versatile commands exist for this purpose:
condor_status and condor_q.
The former returns the status of the nodes
(broken down by virtual machines that can each handle a job instance.)
The latter command shows the job queue including the individual instances of every job
and the submission status (e.g. idling, busy etc.)

Using condor_q some time after submission shows:

-- Submitter: stat31.phys.uconn.edu :  : stat31.phys.uconn.edu
 ID      OWNER            SUBMITTED     RUN_TIME ST PRI SIZE CMD
   7.0   stattestusr     3/25 15:03   0+00:00:00 R  0   9.8  calcpi.R 10000000
   7.6   stattestusr     3/25 15:03   0+00:00:04 R  0   9.8  calcpi.R 10000000
   7.10  stattestusr     3/25 15:03   0+00:00:00 R  0   9.8  calcpi.R 10000000
   7.28  stattestusr     3/25 15:03   0+00:00:00 R  0   9.8  calcpi.R 10000000
   7.45  stattestusr     3/25 15:03   0+00:00:00 R  0   9.8  calcpi.R 10000000
   7.49  stattestusr     3/25 15:03   0+00:00:00 R  0   9.8  calcpi.R 10000000

6 jobs; 0 idle, 6 running, 0 held

By this time, only 6 of the 50 instances are left on the cluster, all with status ‘R’ – running.
Various statistics are given, including the job ID number.
This handle is useful if intervention is required, such as manual removal of
frozen job instances from the cluster.
The command condor_rm 7.28 would remove just that instance,
whereas condor_rm 7 would remove the entire job.

Now, comparing the results (e.g. with the command cat pihat-*.dat) shows


...
3.141672
3.141129
3.14101
3.142149
3.141273
...
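
Since each instance produces an independent estimate of π, the individual results can be combined into a single, more precise value by averaging them, for example with a short shell pipeline (assuming all pihat-*.dat files are present in the current directory):


cat pihat-*.dat | awk '{s+=$1; n++} END {print s/n}'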