In Progress Metrics

These metrics are considered under development and likely need more eyes to get fully working.

Network

network-chatterbug

Chatterbug provides a suite of communication proxy applications for HPC. We use a launcher/worker design.

| Name | Description | Type | Default |
|------|-------------|------|---------|
| mpirun | The options to give to mpirun (includes tasks) | string | -N 8 |
| command | The chatterbug command (subdirectory) to run, see options below | string | stencil3d |
| args | Arguments for the command | string | 1 2 2 10 10 10 4 1 |
| sole-tenancy | Require sole tenancy | string ("true" or "false") | "true" |

By default, we require sole-tenancy, but you can disable this. Note that the best place to look for “documentation” on the commands seems to be the source code. The following values are available for the command option:

  • pairs

  • ping-ping

  • spread

  • stencil3d

  • stencil4d

  • subcom2d-coll

  • subcom2d-a2a

  • unstr-mesh

We have mostly tested stencil3d. The mpirun command is assembled as follows:

$ mpirun --hostfile ./hostfile.txt --allow-run-as-root -N 4 /root/chatterbug/${command}/${executable} ${args}

Thus, with the default command and args, you would get this command (on one pod):

$ mpirun --hostfile ./hostfile.txt --allow-run-as-root -N 4 /root/chatterbug/stencil3d/stencil3d.x 1 2 2 10 10 10 4 1
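The substitution above can be sketched in Python. This is a minimal illustration and not the operator's implementation: the helper name build_chatterbug_command is hypothetical, the executable is assumed to be named <command>.x (inferred only from the stencil3d example), and the mpirun options default to -N 4 only because that is what the template above shows.

```python
import shlex

# Command names listed in this section.
COMMANDS = {
    "pairs", "ping-ping", "spread", "stencil3d",
    "stencil4d", "subcom2d-coll", "subcom2d-a2a", "unstr-mesh",
}


def build_chatterbug_command(command="stencil3d",
                             args="1 2 2 10 10 10 4 1",
                             mpirun="-N 4"):
    """Assemble the mpirun invocation as one string (hypothetical helper)."""
    if command not in COMMANDS:
        raise ValueError(f"unknown chatterbug command: {command}")
    # Assumption: the binary is <command>.x inside the <command> subdirectory.
    executable = f"/root/chatterbug/{command}/{command}.x"
    parts = ["mpirun", "--hostfile", "./hostfile.txt", "--allow-run-as-root"]
    parts += shlex.split(mpirun)   # mpirun options, e.g. "-N 4"
    parts.append(executable)
    parts += shlex.split(args)     # arguments passed to the chatterbug binary
    return shlex.join(parts)


print(build_chatterbug_command())
```

Calling the helper with another listed command, for example build_chatterbug_command(command="pairs", args="..."), would only change the subdirectory, the assumed binary name, and the trailing arguments.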

See the example linked in the header for a metrics.yaml example.

Standalone

app-hpl

The Linpack benchmark (HPL) is used to rank the Top500 and solves a dense system of linear equations. Arguments to customize include the following:

| Name | Description | Type | Default |
|------|-------------|------|---------|
| mpiargs | Arguments to give to mpi | string | empty string |
| tasks | Number of tasks per node | int32 | detected using nproc |
| ratio | Target memory occupation | string (but as a float, e.g., "0.3") | "0.3" |
| memory | Memory in GiB | int32 | detected from proc |
| blocksize | The NBs (block size) value | int32 | |
| pfact | PFACTs (0=left, 1=Crout, 2=Right) | int32 | |
| nbmin | NBMINs (>= 1) | int32 | |
| ndiv | NDIVs | int32 | |
| row_or_colmajor_pmapping | PMAP process mapping (0=Row-,1=Column-major) | int32 | 0 |
| rfact | RFACTs (0=left, 1=Crout, 2=Right) | int32 | 0 |
| bcast | BCASTs (0=1rg,1=1rM,2=2rg,3=2rM,4=Lng,5=LnM) | int32 | 0 |
| depth | Lookahead depth | int32 | 0 |
| swap | SWAP (0=bin-exch,1=long,2=mix) | int32 | 0 |
| swappingThreshold | Swapping threshold | int32 | 64 |
| l1transposed | L1 in (0=transposed,1=no-transposed) form | int32 | 0 |
| utransposed | U in (0=transposed,1=no-transposed) form | int32 | 0 |
| memAlignment | Memory alignment in double (> 0) (4,8,16) | int32 | |

For the meaning of each of these, see this documentation and how they are used in hpl.go. I made an effort to define them above, but you should consult that documentation, because I don’t fully understand these yet.

We provide a simple build here, as vendors typically spend a lot of time custom-compiling the code for their architectures (and we are compiling for general use). We use a script, compute_N, from the ULHPC Tutorials to generate input data for a particular problem size, and you can vary the input to this script via the computeArgs parameters. We use a default, and you can inspect the script help below:

compute_N --help
# compute_N -h
Compute N for HPL runs.

SYNOPSIS
  compute_N [-v] [--mem <SIZE_IN_GB>] [-N <NODES>] [-r <RATIO>] [-NB <NB>]
  compute_N [-v] [--mem <SIZE_IN_GB>] [-N <NODES>] [-p <PERCENTAGE_MEM>] [-NB <NB>]

  The following formulae is used (when using '-r <ratio>'):
     N = <ratio>*SQRT( Total Memory Size in bytes / sizeof(double) )
       = <ratio>*SQRT( <nnodes> * <ram_size> / 8)

  Alternatively you may wish to specify a memory usage ratio (with -p <percentage_mem>),
  in which case the following formulae is used:
      N = SQRT( <percentage_mem>/100 * Total Memory Size in bytes / sizeof(double) )

OPTIONS
  -m --mem --ramsize <SIZE>
     Specify the total memory size per node, in GiB.
     Default RAM size consider (yet in KiB): 16051112 KiB
  -N --nodes <N>
     Number of compute nodes
  -NB <NB>
     NB parameters to use. Default: 192 (384 for skylake)
  -p --memshare <PERCENTAGE_MEM>
     Percentage of the total memory size to use.
     Derived from the below global ratio (i.e. 0% since RATIO=0.8)
  -r --ratio <RATIO>
     Global ratio to apply. Default: 0.8

EXAMPLE
  For 2 broadwell nodes on iris cluster, using 30% of the total memory per node:
     compute_N -N 2 -p 30 -m 128 -NB 192
  For 4 skylake nodes on iris cluster, using 85% of the total memory per node:
     compute_N -N 4 -p 85 -m 128 -NB 384

AUTHORS
  Sebastien Varrette <Sebastien.Varrette@uni.lu> and UL HPC Team

COPYRIGHT
  This is free software; see the source for copying conditions.  There is
  NO warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

The following examples are provided to generate the HPL.dat for the analysis:

/opt/tutorials/benchmarks/HPL/scripts/compute_N -h
# 1 Broadwell node, alpha = 0.3
/opt/tutorials/benchmarks/HPL/scripts/compute_N -m 128 -NB 192 -r 0.3 -N 1
# 2 Skylake (regular) nodes, alpha = 0.3
/opt/tutorials/benchmarks/HPL/scripts/compute_N -m 128 -NB 384 -r 0.3 -N 2
# 4 bigmem (skylake) nodes, beta = 0.85
/opt/tutorials/benchmarks/HPL/scripts/compute_N -m 3072 -NB 384 -p 85 -N 4

Here is a tiny setup I created for a testing case:

/opt/tutorials/benchmarks/HPL/scripts/compute_N -m 128 -NB 192 -r 0.3 -N 2
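As a worked example, the two formulas from the compute_N help text can be evaluated directly. This is a sketch of the arithmetic only, not the script itself: the function names are hypothetical, memory is taken as GiB per node (1 GiB = 1024^3 bytes), and the sketch ignores -NB and any rounding or adjustment that compute_N may apply to the result.

```python
import math

BYTES_PER_GIB = 1024 ** 3
SIZEOF_DOUBLE = 8  # bytes


def n_from_ratio(ratio, nodes, mem_gib):
    """N = ratio * sqrt(total memory in bytes / sizeof(double))."""
    total_bytes = nodes * mem_gib * BYTES_PER_GIB
    return ratio * math.sqrt(total_bytes / SIZEOF_DOUBLE)


def n_from_percent(percent, nodes, mem_gib):
    """N = sqrt(percent/100 * total memory in bytes / sizeof(double))."""
    total_bytes = nodes * mem_gib * BYTES_PER_GIB
    return math.sqrt(percent / 100 * total_bytes / SIZEOF_DOUBLE)


# Values from the tiny test setup: -m 128 -r 0.3 -N 2
n = n_from_ratio(0.3, nodes=2, mem_gib=128)
print(int(n))  # 55609
```

Because an N x N matrix of doubles occupies 8 * N^2 bytes, a ratio r corresponds to r^2 of the total memory. A ratio of 0.3 therefore fills about 9% of memory, and n_from_ratio(0.3, ...) matches n_from_percent(9, ...).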

Next, you might care about the input data, a file called hpl.dat. By default we use a template that is populated by the above variables, and here is another example that I found in the repository:

Default hpl.dat
HPLinpack benchmark input file
Innovative Computing Laboratory, University of Tennessee
HPL.out      output file name (if any)
6            device out (6=stdout,7=stderr,file)
1            # of problems sizes (N)
24650         Ns
1            # of NBs
192           NBs
0            PMAP process mapping (0=Row-,1=Column-major)
2            # of process grids (P x Q)
2 4             Ps
14 7            Qs
16.0         threshold
1            # of panel fact
2            PFACTs (0=left, 1=Crout, 2=Right)
1            # of recursive stopping criterium
4            NBMINs (>= 1)
1            # of panels in recursion
2            NDIVs
1            # of recursive panel fact.
1            RFACTs (0=left, 1=Crout, 2=Right)
1            # of broadcast
1            BCASTs (0=1rg,1=1rM,2=2rg,3=2rM,4=Lng,5=LnM)
1            # of lookahead depth
1            DEPTHs (>=0)
2            SWAP (0=bin-exch,1=long,2=mix)
64           swapping threshold
0            L1 in (0=transposed,1=no-transposed) form
0            U  in (0=transposed,1=no-transposed) form
1            Equilibration (0=no,1=yes)
8            memory alignment in double (> 0)
##### This line (no. 32) is ignored (it serves as a separator). ######
0                               Number of additional problem sizes for PTRANS
1200 10000 30000                values of N
0                               number of additional blocking sizes for PTRANS
40 9 8 13 13 20 16 32 64        values of NB
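One quick sanity check on a file like this is the process grid. Each P x Q grid in HPL uses P * Q MPI processes, so that product should not exceed the number of processes mpirun launches. The sketch below uses a hypothetical helper, process_counts, and reads only the two lines labeled Ps and Qs as they appear in the example above.

```python
def process_counts(ps_line, qs_line):
    """Return P*Q for each process grid, given the 'Ps' and 'Qs' lines."""
    # Each line holds whitespace-separated integers followed by a text label.
    ps = [int(tok) for tok in ps_line.split() if tok.isdigit()]
    qs = [int(tok) for tok in qs_line.split() if tok.isdigit()]
    if len(ps) != len(qs):
        raise ValueError("Ps and Qs must list the same number of grids")
    return [p * q for p, q in zip(ps, qs)]


print(process_counts("2 4             Ps", "14 7            Qs"))  # [28, 28]
```

Both grids in the example (2 x 14 and 4 x 7) use 28 processes, so this particular file expects at least 28 MPI processes in total.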

If something above is not properly exposed, please let us know.


Last update: Nov 27, 2023