Example Applications
====================

| This section contains a list of applications that have been written or
  adapted to work with AMPI. Most applications are available on git:
| ``git clone ssh://charm.cs.illinois.edu:9418/benchmarks/ampi-benchmarks``.

Most benchmarks can be compiled with the provided top-level Makefile:

.. code-block:: bash

   $ git clone ssh://charm.cs.illinois.edu:9418/benchmarks/ampi-benchmarks
   $ cd ampi-benchmarks
   $ make -f Makefile.ampi

Mantevo project v3.0
--------------------

Set of mini-apps from the Mantevo project. Download at
https://mantevo.org/download/.

MiniFE
~~~~~~

-  Mantevo mini-app for unstructured implicit Finite Element
   computations.

-  No changes to the source are necessary to run on AMPI. Modify file
   ``makefile.ampi`` and change variable ``AMPIDIR`` to point to your
   Charm++ directory, then execute ``make -f makefile.ampi`` to build
   the program.

-  Refer to the ``README`` file on how to run the program. For example:
   ``./charmrun +p4 ./miniFE.x nx=30 ny=30 nz=30 +vp32``

MiniMD v2.0
~~~~~~~~~~~

-  Mantevo mini-app for particle interaction in a Lennard-Jones system,
   as in the LAMMPS MD code.

-  No changes to the source code are necessary. Modify file
   ``Makefile.ampi`` and change variable ``AMPIDIR`` to point to your
   Charm++ directory, then execute ``make ampi`` to build the program.

-  Refer to the ``README`` file on how to run the program. For example:
   ``./charmrun +p4 ./miniMD_ampi +vp32``

CoMD v1.1
~~~~~~~~~

-  Mantevo mini-app for molecular dynamics codes:
   https://github.com/exmatex/CoMD

-  To AMPI-ize it, we had to remove calls to ``getopt()``, which is not
   thread-safe. Support for dynamic load balancing has been added in the
   main loop and the command line options. It will run on all platforms.

-  Just update the Makefile to point to AMPI compilers and run with the
   provided run scripts.

MiniXYCE v1.0
~~~~~~~~~~~~~

-  Mantevo mini-app for discrete analog circuit simulation, version 1.0,
   with serial, MPI, OpenMP, and MPI+OpenMP versions.

-  No changes besides Makefile necessary to run with virtualization. To
   build, do ``cp common/generate_info_header miniXyce_ref/.``, modify
   the CC path in ``miniXyce_ref/`` and run ``make``. Run scripts are in
   ``test/``.

-  Example run command:
   ``./charmrun +p3 ./miniXyce.x +vp3 -circuit ../tests/cir1.net -t_start 1e-6 -pf params.txt``

HPCCG v1.0
~~~~~~~~~~

-  Mantevo mini-app for sparse iterative solves using the Conjugate
   Gradient method for a problem similar to that of MiniFE.

-  No changes necessary except to set compilers in ``Makefile`` to the
   AMPI compilers.

-  Run with a command such as:
   ``./charmrun +p2 ./test_HPCCG 20 30 10 +vp16``

MiniAMR v1.0
~~~~~~~~~~~~

-  miniAMR applies a stencil calculation on a unit cube computational
   domain, which is refined over time.

-  No changes if using swapglobals. Explicitly extern global variables
   if using TLS.

Not yet AMPI-zed (reason)
~~~~~~~~~~~~~~~~~~~~~~~~~

MiniAero v1.0 (build issues), MiniGhost v1.0.1 (globals), MiniSMAC2D
v2.0 (globals), TeaLeaf v1.0 (globals), CloverLeaf v1.1 (globals),
CloverLeaf3D v1.0 (globals).

LLNL ASC Proxy Apps
-------------------

LULESH v2.0
~~~~~~~~~~~

-  LLNL Unstructured Lagrangian-Eulerian Shock Hydrodynamics proxy app:
   https://codesign.llnl.gov/lulesh.php

-  Charm++, MPI, MPI+OpenMP, Liszt, Loci, Chapel versions all exist for
   comparison.

-  Manually privatized version of LULESH 2.0, plus a version with PUP
   routines in subdirectory ``pup_lulesh202/``.

AMG 2013
~~~~~~~~

-  LLNL ASC proxy app: Algebraic Multi-Grid solver for linear systems
   arising from unstructured meshes:
   https://codesign.llnl.gov/amg2013.php

-  AMG is based on HYPRE, both from LLNL. The only change necessary to
   get AMG running on AMPI with virtualization is to remove calls to
   HYPRE’s timing interface, which is not thread-safe.

-  To build, point the CC variable in ``Makefile.include`` to your AMPI
   CC wrapper script and ``make``. Executable is ``test/amg2013``.

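The run commands listed for the mini-apps above share one shape:
``./charmrun``, then ``+p`` with the number of processing elements, the
executable and its own arguments, and ``+vp`` with the number of virtual
ranks. The following is only a sketch of that pattern. The helper name
``ampi_run_cmd`` and its "ranks per PE" argument are hypothetical
conveniences for illustration and are not part of AMPI or Charm++. It
prints the command rather than running it, so it can be tried without a
Charm++ build.

.. code-block:: bash

   # Hypothetical helper (an assumption, not an AMPI or Charm++ tool):
   # print a charmrun command line for a PE count and a number of
   # virtual ranks per PE.
   ampi_run_cmd() {
       local pes=$1 ratio=$2 exe=$3
       shift 3
       # Total virtual ranks = PEs x ranks per PE.
       local vps=$(( pes * ratio ))
       # Application arguments, if any, go between the executable and +vp.
       echo "./charmrun +p${pes} ${exe}${*:+ $*} +vp${vps}"
   }

   # Reproduces the HPCCG example above: 2 PEs, 8 ranks per PE.
   ampi_run_cmd 2 8 ./test_HPCCG 20 30 10

Keeping the ratio as a separate argument makes it easy to vary the
degree of virtualization while holding the PE count fixed.
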
Lassen v1.0
~~~~~~~~~~~

-  LLNL ASC mini-app for wave-tracking applications with dynamic load
   imbalance. Reference versions are serial, MPI, Charm++, and
   MPI/Charm++ interop: https://codesign.llnl.gov/lassen.php

-  No changes necessary to enable AMPI virtualization. Requires some
   C++11 support. Set ``AMPIDIR`` in Makefile and ``make``. Run with:
   ``./charmrun +p4 ./lassen_mpi +vp8 default 2 2 2 50 50 50``

Kripke v1.1
~~~~~~~~~~~

-  LLNL ASC proxy app for ARDRA, a full Sn deterministic particle
   transport application: https://codesign.llnl.gov/kripke.php

-  Charm++, MPI, MPI+OpenMP, MPI+RAJA, MPI+CUDA, MPI+OCCA versions exist
   for comparison.

-  Kripke requires no changes between MPI and AMPI since it has no
   global/static variables. It uses cmake, so edit the cmake toolchain
   files in ``cmake/Toolchain/`` to point to the AMPI compilers, and
   build in a build directory:

   .. code-block:: bash

      $ mkdir build; cd build;
      $ cmake .. -DCMAKE_TOOLCHAIN_FILE=../cmake/Toolchain/linux-gcc-ampi.cmake -DENABLE_OPENMP=OFF
      $ make

   Run with:

   .. code-block:: bash

      $ ./charmrun +p8 ./src/tools/kripke +vp8 --zones 64,64,64 --procs 2,2,2 --nest ZDG

MCB v1.0.3 (2013)
~~~~~~~~~~~~~~~~~

-  LLNL ASC proxy app for Monte Carlo particle transport codes:
   https://codesign.llnl.gov/mcb.php

-  MPI+OpenMP reference version.

-  Run with:

   .. code-block:: bash

      $ OMP_NUM_THREADS=1 ./charmrun +p4 ./../src/MCBenchmark.exe --weakScaling --distributedSource --nCores=1 --numParticles=20000 --multiSigma --nThreadCore=1 +vp16

.. _not-yet-ampi-zed-reason-1:

Not yet AMPI-zed (reason)
~~~~~~~~~~~~~~~~~~~~~~~~~

UMT 2013 (global variables).

Other Applications
------------------

MILC 7.0
~~~~~~~~

-  MILC is a code to study quantum chromodynamics (QCD) physics.
   http://www.nersc.gov/users/computational-systems/cori/nersc-8-procurement/trinity-nersc-8-rfp/nersc-8-trinity-benchmarks/milc/

-  Moved ``MPI_Init_thread`` call to ``main()``, added ``__thread`` to
   all global/static variable declarations.
   Runs on AMPI with virtualization when using ``-tlsglobals``.

-  Build: edit ``ks_imp_ds/Makefile`` to use AMPI compiler wrappers, run
   ``make su3_rmd`` in ``ks_imp_ds/``

-  Run with:
   ``./su3_rmd +vp8 ../benchmark_n8/single_node/n8_single.in``

SNAP v1.01 (C version)
~~~~~~~~~~~~~~~~~~~~~~

-  LANL proxy app for PARTISN, an Sn deterministic particle transport
   application: https://github.com/losalamos/SNAP

-  SNAP is an update to Sweep3D. It simulates the same thing as Kripke,
   but with a different decomposition and slight algorithmic
   differences. It uses a 1- or 2-dimensional decomposition and the KBA
   algorithm to perform parallel sweeps over the 3-dimensional problem
   space. It contains all of the memory, computation, and network
   performance characteristics of a real particle transport code.

-  Original SNAP code is Fortran90-MPI-OpenMP, but this is a
   C-MPI-OpenMP version of it provided along with the original version.
   The Fortran90 version will require global variable privatization,
   while the C version works out of the box on all platforms.

-  Edit the Makefile for AMPI compiler paths and run with:
   ``./charmrun +p4 ./snap +vp4 --fi center_src/fin01 --fo center_src/fout01``

Sweep3D
~~~~~~~

-  Sweep3D is a *particle transport* program that analyzes the flux of
   particles along a space. It solves a three-dimensional particle
   transport problem.

-  This mini-app has been deprecated, and replaced at LANL by SNAP
   (above).

-  Build/Run Instructions:

   -  Modify the ``makefile`` and change variable ``CHARMC`` to point to
      your Charm++ compiler command, execute ``make mpi`` to build the
      program.

   -  Modify file ``input`` to set the different parameters. Refer to
      file ``README`` on how to change those parameters. Run with:
      ``./charmrun ./sweep3d.mpi +p8 +vp16``

PENNANT v0.8
~~~~~~~~~~~~

-  Unstructured mesh Rad-Hydro mini-app for a full application at LANL
   called FLAG. https://github.com/losalamos/PENNANT

-  Written in C++, only global/static variables that need to be
   privatized are ``mype`` and ``numpe``.
   Done manually.

-  Legion, Regent, MPI, MPI+OpenMP, MPI+CUDA versions of PENNANT exist
   for comparison.

-  For PENNANT-v0.8, point CC in Makefile to AMPICC and just ``make``.
   Run with the provided input files, such as:
   ``./charmrun +p2 ./build/pennant +vp8 test/noh/noh.pnt``

Benchmarks
----------

Jacobi-2D (Fortran)
~~~~~~~~~~~~~~~~~~~

-  Jacobi-2D with 1D decomposition. Problem size and number of
   iterations are defined in the source code. Manually privatized.

Jacobi-3D (C)
~~~~~~~~~~~~~

-  Jacobi-3D with 3D decomposition. Manually privatized. Includes
   multiple versions: Isomalloc, PUP, FT, LB, Isend/Irecv, Iput/Iget.

NAS Parallel Benchmarks (NPB 3.3)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

-  A collection of kernels used in different scientific applications.
   They are mainly implementations of various linear algebra methods.
   http://www.nas.nasa.gov/Resources/Software/npb.html

-  Build/Run Instructions:

   -  Modify file ``config/make.def`` to make variable ``CHAMRDIR``
      point to the right Charm++ directory.

   -  Use ``make <benchmark> NPROCS=<P> CLASS=<C>`` to build a
      particular benchmark. The values for ``<benchmark>`` are (bt, cg,
      dt, ep, ft, is, lu, mg, sp), ``<P>`` is the number of ranks and
      ``<C>`` is the class or the problem size (to be chosen from
      A,B,C,D or E). Some benchmarks may have restrictions on values of
      ``<P>`` and ``<C>``. For instance, to make the CG benchmark with
      256 ranks and class C, we will use the following command:
      ``make cg NPROCS=256 CLASS=C``

   -  The resulting executable file will be generated in the respective
      directory for the benchmark. In the previous example, a file
      *cg.C.256* will appear in the *CG* and ``bin/`` directories. To
      run the particular benchmark, you must follow the standard
      procedure of running AMPI programs:
      ``./charmrun ./cg.C.256 +p64 +vp256 ++nodelist nodelist``

NAS PB Multi-Zone Version (NPB-MZ 3.3)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

-  A multi-zone version of BT, SP and LU NPB benchmarks. The multi-zone
   version intentionally divides the space unevenly among ranks and
   causes load imbalance. The original goal of multi-zone versions was
   to offer a test case for hybrid MPI+OpenMP programming, where the
   load imbalance can be dealt with by increasing the number of threads
   in those ranks with more computation.
   http://www.nas.nasa.gov/Resources/Software/npb.html

-  The BT-MZ program shows the heaviest load imbalance.

-  Build/Run Instructions:

   -  Modify file ``config/make.def`` to make variable ``CHAMRDIR``
      point to the right Charm++ build.

   -  Use the format ``make <benchmark> NPROCS=<P> CLASS=<C>`` to build
      a particular benchmark. The values for ``<benchmark>`` are (bt-mz,
      lu-mz, sp-mz), ``<P>`` is the number of ranks and ``<C>`` is the
      class or the problem size (to be chosen from A,B,C,D or E). Some
      benchmarks may have restrictions on values of ``<P>`` and ``<C>``.
      For instance, to make the BT-MZ benchmark with 256 ranks and class
      C, you can use the following command:
      ``make bt-mz NPROCS=256 CLASS=C``

   -  The resulting executable file will be generated in the ``bin/``
      directory. In the previous example, a file *bt-mz.C.256* will be
      created in the ``bin/`` directory. To run the particular
      benchmark, you must follow the standard procedure of running AMPI
      programs:
      ``./charmrun ./bt-mz.C.256 +p64 +vp256 ++nodelist nodelist``

HPCG v3.0
~~~~~~~~~

-  High Performance Conjugate Gradient benchmark, version 3.0. Companion
   metric to Linpack, with many vendor-optimized implementations
   available: http://hpcg-benchmark.org/

-  No AMPI-ization needed. To build, modify ``setup/Make.AMPI`` for
   compiler paths, do
   ``mkdir build && cd build && configure ../setup/Make.AMPI && make``.
   To run, do ``./charmrun +p16 ./bin/xhpcg +vp64``

Intel Parallel Research Kernels (PRK) v2.16
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

-  A variety of kernels (Branch, DGEMM, Nstream, Random, Reduce, Sparse,
   Stencil, Synch_global, Synch_p2p, and Transpose) implemented for a
   variety of runtimes (SERIAL, OpenMP, MPI-1, MPI-RMA, MPI-SHM,
   MPI+OpenMP, SHMEM, FG_MPI, UPC, Grappa, Charm++, and AMPI).
   https://github.com/ParRes/Kernels

-  For AMPI tests, set ``CHARMTOP`` and run: ``make allampi``. There are
   run scripts included.

OSU Microbenchmarks
~~~~~~~~~~~~~~~~~~~

MPI collectives performance testing suite.
https://charm.cs.illinois.edu/gerrit/#/admin/projects/benchmarks/osu-collectives-benchmarking

-  Build with: ``./configure CC=~/charm/bin/ampicc && make``

Third Party Open Source Libraries
---------------------------------

HYPRE-2.11.1
~~~~~~~~~~~~

-  High Performance Preconditioners and solvers library from LLNL.
   https://computation.llnl.gov/project/linear_solvers/software.php

-  Hypre-2.11.1 builds on top of AMPI using the configure command:

   .. code-block:: bash

      $ ./configure --with-MPI \
            CC=~/charm/bin/ampicc \
            CXX=~/charm/bin/ampicxx \
            F77=~/charm/bin/ampif77 \
            --with-MPI-include=~/charm/include \
            --with-MPI-lib-dirs=~/charm/lib \
            --with-MPI-libs=mpi --without-timing --without-print-errors
      $ make -j8

-  All HYPRE tests and examples pass with virtualization, migration,
   etc. except for those that use Hypre’s timing interface, which uses a
   global variable internally. So just remove those calls and do not
   define ``HYPRE_TIMING`` when compiling a code that uses Hypre. In the
   examples directory, you’ll have to set the compilers to your AMPI
   compilers explicitly too. In the test directory, you’ll have to edit
   the Makefile to 1) Remove ``-DHYPRE_TIMING`` from both ``CDEFS`` and
   ``CXXDEFS``, 2) Remove both ``${MPILIBS}`` and ``${MPIFLAGS}`` from
   ``MPILIBFLAGS``, and 3) Remove ``${LIBS}`` from ``LIBFLAGS``. Then
   run ``make``.

-  To run the ``new_ij`` test, run:
   ``./charmrun +p64 ./new_ij -n 128 128 128 -P 4 4 4 -intertype 6 -tol 1e-8 -CF 0 -solver 61 -agg_nl 1 27pt -Pmx 6 -ns 4 -mu 1 -hmis -rlx 13 +vp64``

MFEM-3.2
~~~~~~~~

-  MFEM is a scalable library for Finite Element Methods developed at
   LLNL. http://mfem.org/

-  MFEM-3.2 builds on top of AMPI (and METIS-4.0.3 and HYPRE-2.11.1).
   Download MFEM, HYPRE, and METIS. Untar all 3 in the same top-level
   directory.

-  Build HYPRE-2.11.1 as described above.

-  Build METIS-4.0.3 by doing ``cd metis-4.0.3/ && make``

-  Build MFEM-3.2 serial first by doing ``make serial``

-  Build MFEM-3.2 parallel by doing:

   -  First, comment out ``#define HYPRE_TIMING`` in
      ``mfem/linalg/hypre.hpp``. Also, you must add a
      ``#define hypre_clearTiming()`` at the top of
      ``linalg/hypre.cpp``, because Hypre-2.11.1 has a bug where it
      doesn’t provide a definition of this function if you don’t define
      ``HYPRE_TIMING``.

   -  ``make parallel MFEM_USE_MPI=YES MPICXX=~/charm/bin/ampicxx HYPRE_DIR=~/hypre-2.11.1/src/hypre METIS_DIR=~/metis-4.0.3``

-  To run an example, do
   ``./charmrun +p4 ./ex15p -m ../data/amr-quad.mesh +vp16``. You may
   want to add the runtime options ``-no-vis`` and ``-no-visit`` to
   speed things up.

-  All example programs and miniapps pass with virtualization, and
   migration if added.

XBraid-1.1
~~~~~~~~~~

-  XBraid is a scalable library for parallel time integration using
   MultiGrid, developed at LLNL.
   https://computation.llnl.gov/project/parallel-time-integration/software.php

-  XBraid-1.1 builds on top of AMPI (and its examples/drivers build on
   top of MFEM-3.2, HYPRE-2.11.1, and METIS-4.0.3 or METIS-5.1.0).

-  To build XBraid, modify the variables CC, MPICC, and MPICXX in
   ``makefile.inc`` to point to your AMPI compilers, then do ``make``.

-  To build XBraid’s ``examples/`` and ``drivers/``, modify the paths to
   MFEM and HYPRE in their Makefiles and ``make``.

-  To run an example, do
   ``./charmrun +p2 ./ex-02 -pgrid 1 1 8 -ml 15 -nt 128 -nx 33 33 -mi 100 +vp8 ++local``.

-  To run a driver, do
   ``./charmrun +p4 ./drive-03 -pgrid 2 2 2 2 -nl 32 32 32 -nt 16 -ml 15 +vp16 ++local``

Other AMPI codes
----------------

-  FLASH

-  BRAMS (Weather prediction model)

-  CGPOP

-  Fractography3D (Crack Propagation)

-  JetAlloc

-  PlasComCM (XPACC)

-  PlasCom2 (XPACC)

-  Harm3D