====================
Multiblock Framework
====================

.. contents::
   :depth: 3

Motivation
==========

A large class of problems can be solved by first decomposing the problem
domain into a set of structured grids. For simplicity, each structured
grid is often made rectangular, in which case it is called a *block*.
These blocks may face one another or various parts of the outside world,
and taken together they comprise a *multiblock computation*.

There are two main types of multiblock computations -- implicit and
explicit. In an implicit computation, a global matrix, which represents
the entire problem domain, is formed and solved. Implicit computations
require a fast sparse matrix solver, and are typically used for
steady-state problems. In an explicit computation, the solution proceeds
locally, computing new values based on the values of nearby points.
Explicit computations often have stability criteria, and are typically
used for time-dependent problems.

The Charm++ multiblock framework allows you to write a parallel explicit
multiblock program, in C or Fortran 90, by concentrating on what happens
to a single block of the domain. Boundary condition housekeeping and
“ghost cell” exchange are all handled transparently by the framework.
Using the multiblock framework also allows you to take advantage of all
the features of Charm++, including adaptive computation and
communication overlap, run-time load balancing, performance monitoring
and visualization, and checkpoint/restart, with no additional effort.

Introduction/Terminology
========================

A *block* is a distorted rectangular grid that represents a portion of
the problem domain. A volumetric cell in the grid is called a *voxel*.
Each exterior side of a block is called a *face*. Each face may consist
of several rectangular *patches*. All the cells of a patch abut the same
neighboring block (or the same part of the outside world) and experience
the same boundary conditions.

.. figure:: fig/terminology.png
   :name: fig:terminology
   :width: 3in

   Terminology used by the framework.

For example, Figure :numref:`fig:terminology` shows a 3D 4x8x7-voxel
block, with a face and 6x3 patch indicated.

The computational domain is tiled with such blocks, which are required
to be conformal -- the voxels must match exactly. The blocks need not be
the same size or orientation, however, as illustrated in the 2D domain
of Figure :numref:`fig:decompose`.

.. figure:: fig/decompose.png
   :name: fig:decompose
   :width: 4in

   A 2D domain decomposed into three blocks: A (5x3), B (3x6), and C
   (5x4). Also shows the computation as seen from block A.

Figure :numref:`fig:decompose` also shows the computation from the
point of view of block A, which has two external boundary conditions (on
the left and top sides) and two “internal” boundary conditions (on the
right and bottom sides). During the computation, the external boundary
conditions can be imposed independently of any other block, while the
internal boundary conditions must be obtained from the neighboring blocks.

To simplify the computation on the interior, these boundary conditions
are typically written into special extra “ghost” (or dummy) cells around
the outside of the real interior cells. The array indexing for these
ghost cells is illustrated in Figure :numref:`fig:indexing`.

.. figure:: fig/indexing.png
   :name: fig:indexing
   :width: 2in

   The ghost cells around a 5x3-voxel 2D block
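
To see why ghost cells simplify the interior computation, consider the
minimal C sketch below of one explicit step on a 2D block with a 1-cell
ghost layer. It is not part of the framework: the function name, the
5-point averaging stencil, and the memory layout (:math:`i` varying
fastest) are illustrative assumptions. Once the ghost cells hold the
boundary values, every interior cell is updated by the same formula,
with no special cases along the edges.

.. code-block:: c

  #include <stddef.h>

  /* Hypothetical example: one explicit (Jacobi-style) step on an
     ni x nj interior surrounded by a 1-cell ghost layer. Arrays are
     allocated as (ni+2) x (nj+2) cells with i varying fastest. The
     ghost cells are read but never written. */
  static void explicit_step(int ni, int nj, const double *u, double *unew)
  {
    int si = ni + 2;                       /* allocated row length */
    for (int j = 1; j <= nj; j++)
      for (int i = 1; i <= ni; i++) {
        size_t c = (size_t)j * si + i;     /* cell (i,j) */
        unew[c] = 0.25 * (u[c - 1] + u[c + 1] + u[c - si] + u[c + si]);
      }
  }

In a real driver, the ghost layer would be refreshed (by applying the
external and internal boundary conditions) before each such step.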

The Multiblock framework manages all the boundary conditions -- both
internal and external. Internal boundary conditions are sent across
processors, and require you to register the data “fields” you wish to
exchange. External boundary conditions are not communicated, but
require you to register a function to apply that boundary condition to
your data. Either type of boundary condition can have arbitrary
thickness.

Finally, the Multiblock framework manages nothing *but* boundary
conditions. The rest of the computation, such as deciding on and
implementing timestepping, stencils, numerics, and interpolation schemes,
is left up to the user.

Input Files
===========

The Multiblock framework reads, in parallel, a partitioned set of blocks
from block input files. Each block is described by two files: one with
extension “.mblk” for the interior data (grid coordinates and initial
conditions) and one with extension “.bblk” for the boundary condition
data (patches where boundaries should be applied).

These block files are generated with a separate, offline tool called
“makemblock”, which is described under Preparing Input Files below.

Structure of a Multiblock Framework Program
===========================================

A Multiblock framework program consists of several subroutines: ``init``,
``driver``, ``finalize``, and external boundary condition subroutines.

``init`` and ``finalize`` are called by the Multiblock framework only on the
first processor -- these routines typically do specialized I/O, startup
and shutdown tasks.

A separate driver subroutine runs for each block, doing the main work
of the program. Because there may be several blocks per processor,
several driver routines may execute as threads simultaneously.

The boundary condition subroutines are called by the framework after a
request from the driver.

.. code-block:: none

        subroutine init
             read configuration data
        end subroutine init

        subroutine bc1
             apply first type of boundary condition
        end subroutine bc1

        subroutine bc2
             apply second type of boundary condition
        end subroutine bc2

        subroutine driver
             allocate and initialize the grid
             register boundary condition subroutines bc1 and bc2
             time loop
                  apply external boundary conditions
                  apply internal boundary conditions
                  perform serial internal computation
             end time loop
        end subroutine driver

        subroutine finalize
             write results
        end subroutine finalize

Compilation and Execution
=========================

A Multiblock framework program is a Charm++ program, so you must begin
by downloading the latest source version of Charm++ from
https://charm.cs.illinois.edu. Build the source with
``./build MBLOCK <version>``, or ``cd`` into the build directory,
``<version>/tmp``, and type ``make MBLOCK``. To compile a Multiblock
program, pass the ``-language mblock`` (for C) or ``-language mblockf``
(for Fortran) option to ``charmc``.

In a Charm++ installation, see ``charm/<version>/pgms/charm++/mblock/`` for
example and test programs.

Preparing Input Files
=====================

The Multiblock framework reads its description of the problem domain
from input “block” files, which are in a Multiblock-specific format. The
files are named with the pattern ``<prefix><number>.<ext>``, where
*prefix* is an arbitrary string you choose, *number* is the number of
this block (virtual processor), and *ext* is either “mblk”, for a file
containing binary data with the block coordinates, or “bblk”, for a file
containing ASCII data with the block’s boundary conditions.
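
As an illustration of the naming pattern, here is a small C sketch that
builds a block file name. The helper ``block_filename`` is hypothetical,
and the five-digit zero padding of *number* is inferred from example
names such as “out00001.mblk” used in this manual, not from a format
specification.

.. code-block:: c

  #include <stdio.h>

  /* Hypothetical helper: writes "<prefix><number>.<ext>" into buf, with
     number zero-padded to five digits (e.g. "out00001.mblk").
     Returns the length of the full name, as snprintf does; a value
     >= bufsize means the name was truncated. */
  static int block_filename(char *buf, size_t bufsize, const char *prefix,
                            int number, const char *ext)
  {
    return snprintf(buf, bufsize, "%s%05d.%s", prefix, number, ext);
  }

For example, ``block_filename(buf, sizeof buf, "gridX", 1, "mblk")``
produces “gridX00001.mblk”.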

You generate these Multiblock input files using a tool called
*makemblock*, which can be found in ``charm/<version>/pgms/charm++/makemblock``.
makemblock can read a description of the problem domain generated by the
structured meshing program Gridgen (from Pointwise) in .grd and .inp
format, or from a binary .msh file. makemblock divides this input
domain into the number of blocks you specify, then writes out .mblk and
.bblk files.

For example, to divide the single binary mesh “in1.msh” into 20 pieces
“out00001.[mb]blk”..“out00020.[mb]blk”, you’d use

.. code-block:: bash

   $ makemblock in1.msh 20 out

You would then run this mesh using 20 virtual processors, for example
with the ``+vp20`` command-line option.

Multiblock Framework API Reference
==================================

The Multiblock framework is accessed from a program via a set of
routines. These routines are available in both C and Fortran90 versions.
Unless the signature shown below says otherwise, the C versions are
functions that return an error code of MBLK_SUCCESS or MBLK_FAILURE, and
the Fortran90 versions are subroutines that take an extra integer output
parameter ``err``, which will be set to MBLK_SUCCESS or MBLK_FAILURE.

Initialization
--------------

The user should call all of these routines from the ``init`` routine.
The values passed to these functions are typically read from a
configuration file or computed from command-line parameters.


.. code-block:: c++

  int MBLK_Set_prefix(const char *prefix);

.. code-block:: fortran

  subroutine MBLK_Set_prefix(prefix,err)
  character(len=*), intent(in)::prefix
  integer, intent(out)::err


This function is called to set the block filename prefix. For example,
if the input block files are named “gridX00001.mblk” and
“gridX00002.mblk”, the prefix is the string “gridX”.

.. code-block:: c++

  int MBLK_Set_nblocks(const int n);

.. code-block:: fortran

  subroutine MBLK_Set_nblocks(n,err)
  integer, intent(in)::n
  integer, intent(out)::err

This call is made to set the number of partitioned blocks to be used.
Each block is read from an input file and a separate driver is spawned
for each. The number of blocks determines the available parallelism,
so be sure to have at least as many blocks as processors. We recommend
using several times more blocks than processors, to ease load
balancing and allow adaptive overlap of computation and communication.

Be sure to set the number of blocks equal to the number of virtual
processors (+vp command-line option).

.. code-block:: c++

  int MBLK_Set_dim(const int n);

.. code-block:: fortran

  subroutine MBLK_Set_dim(n, err)
  integer, intent(in)::n
  integer, intent(out)::err

This call sets the number of spatial dimensions. Only three-dimensional
computations are currently supported.

Utility
-------

.. code-block:: c++

  int MBLK_Get_nblocks(int* n);

.. code-block:: fortran

  subroutine MBLK_Get_nblocks(n,err)
  integer,intent(out)::n
  integer,intent(out)::err

Get the total number of blocks in the current computation. Can only be
called from the driver routine.

.. code-block:: c++

  int MBLK_Get_myblock(int* m);

.. code-block:: fortran

  subroutine MBLK_Get_myblock(m,err)
  integer,intent(out)::m
  integer,intent(out)::err

Get the id of the current block, an integer from 0 to the number of
blocks minus one. Can only be called from the driver routine.

.. code-block:: c++

  int MBLK_Get_blocksize(int* dims);

.. code-block:: fortran

  subroutine MBLK_Get_blocksize(dims,err)
  integer,intent(out)::dims(3)
  integer,intent(out)::err

Get the interior dimensions of the current block, in voxels. The size
of the array dims should be 3, and will be filled with the :math:`i`,
:math:`j`, and :math:`k` dimensions of the block. Can only be called
from the driver routine.

.. code-block:: c++

  int MBLK_Get_nodelocs(const int* nodedim,double *nodelocs);

.. code-block:: fortran

  subroutine MBLK_Get_nodelocs(nodedim,nodelocs,err)
  integer,intent(in)::nodedim(3)
  double precision,intent(out)::nodelocs(3,nodedim(1),nodedim(2),nodedim(3))
  integer,intent(out)::err

Get the :math:`(x,y,z)` locations of the nodes of the current block.
The 3-array nodedim gives the number of nodes you expect along each of
the :math:`i`, :math:`j`, and :math:`k` axes, which must be exactly one
more than the number of interior voxels along that axis.

.. figure:: fig/nodeloc.pdf
   :width: 3in

   The C node and voxel :math:`(i,j,k)` numbering for a 2 x 2 voxel
   block. For the Fortran numbering, add 1 to all indices. Ghost voxels
   are omitted.

You cannot obtain the locations of ghost nodes via this routine. To get
them, create a node-centered field containing the node locations and
call MBLK_Update_field on it. MBLK_Get_nodelocs can only be called from
the driver routine.
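
MBLK_Get_blocksize reports voxels, while MBLK_Get_nodelocs expects node
counts and fills a buffer with three coordinates per node. A C sketch of
that bookkeeping follows. The helper names are hypothetical, and the
memory order (the coordinate varies fastest, then :math:`i`, :math:`j`,
and :math:`k`, as for a Fortran array dimensioned ``(3, ni, nj, nk)``)
is an assumption rather than something this manual states for C.

.. code-block:: c

  #include <stddef.h>

  /* Nodes per axis: one more than the interior voxels per axis. */
  static void node_dims(const int voxdims[3], int nodedim[3])
  {
    for (int d = 0; d < 3; d++)
      nodedim[d] = voxdims[d] + 1;
  }

  /* Number of doubles the nodelocs buffer must hold (x,y,z per node). */
  static size_t nodelocs_len(const int nodedim[3])
  {
    return 3 * (size_t)nodedim[0] * nodedim[1] * nodedim[2];
  }

  /* Index of coordinate c (0=x, 1=y, 2=z) of node (i,j,k), 0-based.
     ASSUMPTION: coordinate fastest, then i, then j, then k. */
  static size_t nodeloc_index(const int nodedim[3], int c, int i, int j, int k)
  {
    return (((size_t)k * nodedim[1] + j) * nodedim[0] + i) * 3 + c;
  }

In a driver, you would pass the voxel counts from MBLK_Get_blocksize to
``node_dims``, allocate ``nodelocs_len(nodedim)`` doubles, and hand both
arrays to MBLK_Get_nodelocs.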

.. code-block:: c++

  double MBLK_Timer(void);

.. code-block:: fortran

  double precision function MBLK_Timer()

Return the current wall clock time, in seconds. Resolution is
machine-dependent, but is at worst 10ms.

.. code-block:: c++

  void MBLK_Print_block(void);

.. code-block:: fortran

  subroutine MBLK_Print_block()

Print a debugging representation of the framework’s information about
the current block.

.. code-block:: c++

  void MBLK_Print(const char *str);

.. code-block:: fortran

  subroutine MBLK_Print(str)
  character(len=*), intent(in) :: str

Print the given string, prepended by the block id if called from the
driver. Works on all machines, unlike ``printf`` or ``print *``, which may
not work on all parallel machines.

Internal Boundary Conditions and Block Fields
---------------------------------------------

The Multiblock framework handles the exchange of boundary values between
neighboring blocks. The basic mechanism to do this exchange is the
*field* -- numeric data items associated with each cell of a block. These
items must be arranged in a regular 3D grid, but otherwise we make no
assumptions about the meaning of a field.

You create a field once, with MBLK_Create_Field, then pass the resulting
field ID to MBLK_Update_Field (which does the overlapping block
communication) and/or MBLK_Reduce_Field (which applies a reduction over
block values).

.. code-block:: c++

  int MBLK_Create_Field(int *dimensions, int isVoxel, const int base_type,
                        const int vec_len, const int offset, const int dist,
                        int *fid);

.. code-block:: fortran

  subroutine MBLK_Create_Field(dimensions, isVoxel, base_type, vec_len, offset, dist, fid, err)
  integer, intent(in) :: dimensions(3), isVoxel, base_type, vec_len, offset, dist
  integer, intent(out) :: fid, err

Creates and returns a Multiblock field ID, which can be passed to
MBLK_Update_Field and MBLK_Reduce_Field. Can only be called from
driver().

``dimensions`` describes the size of the array the field is in as an
array of size 3, giving the :math:`i`, :math:`j`, and
:math:`k` sizes. The size should include the ghost regions -- i.e., pass
the actual allocated size of the array. ``isVoxel`` describes whether the
data item is to be associated with a voxel (1, a volume-centered value)
or the nodes (0, a node-centered value). ``base_type`` describes the type of
each data item, one of:

-  MBLK_BYTE -- ``unsigned char``, ``INTEGER*1``, or ``CHARACTER*1``

-  MBLK_INT -- ``int`` or ``INTEGER*4``

-  MBLK_REAL -- ``float`` or ``REAL*4``

-  MBLK_DOUBLE -- ``double``, ``DOUBLE PRECISION``, or ``REAL*8``

``vec_len`` describes the number of data items associated with each cell, an
integer at least 1.

``offset`` is the byte offset from the start of the array to the first
interior cell’s data items, a non-negative integer. In Fortran, this can
be calculated using the ``offsetof()`` helper routine, normally with
``offsetof(array(1,1,1), array(interiorX,interiorY,interiorZ))``, where
the second argument is the first interior cell. Be sure to skip over any
ghost regions.

``dist`` is the byte offset from the first cell’s data items to the second,
a positive integer (normally the size of the data items). This can also
be calculated using ``offsetof()``; normally with
``offsetof(array(1,1,1), array(2,1,1))``.

``fid`` is the identifier for the field that is created by the function.

In the example below, we register a single double-precision value with
each voxel. The ghost region is 2 cells deep along all sides.

.. code-block:: fortran

       !In Fortran
       double precision, allocatable :: voxData(:,:,:)
       integer :: dims(3)
       integer :: fid, err

       !Find the dimensions of the grid interior
       call MBLK_Get_blocksize(dims,err)

       !Add ghost region width to the interior dimensions
       dims=dims+4  ! 4 because of the 2-deep region on both sides

       !Allocate and initialize the grid
       allocate(voxData(dims(1),dims(2),dims(3)))
       voxData=0.0d0

       !Create a voxel-centered field with one double per voxel
       call MBLK_Create_field(&
              &dims,1, MBLK_DOUBLE,1,&
              &offsetof(voxData(1,1,1),voxData(3,3,3)),&
              &offsetof(voxData(1,1,1),voxData(2,1,1)),fid,err)


This example uses the Fortran-only helper routine ``offsetof``, which
returns the offset in bytes of memory between its two given variables. C
users can use the built-in ``sizeof`` operator or pointer arithmetic to
achieve the same result.
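
For C users, here is a minimal sketch of the same calculation using
``sizeof``, for a grid of doubles with a ghost region of uniform depth.
The helper names are hypothetical, and the sketch assumes the array is
stored with :math:`i` varying fastest (the same order as a Fortran
array); adjust ``cell_index`` if your C arrays use a different layout.

.. code-block:: c

  #include <stddef.h>

  /* Linear index of cell (i,j,k), counted from 0 including ghost cells,
     in an array allocated as alloc[0] x alloc[1] x alloc[2] cells with
     i varying fastest. */
  static size_t cell_index(const int alloc[3], int i, int j, int k)
  {
    return ((size_t)k * alloc[1] + j) * alloc[0] + i;
  }

  /* The offset argument: bytes from the start of the array to the first
     interior cell, skipping a ghost region `ghost` cells deep. */
  static int field_offset(const int alloc[3], int ghost, int vec_len)
  {
    return (int)(cell_index(alloc, ghost, ghost, ghost) * vec_len * sizeof(double));
  }

  /* The dist argument: bytes from one cell's data items to the next. */
  static int field_dist(int vec_len)
  {
    return (int)(vec_len * sizeof(double));
  }

For a C version of the Fortran example above (allocated size ``alloc``,
2-deep ghost region, one double per voxel), these values would be passed
as ``MBLK_Create_Field(alloc, 1, MBLK_DOUBLE, 1, field_offset(alloc, 2, 1),
field_dist(1), &fid)``.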

.. code-block:: c++

  void MBLK_Update_field(const int fid,int ghostwidth, void *grid);

.. code-block:: fortran

  subroutine MBLK_Update_field(fid,ghostwidth, grid,err)
  integer, intent(in) :: fid, ghostwidth
  integer,intent(out) :: err
  varies, intent(inout) :: grid


Update the values in the ghost regions specified when the field was
created. This call sends this block’s interior region out, and
receives this block’s boundary region from adjoining blocks.

``ghostwidth`` controls the thickness of the ghost region. To exchange only
one cell on the boundary, pass 1. To exchange two cells, pass 2. To
include diagonal regions, make the ghost width negative. A ghost width
of zero would communicate no data.
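
To make the sign convention concrete, the sketch below counts the ghost
cells filled around a 2D block for a given ``ghostwidth``. It illustrates
one reading of Figure :numref:`fig:ghostwidth` and is not framework code:
``ghost_cells_2d`` is a hypothetical helper, and it assumes that all four
sides of the block border other blocks.

.. code-block:: c

  #include <stdlib.h>

  /* Counts the ghost cells filled around an ni x nj 2D interior.
     ASSUMPTION: every side borders another block. |ghostwidth| layers
     are filled along each of the four sides; a negative width also
     fills the four |w| x |w| diagonal corner squares. */
  static int ghost_cells_2d(int ni, int nj, int ghostwidth)
  {
    int w = abs(ghostwidth);
    int cells = 2 * w * (ni + nj);   /* strips along the four sides */
    if (ghostwidth < 0)
      cells += 4 * w * w;            /* the four diagonal corners */
    return cells;
  }

For the 5x3-voxel block used earlier, widths 1 and -1 give 16 and 20
cells, and widths 2 and -2 give 32 and 48 cells.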

.. figure:: fig/ghostwidth.png
   :name: fig:ghostwidth
   :width: 2in

   The 2D ghost cells communicated for various ghost widths. The heavy
   line is the block interior boundary -- this is the lower left portion
   of the block.

MBLK_Update_field can only be called from the driver, and to be useful, must
be called from every block’s driver routine.

MBLK_Update_field blocks until the field has been updated. After this
routine returns, the ghost regions of the given field will have been
updated. MBLK_SUCCESS is returned if the update was successful, and
MBLK_FAILURE otherwise.

.. code-block:: c++

  void MBLK_Iupdate_field(const int fid,int ghostwidth, void *ingrid, void* outgrid);

.. code-block:: fortran

  subroutine MBLK_Iupdate_field(fid,ghostwidth, ingrid, outgrid,err)
  integer, intent(in) :: fid, ghostwidth
  integer,intent(out) :: err
  varies,intent(in) :: ingrid
  varies,intent(out) :: outgrid

Update the values in the ghost regions which were specified when the
field was created. As with MBLK_Update_field, this is typically done
once per step of the time loop.

MBLK_Iupdate_field can only be called from the driver, and to be useful,
must be called from every block’s driver routine.

MBLK_Iupdate_field is a nonblocking call similar to MPI_Irecv. After
the routine returns the update may not yet be complete and the outgrid
may be in an inconsistent state. Before using the values, the status of
the update must be checked using MBLK_Test_update or MBLK_Wait_update.

There can be only one outstanding Iupdate call in progress at any time.

.. code-block:: c++

  int MBLK_Test_update(int *status);

.. code-block:: fortran

  subroutine MBLK_Test_update(status,err)
  integer, intent(out) :: status,err

MBLK_Test_update is used together with MBLK_Iupdate_field from the
driver subroutine. It tests whether the preceding Iupdate has completed.
``status`` is returned as
MBLK_DONE if the update was completed or MBLK_NOTDONE if the update is
still pending. Rather than looping if the update is still pending,
call MBLK_Wait_update to relinquish the CPU.

.. code-block:: c++

  void MBLK_Wait_update(void);

.. code-block:: fortran

  subroutine MBLK_Wait_update()

MBLK_Wait_update is a blocking call used together with
MBLK_Iupdate_field. It blocks until the pending update is complete.

.. code-block:: c++

  void MBLK_Reduce_field(int fid,void *grid, void *out,int op);

.. code-block:: fortran

  subroutine MBLK_Reduce_field(fid,grid,outVal,op)
  integer, intent(in) :: fid,op
  varies, intent(in) :: grid
  varies, intent(out) :: outVal

Combine a field from each block, according to ``op``, across all blocks.
Only the interior values of the field will be combined, not the ghost
cells. After Reduce_Field returns, all blocks will have identical
values in ``outVal``, which must be ``vec_len`` copies of ``base_type``.

May only be called from the driver, and to complete, must be called from
every block’s driver routine.

``op`` must be one of:

-  MBLK_SUM -- each element of ``outVal`` will be the sum of the corresponding
   fields of all blocks

-  MBLK_MIN -- each element of ``outVal`` will be the smallest value among the
   corresponding field of all blocks

-  MBLK_MAX -- each element of ``outVal`` will be the largest value among the
   corresponding field of all blocks
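
The reduction semantics can be modeled serially in C, as below. This
models only the result, not how the framework communicates:
``reduce_model`` and the ``OP_*`` constants are hypothetical stand-ins
for the framework's routine and the MBLK_SUM, MBLK_MIN, and MBLK_MAX
constants, and the interior values of all blocks are assumed to have
been gathered into one array of doubles.

.. code-block:: c

  #include <stddef.h>

  /* Hypothetical stand-ins for MBLK_SUM, MBLK_MIN, and MBLK_MAX. */
  enum reduce_op { OP_SUM, OP_MIN, OP_MAX };

  /* Serial model of a field reduction. vals holds ncells interior cells
     (gathered from all blocks), each with vec_len doubles; ncells must
     be at least 1. Component c of out is the op applied to component c
     of every cell, so out holds vec_len values. */
  static void reduce_model(enum reduce_op op, const double *vals,
                           size_t ncells, int vec_len, double *out)
  {
    for (int c = 0; c < vec_len; c++) {
      double acc = vals[c];
      for (size_t n = 1; n < ncells; n++) {
        double v = vals[n * vec_len + c];
        if (op == OP_SUM) acc += v;
        else if (op == OP_MIN) acc = v < acc ? v : acc;
        else acc = v > acc ? v : acc;
      }
      out[c] = acc;
    }
  }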

.. code-block:: c++

  void MBLK_Reduce(int fid,void *inVal,void *outVal,int op);

.. code-block:: fortran

  subroutine MBLK_Reduce(fid,inVal,outVal,op)
  integer, intent(in) :: fid,op
  varies, intent(in) :: inVal
  varies, intent(out) :: outVal

Combine a field from each block, according to ``op``, across all blocks.
``fid`` is only used for ``base_type`` and ``vec_len`` -- ``offset`` and
``dist`` are not used. After this call returns, all blocks will have
identical values in ``outVal``. ``op`` has the same values and meaning as
MBLK_Reduce_Field. May only be called from the driver, and to complete,
must be called from every block's driver routine.

External Boundary Conditions
----------------------------

Most problems include some sort of boundary conditions. These conditions
are normally applied in the ghost cells surrounding the actual
computational domain. Examples of boundary conditions are imposed
values, reflection walls, symmetry planes, inlets, and exits.

The Multiblock framework keeps track of where boundary conditions are to
be applied. You register a subroutine that the framework will call to
apply each type of external boundary condition.

.. code-block:: c++

  int MBLK_Register_bc(const int bcnum, int ghostWidth, const MBLK_BcFn bcfn);

.. code-block:: fortran

  subroutine MBLK_Register_bc(bcnum, ghostwidth, bcfn, err)
  integer,intent(in) :: bcnum, ghostWidth
  integer,intent(out) :: err
  external :: bcfn

This call is used to bind an external boundary condition
subroutine, written by you, to a boundary condition number.
MBLK_Register_bc should only be called from the driver.

-  ``bcnum`` -- The boundary condition number to be associated with the
   function.

-  ``ghostWidth`` -- The width of the ghost cells where this boundary condition
   is to be applied.

-  ``bcfn`` -- The user subroutine to be called to apply this boundary
   condition.

When you ask the framework to apply boundary conditions, it will call
this routine. The routine should be declared as follows:

.. code-block:: fortran

       !In Fortran
       subroutine applyMyBC(param1,param2,start,end)
       varies :: param1, param2
       integer :: start(3), end(3)
       end subroutine

.. code-block:: c++

       /* In C */
       void applyMyBC(void *param1,void *param2,int *start,int *end);

``param1`` and ``param2`` are not used by the framework -- they are passed in
unmodified from MBLK_Apply_bc and MBLK_Apply_bc_all. ``param1`` and ``param2``
typically contain the block data and dimensions.

``start`` and ``end`` are 3-element arrays that give the :math:`i`, :math:`j`,
and :math:`k` block locations where the boundary condition is to be applied.
They are both inclusive and both relative to the block interior -- you
must shift them over your ghost cells. The C versions are 0-based (the
first index is zero), while the Fortran versions are 1-based (the first index
is one).

For example, a Fortran subroutine to apply the constant value 1.0 across
the boundary, with a 2-deep ghost region, would be:

.. code-block:: fortran

       !In Fortran
       subroutine applyMyBC(grid,size,start,end)
         integer :: size(3), start(3), end(3)
         double precision :: grid(size(1),size(2),size(3))
         integer :: i, j, k, s(3), e(3)
         s=start+2 ! Back up over the 2-deep ghost region,
         e=end+2   ! without modifying the caller's arrays
         do k=s(3),e(3)
           do j=s(2),e(2)
             do i=s(1),e(1)
               grid(i,j,k)=1.0d0
             end do
           end do
         end do
       end subroutine
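
A C version of the same routine, as a sketch: the C ``start`` and
``end`` are 0-based, so the ghost layers just below the interior have
indices -2 and -1, and adding the ghost depth maps them to array indices
0 and 1. The sketch assumes that ``param1`` is the grid of doubles,
``param2`` is its allocated size (ghost cells included) as an
``int[3]``, and that :math:`i` varies fastest in memory; these are
choices made by whoever calls MBLK_Apply_bc, not requirements of the
framework.

.. code-block:: c

  #include <stddef.h>

  #define GHOST 2  /* assumed depth of the ghost region */

  /* Sets the grid to 1.0 over the inclusive range start..end, which is
     given relative to the block interior (0-based). */
  void applyMyBC(void *param1, void *param2, int *start, int *end)
  {
    double *grid = (double *)param1;
    const int *size = (const int *)param2;
    int s[3], e[3];
    /* Shift over the ghost region using local copies, leaving the
       caller's start and end arrays untouched. */
    for (int d = 0; d < 3; d++) {
      s[d] = start[d] + GHOST;
      e[d] = end[d] + GHOST;
    }
    for (int k = s[2]; k <= e[2]; k++)
      for (int j = s[1]; j <= e[1]; j++)
        for (int i = s[0]; i <= e[0]; i++)
          grid[((size_t)k * size[1] + j) * size[0] + i] = 1.0;
  }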

.. code-block:: c++

  int MBLK_Apply_bc(const int bcnum, void *param1,void *param2);

.. code-block:: fortran

  subroutine MBLK_Apply_bc(bcnum, param1,param2,err)
  integer,intent(in)::bcnum
  varies,intent(inout)::param1
  varies,intent(inout)::param2
  integer,intent(out)::err

This call applies all boundary condition functions of type ``bcnum`` to
the block. ``param1`` and ``param2`` are passed unmodified to the
boundary condition function.

.. code-block:: c++

  int MBLK_Apply_bc_all(void* param1, void* param2);

.. code-block:: fortran

  subroutine MBLK_Apply_bc_all(param1,param2, err)
  integer,intent(out)::err
  varies,intent(inout)::param1
  varies,intent(inout)::param2

This call is the same as MBLK_Apply_bc, except that it applies all
external boundary conditions to the block.

Migration
---------

The Charm++ runtime system includes automated, runtime load
balancing, which will automatically monitor the performance of your
parallel program. If needed, the load balancer can “migrate” blocks
from heavily-loaded processors to more lightly-loaded processors,
improving the load balance and speeding up the program. For this to be
useful, pass the ``+vpN`` argument with a number of blocks N larger than
the number of processors. Because migration is somewhat involved, you
may choose not to call MBLK_Migrate, in which case migration will never
take place.

The runtime system can automatically move your thread stack to the new
processor, but you must write a PUP function to move any global or
heap-allocated data to the new processor. (Global data is declared at
file scope or ``static`` in C and ``COMMON`` in Fortran77. Heap allocated data
comes from C ``malloc``, C++ ``new``, or Fortran90 ``ALLOCATE``.) A PUP
(Pack/UnPack) function performs both packing (converting heap data into
a message) and unpacking (converting a message back into heap data). All
your global and heap data must be collected into a single block (``struct``
in C, user-defined ``TYPE`` in Fortran) so the PUP function can access it
all.

Your PUP function will be passed a pointer to your heap data block and a
special handle called a “pupper”, which contains the network message to
be sent. Your PUP function returns a pointer to your heap data block. In
a PUP function, you pass all your heap data to routines named ``pup_type``,
where type is either a basic type (such as ``int``, ``char``, ``float``, or ``double``)
or an array type (as before, but with an “s” suffix). Depending on the
direction of packing, the pupper will either read from or write to the
values you pass -- normally, you shouldn’t even know which. The only time
you need to know the direction is when you are leaving a processor or
just arriving. Correspondingly, the pupper passed to you may be deleting
(indicating that you are leaving the processor, and should delete your
heap storage after packing), unpacking (indicating you’ve just arrived
on a processor, and should allocate your heap storage before unpacking),
or neither (indicating the system is merely sizing a buffer, or
checkpointing your values).

PUP functions are much easier to write than explain -- a simple C heap
block and the corresponding PUP function is:

.. code-block:: c++

  #include <stdlib.h>
  #include "pup_c.h" /* Charm++ C interface to PUP */

  typedef struct {
    int n1; /*Length of first array below*/
    int n2; /*Length of second array below*/
    double *arr1; /*Some doubles, allocated on the heap*/
    int *arr2; /*Some ints, allocated on the heap*/
  } my_block;

  my_block *pup_my_block(pup_er p,my_block *m)
  {
    if (pup_isUnpacking(p)) m=malloc(sizeof(my_block));
    pup_int(p, &m->n1);
    pup_int(p, &m->n2);
    if (pup_isUnpacking(p)) {
      m->arr1=malloc(m->n1*sizeof(double));
      m->arr2=malloc(m->n2*sizeof(int));
    }
    pup_doubles(p,m->arr1,m->n1);
    pup_ints(p,m->arr2,m->n2);
    if (pup_isDeleting(p)) {
      free(m->arr1);
      free(m->arr2);
      free(m);
    }
    return m;
  }

This single PUP function can be used to copy the my_block data into a
message buffer and free the old heap storage (deleting pupper), allocate
storage on the new processor and copy the message data back (unpacking
pupper), or save the heap data for debugging or checkpointing.

A Fortran ``TYPE`` block and corresponding PUP routine is as follows:

.. code-block:: fortran

        MODULE my_block_mod
          TYPE my_block
            INTEGER :: n1,n2x,n2y
            REAL*8, POINTER, DIMENSION(:) :: arr1
            INTEGER, POINTER, DIMENSION(:,:) :: arr2
          END TYPE
        END MODULE

        SUBROUTINE pup_my_block(p,m)
          USE my_block_mod
          USE pupmod
          IMPLICIT NONE
          INTEGER :: p
          TYPE(my_block) :: m
          call pup_int(p,m%n1)
          call pup_int(p,m%n2x)
          call pup_int(p,m%n2y)
          IF (pup_isUnpacking(p)) THEN
            ALLOCATE(m%arr1(m%n1))
            ALLOCATE(m%arr2(m%n2x,m%n2y))
          END IF
          call pup_doubles(p,m%arr1,m%n1)
          call pup_ints(p,m%arr2,m%n2x*m%n2y)
          IF (pup_isDeleting(p)) THEN
            DEALLOCATE(m%arr1)
            DEALLOCATE(m%arr2)
          END IF
        END SUBROUTINE

.. code-block:: c++

  int MBLK_Register(void *block, MBLK_PupFn pup_ud, int* rid);

.. code-block:: fortran

  subroutine MBLK_Register(block,pup_ud,rid,err)
  integer, intent(out)::rid, err
  TYPE(varies), POINTER :: block
  EXTERNAL :: pup_ud

Associates the given data block and PUP function. Returns a block ID in
``rid``, which can be passed to MBLK_Get_registered later. Can only be
called from the driver. It returns MBLK_SUCCESS if the call was
successful and MBLK_FAILURE in case of error. For the declarations
above, you call MBLK_Register as:

.. code-block:: c++

             /*C/C++ driver() function*/
             int myId, err;
             my_block *m=malloc(sizeof(my_block));
             err = MBLK_Register(m,(MBLK_PupFn)pup_my_block,&myId);

.. code-block:: fortran

             !- Fortran driver subroutine
             use my_block_mod
             interface
               subroutine pup_my_block(p,m)
                 use my_block_mod
                 INTEGER :: p
                 TYPE(my_block) :: m
               end subroutine
             end interface
             TYPE(my_block) :: m
             INTEGER :: myId,err
             call MBLK_Register(m,pup_my_block,myId,err)

Note that Fortran blocks must be allocated on the stack in driver, while
C/C++ blocks may be allocated on the heap.

.. code-block:: c++

  void MBLK_Migrate(void);

.. code-block:: fortran

  subroutine MBLK_Migrate()

Informs the load balancing
system that you are ready to be migrated, if needed. If the system
decides to migrate you, the PUP function passed to MBLK_Register will be
called with a sizing pupper, then a packing and deleting pupper. Your stack
(and pupped data) will then be sent to the destination machine, where
your PUP function will be called with an unpacking pupper. MBLK_Migrate
will then return, whereupon you should call MBLK_Get_registered to get
your unpacked data block. Can only be called from the driver.

.. code-block:: c++

  int MBLK_Get_Userdata(int n, void** block)

Returns your unpacked user data after migration, that is, the value
returned by the unpacking call to your PUP function. Takes the user data
ID returned by MBLK_Register. Can be called from the driver at any time.

Since Fortran blocks are always allocated on the stack, the system
migrates them to the same location on the new processor, so no
Get_Registered call is needed from Fortran.