1. Charisma

1.1. Introduction

This manual describes Charisma, an orchestration language for migratable parallel objects. Charisma can be downloaded from https://charm.cs.illinois.edu/gerrit/gitweb?p=Charisma.git

1.2. Charisma Syntax

A Charisma program is composed of two parts: the orchestration code in a .or file, and sequential user code in C/C++ form.

1.2.1. Orchestration Code

The orchestration code in the .or file can be divided into two parts. The header part contains information about the program, included external files, defines, and declaration of parallel constructs used in the code. The orchestration section is made up of statements that form a global control flow of the parallel program. In the orchestration code, Charisma employs a macro dataflow approach; the statements produce and consume values, from which the control flows can be organized, and messages and method invocations generated.

1.2.1.1. Header Section

The very first line should give the name of the Charisma program with the program keyword.

program jacobi

The program keyword can be replaced with module, which means that the output program is going to be a library module instead of a stand-alone program. Please refer to Section 1.4 for more details.

Next, the programmer can include external code files in the generated code with the keyword include specifying the filename without extension. For example, the following statement tells the Charisma compiler to look for header file “particles.h” to be included in the generated header file “jacobi.h” and to look for C/C++ code file “particles.[C | cc | cpp | cxx | c]” to be included in the generated C++ code file “jacobi.C”.

include particles;

This is useful when there is source code that must precede the generated parallel code, such as basic data structure declarations.

After the include section is the define section, where environmental variables can be defined for Charisma. For example, to tell Charisma to generate additional code to enable the load balancing module, the programmer needs to use define ldb; in the orchestration code. Please refer to Section 1.7 for details.
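For example, a header section combining these elements (a minimal sketch assembled from the statements shown above) might look like this:

program jacobi

include particles;

define ldb;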

1.2.1.2. Declaration Section

Next comes the declaration section, where classes, objects, and parameters are declared. A Charisma program is composed of multiple sets of parallel objects which are organized by the orchestration code. Different sets of objects can be instantiated from different class types. Therefore, we have to specify the class types and the object instantiations. We also need to specify the parameters (see Section 1.2.1.3) used in the orchestration statements.

A Charisma program or module has one “MainChare” class, and it does not require explicit instantiation since it is a singleton. The statement to declare MainChare looks like this:

class JacobiMain : MainChare;

For object arrays, we first need to declare the class types inherited from 1D object array, 2D object array, etc., and then instantiate objects from those class types. The dimensionality information of the object array is given in a pair of brackets, with each dimension size separated by a comma.

class JacobiWorker : ChareArray1D;
obj workers : JacobiWorker[N];

class Cell : ChareArray3D;
obj cells : Cell[M,M,M];

Note that the keyword class is for class type derivation, and obj is for parallel object or object array instantiation. The above code segment declares a new class type JacobiWorker which is a 1D object array, and the programmer is supposed to supply sequential code for it in files JacobiWorker.h and JacobiWorker.C (see Section 1.2.2 for more details on sequential code). The object array workers is instantiated from JacobiWorker and has N elements.

The last part is orchestration parameter declaration. These parameters are used only in the orchestration code to connect input and output of orchestration statements, and their data type and size is declared here. More explanation of these parameters can be found in Section 1.2.1.3.

param lb : double[N];
param rb : double[N];

With this, lb and rb are declared as parameters that can be “connected” with local variables that are double arrays of size N (512 in the Jacobi example).

1.2.1.3. Orchestration Section

In the main body of orchestration code, the programmer describes the behavior and interaction of the elements of the object arrays using orchestration statements.

• Foreach Statement

The most common kind of parallelism is the invocation of a method across all elements in an object array. Charisma provides a foreach statement for specifying such parallelism. The keywords foreach and end-foreach form an enclosure within which the parallel invocation is performed. The following code segment invokes the entry method compute on all the elements of the array workers.

foreach i in workers
  workers[i].compute();
end-foreach

• Publish Statement and Produced/Consumed Parameters

In the orchestration code, an object method invocation can have input and output (consumed and produced) parameters. Here is an orchestration code segment that exemplifies the input and output of the object methods workers.produceBorders, workers.compute, and workers.reduceData.

foreach i in workers
  (lb[i], rb[i]) <- workers[i].produceBorders();
  workers[i].compute(lb[i+1], rb[i-1]);

  (+error) <- workers[i].reduceData();
end-foreach

Here, the entry method workers[i].produceBorders produces (called published in Charisma) values of lb[i], rb[i], enclosed in a pair of parentheses before the publishing sign <-. In the second statement, function workers[i].compute consumes values of lb[i+1], rb[i-1], just like normal function parameters. If a reduction operation is needed, the reduced parameter is marked with a + before it, like the error in the third statement.

An entry method can have an arbitrary number of published (produced and reduced) values and consumed values. In addition to basic data types, each of these values can also be an object of arbitrary type. The values published by A[i] must have the index i, whereas consumed values can have the index e(i), an index expression of the form i±c where c is a constant. Although different parameter names are typically used for the input and the output values, they are allowed to overlap.

Parameters are produced and consumed in program order. That is, a parameter produced in an earlier statement will be consumed by the next consuming statement, but it is no longer visible to any consuming statement that follows a subsequent statement producing the same parameter. Special rules involving loops are discussed later with the loop statement.
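To illustrate the program-order rule, here is a minimal sketch (the parameter p and the entry methods produceData, consumeData, and produceAgain are hypothetical names, assuming p has been declared in the declaration section):

foreach i in workers
  (p[i]) <- workers[i].produceData();     // p produced here
end-foreach
foreach i in workers
  workers[i].consumeData(p[i]);           // consumes the value produced above
end-foreach
foreach i in workers
  (p[i]) <- workers[i].produceAgain();    // later consumers see this value, not the earlier one
end-foreach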

• Overlap Statement

Complicated parallel programs usually have concurrent flows of control. To express this explicitly, Charisma provides an overlap keyword, whereby the programmer can fire multiple overlapping control flows. These flows may contain different numbers of steps or statements, and their execution should be independent of one another so that their progress can interleave in arbitrary order and still produce correct results.

overlap
{
  foreach i in workers1
    (lb[i], rb[i]) <- workers1[i].produceBorders();
  end-foreach
  foreach i in workers1
    workers1[i].compute(lb[i+1], rb[i-1]);
  end-foreach
}
{
  foreach i in workers2
    (lb[i], rb[i]) <- workers2[i].compute(lb[i+1], rb[i-1]);
  end-foreach
}
end-overlap

This example shows an overlap statement in which the two blocks in curly brackets are executed in parallel. Their execution joins back into a single flow at the end-overlap mark.

• Loop Statement

Loops are supported with the for statement and the while statement. Here are two examples:

for iter = 0 to MAX_ITER
   workers.doWork();
end-for

while (err > epsilon)
   (+err) <- workers.doWork();
   MainChare.updateError(err);
end-while

The loop condition in the for statement is independent of the main program; it simply tells the program to repeat the block that many times. The loop condition in the while statement is actually updated in the MainChare. In the above example, err and epsilon are both member variables of the class MainChare, and can be updated as the example shows.

The programmer can activate the “autoScalar” feature by including a define autoScalar; statement in the orchestration code. When autoScalar is enabled, Charisma will find all the scalars in the .or file and create a local copy in the MainChare. Then, every time a scalar is published by a statement, an update statement is automatically inserted after that statement. The only thing the programmer needs to do is initialize the local scalar with a proper value.
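As a minimal sketch of that initialization (the class name JacobiMain and its member variables err and epsilon are assumed here, following the while-loop example above), the sequential code could do:

// Minimal sketch (names assumed): initialize the MainChare's local copy of the
// scalar used in the while-loop condition so that the first test passes.
void JacobiMain::initialize() {
  epsilon = 1e-6;
  err = 2 * epsilon;   // guarantees at least one iteration; updated thereafter
}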

The rules connecting produced and consumed parameters across loops are natural. For the first iteration, the first consuming statement looks for values produced by the last producing statement before the loop; for subsequent iterations, it looks for values produced by the last producing statement within the loop body. After the last iteration, the last produced values are disseminated to the code segment following the loop body. Within the loop body, program order holds.

for iter = 1 to MAX_ITER
  foreach i in workers
    (lb[i], rb[i]) <- workers[i].compute(lb[i+1], rb[i-1]);
  end-foreach
end-for

One special case is when a statement’s produced and consumed parameters overlap. It must be noted that there is no dependency within the same foreach statement. In the above code segment, the values lb[i+1] and rb[i-1] consumed by workers[i] will not come from its neighbors’ publication in this iteration. The rule is that consumed values always originate from previous foreach statements or from foreach statements in a previous loop iteration, and published values are visible only to following foreach statements or to foreach statements in following loop iterations.

• Scatter and Gather Operation

A collection of values produced by one object may be split and consumed by multiple object array elements for a scatter operation. Conversely, a collection of values from different objects can be gathered to be consumed by one object.

foreach i in A
  (points[i,*]) <- A[i].f(...);
end-foreach
foreach k,j in B
  (...) <- B[k,j].g(points[k,j]);
end-foreach

A wildcard dimension * in A[i].f()’s output points specifies that it will publish multiple data items. At the consuming side, each B[k,j] consumes only one point in the data, and therefore a scatter communication will be generated from A to B. For instance, A[1] will publish data points[1,0..N-1] to be consumed by multiple array objects B[1,0..N-1].

foreach i,j in A
  (points[i,j]) <- A[i,j].f(...);
end-foreach
foreach k in B
  (...) <- B[k].g(points[*,k]);
end-foreach

Similar to the scatter example, if a wildcard dimension * is in the consumed parameter and the corresponding published parameter does not have a wildcard dimension, a gather operation is generated from the publishing statement to the consuming statement. In the above code segment, each A[i,j] publishes a data point, then data points from A[0..N-1,j] are combined to form the data to be consumed by B[j].

Many communication patterns can be expressed with combination of orchestration statements. For more details, please refer to PPL technical report 06-18, “Charisma: Orchestrating Migratable Parallel Objects”.

Last but not least, all the orchestration statements in the .or file together form the dependency graph. According to this dependency graph, the messages are created and the parallel program progresses. Therefore, the user is advised to put into the orchestration code only parallel constructs that are driven by data dependencies. Other elements, such as local dependencies, should be coded in the sequential code.

1.2.2. Sequential Code

1.2.2.1. Sequential Files

The programmer supplies the sequential code for each class as necessary. The files should be named after the class, with the appropriate file extension. The header file is not really an ANSI C header file; rather, it is the sequential portion of the class’s declaration. Charisma will generate the class declaration from the orchestration code and incorporate the sequential portion into the final header file. For example, if a molecular dynamics simulation has the following classes (as declared in the orchestration code):

class MDMain : MainChare;
class Cell : ChareArray3D;
class CellPair : ChareArray6D;

The user is supposed to prepare the following sequential files for the classes: MDMain.h, MDMain.C, Cell.h, Cell.C, CellPair.h and CellPair.C, unless a class does not need sequential declaration and/or definition code. Please refer to the example in the Appendix.

For each class, a member function void initialize(void) can be defined and the generated constructor will automatically call it. This saves the trouble of explicitly calling initialization code for each array object.
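As a minimal sketch (the member names are borrowed from the Jacobi example later in this manual and are otherwise assumptions), such a function might look like:

// Minimal sketch (member names assumed): called automatically by the
// generated constructor of each JacobiWorker array element.
void JacobiWorker::initialize() {
  currentArray = 0;
  myIter = 0;
}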

1.2.2.2. Producing and Consuming Functions

The C/C++ source code is no different from ordinary sequential source code, except for the producing/consuming part. Consumed parameters are treated just like normal parameters passed into the function. To handle produced parameters, the sequential code needs to do two special things. First, the function should have an extra parameter for each output parameter. The parameter type is the keyword outport, and the parameter name is the same as it appears in the orchestration code. Second, in the body of the function, the keyword produce is used to connect the orchestration parameter with the local variable whose value will be sent out, in the form of a function call, as follows.

produce(produced_parameter, local_variable[, size_of_array]);

When the parameter represents a data array, we need the additional size_of_array to specify the size of the data array.

The dimensionality of an orchestration parameter is divided into two parts: its orchestration dimension, which is implied by the dimensionality of the object arrays the parameter is associated with, and its local dimensionality, which is declared in the declaration section. The orchestration dimension is not explicitly declared anywhere; it is derived from the object arrays. For instance, in the 1D Jacobi worker example, lb and rb have the same orchestration dimensionality as workers, namely 1D of size 16. The local dimensionality is used when the parameter is associated with local variables in sequential code. Since lb and rb are declared to have the local type and dimension of double [512], the producing statement should connect them with local variables of type double [512].

void JacobiWorker::produceBorders(outport lb, outport rb) {
  ...
  produce(lb, localLB, 512);
  produce(rb, localRB, 512);
}

Special cases of produced/consumed parameters involve scatter/gather operations. In a scatter operation, since an additional dimension is implied in the produced parameter, the local_variable should have an additional dimension equal to the dimension over which the scatter is performed. Similarly, in a gather operation the input parameter will have an additional dimension of the same size as the dimension over which the gather is performed.
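As a rough sketch of the scatter case from Section 1.2.1.3 (the buffer names, the slice size PTS, and the interpretation of the size argument as the per-destination slice size are assumptions, not confirmed by this manual), the producing function could look like:

// Rough sketch (names and size interpretation assumed): the local buffer has
// one extra leading dimension, one slice per consuming element B[i,0..N-1].
void A::f(outport points) {
  double localPoints[N][PTS];          // N destination elements, PTS values each
  // ... fill localPoints ...
  produce(points, localPoints, PTS);   // publish; each B[i,j] consumes one slice
}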

For reduction, one additional parameter of type char[] is added to specify the reduction operation. Built-in reduction operations are + (sum), * (product), < (minimum), and > (maximum) for basic data types. For instance, the following statement takes the sum of the local values of result and produces the output in sum.

reduce(sum, result, "+");

If the data type is a user-defined class, then you may use a function or operator defined for that class to do the reduction. For example, assume we have a class called Force, and we have an add function (or a + operator) defined.

Force& Force::add(const Force& f);

In the reduction to sum all the local forces, we can use

reduce(sumForces, localForce, "add");

1.2.2.3. Miscellaneous Issues

In sequential code, the user can access the object’s index via the keyword thisIndex. The indices of 1D to 6D object arrays are accessed as follows:

1D: thisIndex
2D: thisIndex.{x,y}
3D: thisIndex.{x,y,z}
4D: thisIndex.{w,x,y,z}
5D: thisIndex.{v,w,x,y,z}
6D: thisIndex.{x1,y1,z1,x2,y2,z2}
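
For example (a minimal sketch; only thisIndex itself is documented above, the rest of the names are assumed), a 3D array element such as Cell can record its own coordinates:

// Minimal sketch: reading this element's 3D index in sequential code.
void Cell::initialize() {
  int myX = thisIndex.x;
  int myY = thisIndex.y;
  int myZ = thisIndex.z;
  // ... use (myX, myY, myZ) to set up per-element data ...
}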

1.3. Building and Running a Charisma Program

There are two steps to build a Charisma program: generating Charm++ program from orchestration code, and building the Charm++ program.

1) The Charisma compiler, currently named orchc, is used to compile the orchestration code (.or file) and integrate the sequential code to generate a Charm++ program. The resultant Charm++ program usually consists of the following code files: the Charm++ interface file ([modulename].ci), the header file ([modulename].h), and the C++ source file ([modulename].C). The command for this step is as follows.

$ orchc [modulename].or

2) The Charm++ compiler, charmc, is used to parse the Charm++ interface (.ci) file, compile the C/C++ code, and link and build the executable. The typical commands are:

$ charmc [modulename].ci
$ charmc [modulename].C -c
$ charmc [modulename].o -o pgm -language charm++

Running the Charisma program is the same as running a Charm++ program, using Charm++’s job launcher charmrun (on some platforms like CSE’s Turing Cluster, use the customized job launcher rjq or rj).

$ charmrun pgm +p4

Please refer to Charm++’s manual and tutorial for more details of building and running a Charm++ program.

1.4. Support for Library Module

Charisma is capable of producing library code for reuse in another Charisma program. We explain this feature in the following sections.

1.5. Writing Module Library

The programmer uses the keyword module instead of program in the header section of the orchestration code to tell the compiler that it is a library module. The keyword module is followed by the module name and then by a set of configuration variables in a pair of parentheses. The configuration variables are used when creating instances of the library, for information such as the problem size.

Following the first line, the library’s input and output parameters are declared with the keywords inparam and outparam.

module FFT3D(CHUNK, M, N);
inparam indata;
outparam outdata1, outdata2;

The body of the library is not very different from that of a normal program. It consumes the input parameters and produces the output parameters declared in the header section.
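As a hedged sketch of what such a module body might look like (the worker class FFTWorker, the object array fftworkers, and the entry method transform are assumptions for illustration; only the header lines are taken from the example above):

module FFT3D(CHUNK, M, N);
inparam indata;
outparam outdata1, outdata2;

class FFTWorker : ChareArray1D;
obj fftworkers : FFTWorker[CHUNK];

begin
  foreach i in fftworkers
    (outdata1[i], outdata2[i]) <- fftworkers[i].transform(indata[i]);
  end-foreach
end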

1.6. Using Module Library

To use a Charisma module library, the programmer first needs to create an instance of the library. There are two steps: including the module and creating an instance.

use FFT3D;
library f1 : FFT3D(CHUNK=10, M=10, N=100);
library f2 : FFT3D(CHUNK=8, M=8, N=64);

The keyword use together with the module name includes the module in the program, and the keyword library creates an instance with the given instance name, followed by the module name with values assigned to the configuration variables. These statements must appear in the declaration section before the library instance can be used in the main program’s orchestration code.

Invoking the library is like a normal publish statement: the input and output parameters work the same way, but the object name and function name are replaced with the library instance name and the keyword call, connected with a colon.

(f1_outdata[*]) <- f1:call(f1_indata[*]);

Multiple instances can be created out of the same module. Their execution can interleave without interfering with one another.
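For instance (a sketch; the parameter names for the second instance are assumed by analogy with the first), the two instances declared above could be invoked concurrently inside an overlap statement:

overlap
{
  (f1_outdata[*]) <- f1:call(f1_indata[*]);
}
{
  (f2_outdata[*]) <- f2:call(f2_indata[*]);
}
end-overlap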

1.7. Using Load Balancing Module

1.7.1. Coding

To activate the load balancing module and prepare objects for migration, three things need to be added to the Charisma code.

First, the programmer needs to inform Charisma about load balancing with a define ldb; statement in the header section of the orchestration code. This will make Charisma generate the extra Charm++ code needed for load balancing, such as PUP methods.

Second, the user has to provide a PUP function for each class with sequential data that needs to be moved when the object migrates. When choosing which data items to pup, the user has the flexibility to leave the dead data behind to save on communication overhead in migration. The syntax for the sequential PUP is similar to that in a Charm++ program. Please refer to the load balancing section in the Charm++ manual for more information on PUP functions. A typical example would look like this in the user’s sequential .C file:

void JacobiWorker::sequentialPup(PUP::er& p){
  p|myLeft; p|myRight; p|myUpper; p|myLower;
  p|myIter;
  PUParray(p, (double *)localData, 1000);
}

Third, the user makes the call that invokes a load balancing session in the orchestration code. The call is AtSync(); and it is invoked on all elements in an object array. The following example shows how to invoke a load balancing session every 4th iteration of a for-loop.

for iter = 1 to 100
  // work work
  if (iter % 4 == 0) then
    foreach i in workers
      workers[i].AtSync();
    end-foreach
  end-if
end-for

If a while-loop is used instead of a for-loop, then the test condition in the if statement is a local variable in the program’s MainChare. In the sequential code, the user can maintain a local variable called iter in the MainChare and increment it every iteration.
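As a minimal sketch (the class name JacobiMain and the entry method updateIter are assumed; only the member variable iter is suggested by the text above), the counter could be maintained like this:

// Minimal sketch (names assumed): increment the while-loop counter once per
// iteration; an entry method like this would be called from the orchestration code.
void JacobiMain::updateIter() {
  iter++;   // advances the condition tested by the if statement
}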

1.7.2. Compiling and Running

Unless linked with load balancer modules, a Charisma program will not perform actual load balancing. The way to link in a load balancer module is adding -module EveryLB as a link-time option.

At run-time, the load balancer is specified in command line after the +balancer option. If the balancer name is incorrect, the job launcher will automatically print out all available load balancers. For instance, the following command uses RefineLB.

$ ./charmrun ./pgm +p16 +balancer RefineLB

1.8. Handling Sparse Object Arrays

In Charisma, when we declare an object array, by default a dense array is created with all the elements populated. For instance, when we have the following declaration in the orchestration code, an array of NxNxN is created.

class Cell : ChareArray3D;
obj cells : Cell[N,N,N];

There are certain occasions when the programmer may need sparse object arrays, in which not all elements are created. An example is neighborhood force calculation in a molecular dynamics application. We have a 3D array of Cell objects to hold the atom coordinates, and a 6D array of CellPair objects to perform pairwise force calculation between neighboring cells. In this case, not all elements in the 6D array of CellPair are necessary in the program; only those which represent two immediately neighboring cells are needed for the force calculation. For such cases, Charisma provides the flexibility of declaring a sparse object array, with the sparse keyword following the object array declaration, as follows.

class CellPair : ChareArray6D;
obj cellpairs : CellPair[N,N,N,N,N,N],sparse;

Then the programmer is expected to supply a sequential function with the name getIndex_ARRAYNAME to generate a list of selected indices of the elements to create. As an example, the following function essentially tells the system to generate all the NxNxNxNxNxN elements for the 6D array.

void getIndex_cellpairs(std::vector<CkArrayIndex6D>& vec) {
  int i,j,k,l,m,n;
  for(i=0;i<N;i++)
    for(j=0;j<N;j++)
      for(k=0;k<N;k++)
        for(l=0;l<N;l++)
          for(m=0;m<N;m++)
            for(n=0;n<N;n++)
              vec.push_back(CkArrayIndex6D(i,j,k,l,m,n));
}
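A sketch of a more selective version (the neighborhood criterion, a difference of at most one in each dimension with no periodic wrap-around, is an assumption for illustration) would create only the cell pairs actually needed:

// Sketch (neighborhood criterion assumed): create only CellPair elements whose
// two cell indices (i,j,k) and (l,m,n) differ by at most 1 in each dimension.
void getIndex_cellpairs(std::vector<CkArrayIndex6D>& vec) {
  for (int i = 0; i < N; i++)
    for (int j = 0; j < N; j++)
      for (int k = 0; k < N; k++)
        for (int di = -1; di <= 1; di++)
          for (int dj = -1; dj <= 1; dj++)
            for (int dk = -1; dk <= 1; dk++) {
              int l = i + di, m = j + dj, n = k + dk;
              if (l < 0 || l >= N || m < 0 || m >= N || n < 0 || n >= N)
                continue;   // skip neighbors outside the box (no periodic wrap)
              vec.push_back(CkArrayIndex6D(i, j, k, l, m, n));
            }
}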

1.9. Example: Jacobi 1D

Following is the content of the orchestration file jacobi.or.

program jacobi

class  JacobiMain : MainChare;
class  JacobiWorker : ChareArray1D;
obj  workers : JacobiWorker[M];
param  lb : double[N];
param  rb : double[N];

begin
    for iter = 1 to MAX_ITER
     foreach i in workers
         (lb[i], rb[i]) <- workers[i].produceBorders();
         workers[i].compute(lb[i+1], rb[i-1]);
     end-foreach
    end-for
end

The class JacobiMain does not need any sequential code, so the only sequential code are in JacobiWorker.h and JacobiWorker.C. Note that JacobiWorker.h contains only the sequential portion of JacobiWorker’s declaration.

#define N 512
#define M 16

int currentArray;
double localData[2][M][N];
double localLB[N];
double localRB[N];
int myLeft, myRight, myUpper, myLower;

void initialize();
void compute(double lghost[], double rghost[]);
void produceBorders(outport lb, outport rb);
double abs(double d);

Similarly, the sequential C code will be integrated into the generated C file. Below is part of the sequential C code taken from JacobiWorker.C to show how consumed parameters (rghost and lghost in JacobiWorker::compute) and produced parameters (lb and rb in JacobiWorker::produceBorders) are handled.

void JacobiWorker::compute(double rghost[], double lghost[]) {
    /* local computation for updating elements*/
}

void JacobiWorker::produceBorders(outport lb, outport rb) {
    produce(lb, localData[currentArray][myLeft], myLower-myUpper+1);
    produce(rb, localData[currentArray][myRight], myLower-myUpper+1);
}

The user compiles these input files with the following command:

$ orchc jacobi.or

The compiler generates the parallel code for sending out messages and organizing the flow of control, then looks for sequential code files for the declared classes, namely JacobiMain and JacobiWorker, and integrates them into the final output: jacobi.h, jacobi.C, and jacobi.ci, which form a Charm++ program and can be built the way a Charm++ program is built.
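Following the steps in Section 1.3, the generated program can then be built and run, for example (the executable name jacobi is arbitrary):

$ charmc jacobi.ci
$ charmc jacobi.C -c
$ charmc jacobi.o -o jacobi -language charm++
$ charmrun ./jacobi +p4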