B and T GROMACS Project

Description:
Benchmarking and Tuning the GROMACS Molecular Dynamics Package on Beowulf Clusters

July 20, 2004

B-and-T GROMACS Minutes - July 21, 2004

Decide where to publish, Crossroads of the ACM? Guidelines, deadlines, etc. Charlie ok as an author? JoshH.

References tour. Charlie.

http://www.earlham.edu/~charliep/mt/archives/002711.html, other entries?

PDF rendered very unusually on bubba, PS ok. JoshH.

Reconcile database results for 1-1-1, 2-2-2, 4-4-4, 6-6-6, 8-8-8; bazaar and cairo (only audit cairo now but we'll ultimately need bazaar as well); villin, dppc, then proteasome; use detailed version of the scheduler; run under b-and-t-g user; use c10-13 for now; gromacs-optimal-3.2.1 and gromacs-baseline-3.2.1; rerun 1-1-1 and 4-4-4 for villin and dppc, baseline and optimal (2^3 runs total); report back. JoshM.

Confusion between Methodology and Experimental Design. Some material from each needs to move to the other. Are they parallel universes at some level? We all need to think about this a bit. J^2 and Charlie.

How to handle definitions? Journal, Zobel? JoshH.

Other Considerations merged into Results with appropriate changes. JoshH.

How to handle general citations (rather than specific ones). I don't even know if there is a commonly accepted way of doing this in scientific literature. It may be that re-reading the articles, after our prose is relatively stable, looking for specific citations is the way to handle this. JoshH.

Posted by charliep at 06:38 PM | Comments (114)

June 30, 2004

MP_Lite Notes

MP_Lite is a subset of the MPI-1 standard. Here are some of my notes from porting GROMACS from MPI to MP_Lite:


  • A shared filesystem is needed since there is no daemon running on the nodes.
  • Uses ssh to communicate by default
  • M-VIA and VIA compatible
  • I am testing initially with tcp settings, but I found this in the README:

    For workstations, type 'make tcp' and link libmplite.a into your code.
    If you are sure you won't pass messages larger than the TCP buffer
    size, you can use the synchronous version by doing 'make tcp_sync'
    which may increase performance by a few %. The TCP buffer size is reported
    in the .nodeX log files after each run.

  • FFTW needs at least the following functions:

    • MPI_Comm_dup
    • MPI_Alltoall
    • MPI_Alltoallv
    • MPI_Issend

    which are not included in the MP_Lite library. So we have to compile fftw without mpi support.
  • Load Balancing of processes is fairly straightforward since the hosts passed on the command line are used in the order you give them:

    • Command Line: mprun -np 7 -hosts c0 c1 c1 c2 c3 c3 c0 prog
    • c0 2 processes
    • c1 2 processes
    • c2 1 process
    • c3 2 processes

    Note that they are numbered by the order in which they were given on the command line.

    • 0 -- c0
    • 1 -- c1
    • 2 -- c1
    • 3 -- c2
    • 4 -- c3
    • 5 -- c3
    • 6 -- c0

    So if we are using a ring structure we would want to group our nodes together. With GROMACS the ring is based on the MPI numbering of the nodes, so the optimal version of the command line should be as follows:

    • Command Line: mprun -np 7 -hosts c0 c0 c1 c1 c2 c3 c3 prog
    • 0 -- c0
    • 1 -- c0
    • 2 -- c1
    • 3 -- c1
    • 4 -- c2
    • 5 -- c3
    • 6 -- c3

  • More than one process can be on a single node. MP_Lite uses a different port per process for communication on a single node.
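
The rank numbering and the grouped ordering described above can be sketched in a few lines of Python. This is only an illustration: the helper names (rank_map, processes_per_host, group_hosts, command_line) are made up for this note and are not part of MP_Lite, and the mprun command string simply follows the form shown in the examples above.

```python
from collections import Counter


def rank_map(hosts):
    """Ranks are assigned in the order the hosts appear on the command line."""
    return list(enumerate(hosts))


def processes_per_host(hosts):
    """Count how many processes land on each host."""
    return dict(Counter(hosts))


def group_hosts(hosts):
    """Reorder the host list so that all ranks on one host are adjacent.

    Hosts keep the order in which they first appeared.
    """
    counts = Counter(hosts)
    first_seen = list(dict.fromkeys(hosts))
    return [host for host in first_seen for _ in range(counts[host])]


def command_line(hosts, prog="prog"):
    """Build a command line in the same form as the examples above."""
    return "mprun -np {} -hosts {} {}".format(len(hosts), " ".join(hosts), prog)


original = ["c0", "c1", "c1", "c2", "c3", "c3", "c0"]
grouped = group_hosts(original)
print(command_line(grouped))
for rank, host in rank_map(grouped):
    print(rank, "--", host)
```

Keeping the first-appearance order of the hosts is a choice made for this sketch; any ordering that keeps the ranks of one host adjacent would fit the description above.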

Posted by hursejo at 10:47 AM | Comments (166)

June 08, 2004

Editing Picoseconds

Hi All,
Charlie mentioned last Thursday that he had edited the number of picoseconds that the simulation ran through. How did he do that? I ask because many of the molecules sitting in the folder JoshM pointed me towards have default run times of 1000.0ps, which would take days. I would really like to chop that down to 10-20ps so that I could put together my table of "the most flop consuming subroutines" with a wide selection of molecules.
Thanks.

Posted by schaejo at 01:47 PM

May 19, 2004

Scheduler outline

Here is an outline of how the current scheduler works. First note that there are two versions of the scheduler [I have forgotten why exactly]: detailed-scheduler.pl, which is the current version and the one that should be used, and scheduler.pl, which is old and should not be used. The latter does not have the 'find the dominant inner loop' code.

  1. Grab Arguments. Most of these are files [these should always be the last set of arguments given to the program], which are placed into an array. Before the files there are some specialized flags that turn on things like switch monitoring and /proc changes.
  2. for Each File
    1. If the stopping flag has been set by the signal handler [SIGUSR1 or SIGUSR2 sent to the head process], post a mail message on how to restart and the current state, then exit.
    2. Initialize Tests
      1. Parse Config File. Here we also make the working directory and make sure we have unique path and tag names. If this is a 'duplicate' test then attach -Run-# to the end of the tag and create the directory.
      2. Make the node list. This depends upon the cluster we are running on [see notes in program] and whether we are running the tests as node or cpu cyclic.
      3. Set Environment Variables.
      4. Prepare result and option_profile rows in the Database
      5. Generate Run script using node list.
    3. Launch the script via nohup so we can do...
    4. Checkpointing. Wait for finish_time field to obtain a value. If the value was 1900-01-01 then post an error and quit the scheduler.
    5. Analyse Run [mark ps_real, ps_node, dominant inner loop, etc.]
    6. Cleanup variables for next run. Mail successful completion of this configuration file.
  3. Mail a Scheduler Finished message
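
The control flow of the outline (check the stop flag, launch, wait on finish_time, move on) can be sketched as follows. This is not the code in detailed-scheduler.pl, which is Perl; it is a minimal Python sketch in which every name (StopFlag, wait_for_finish, run_queue, and the launch and get_finish_time callables) is hypothetical, and the database lookup and mail steps are left to the caller.

```python
import signal
import time

ERROR_DATE = "1900-01-01"  # finish_time value the outline treats as a failed run


class StopFlag:
    """Set by a signal handler, checked before each configuration file."""

    def __init__(self):
        self.requested = False

    def handler(self, signum, frame):
        self.requested = True


def install_handlers(stop):
    """Hook SIGUSR1 and SIGUSR2 to the stop flag (Unix only)."""
    for sig in (signal.SIGUSR1, signal.SIGUSR2):
        signal.signal(sig, stop.handler)


def wait_for_finish(get_finish_time, poll_seconds=30.0, sleep=time.sleep):
    """Poll until finish_time has a value; raise if it is the error date."""
    while True:
        value = get_finish_time()  # assumed to return None until the run ends
        if value is not None:
            if str(value).startswith(ERROR_DATE):
                raise RuntimeError("run failed: finish_time is " + str(value))
            return value
        sleep(poll_seconds)


def run_queue(config_files, launch, get_finish_time, stop, sleep=time.sleep):
    """Run each file in order; return (completed, remaining)."""
    completed = []
    for path in config_files:
        if stop.requested:
            break  # the caller mails the restart instructions
        launch(path)
        wait_for_finish(lambda: get_finish_time(path), sleep=sleep)
        completed.append(path)
    return completed, list(config_files[len(completed):])
```

Returning the list of remaining files is one way to produce the "how to restart" information mentioned in step 2.1; the real scheduler may record that state differently.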

Some General Notes:


  • I use the 'usysv' ssi flag to mpirun by default because it provides the best all around performance.
  • There are some heavy duty perl Regular Expressions in the analyze routines, especially when finding the Dominant Inner Loop. If these get too hard to follow, use the commented out print statements in the control statements to help.
  • The general format for a directory name is: [molecule]-[tag]-[processes]on[cpus]-[nodes]
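
As a small illustration of the directory name format, here is a Python sketch. The function name run_dir_name and the example tag are made up for this note; the -Run-# suffix on the tag follows the 'duplicate' rule in the outline above, and the exact placement used by the scheduler may differ.

```python
def run_dir_name(molecule, tag, processes, cpus, nodes, run_number=None):
    """Build [molecule]-[tag]-[processes]on[cpus]-[nodes].

    For a duplicate test, -Run-# is attached to the end of the tag.
    """
    if run_number is not None:
        tag = "{}-Run-{}".format(tag, run_number)
    return "{}-{}-{}on{}-{}".format(molecule, tag, processes, cpus, nodes)


# Hypothetical tag; 4 processes on 4 cpus across 2 nodes.
print(run_dir_name("villin", "optimal", 4, 4, 2))
print(run_dir_name("villin", "optimal", 4, 4, 2, run_number=2))
```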

Posted by hursejo at 03:40 PM | Comments (0)

May 14, 2004

Ohio Linux Fest 2004 Submission

I submitted the following to the Ohio Linux Fest 2004:
http://cluster.earlham.edu/detail/project/b-and-t-gromacs/presentations/linux-fest-2004.html

I may try to convert this to actual HTML in the near future, but I may wait until we start pounding out some prose/presentation materials for it first, so we can define its structure.

Posted by hursejo at 07:49 PM | Comments (0)

May 05, 2004

GROMACS 3.1.x and 3.2.x AltiVec Support

In order for GROMACS (3.1.x and 3.2.x) to build and use AltiVec instructions on PowerPC chips running Yellow Dog Linux/gcc 3.3.2 there are two files in the distribution which need a header file added to them.

In configure "#include <altivec.h>" should be added before main() in the generated C code in the AltiVec support test section. You can find this by searching for "supports altivec".

In include/ppc_altivec.h "#include <altivec.h>" should be added before the first function definition.

Posted by charliep at 06:28 PM | Comments (0)

May 04, 2004

Chart of Runs

I have manually compiled a list of the molecule runs that [have | have not | will not be] completed for both bazaar and cairo.
Bazaar Cluster
Cairo Cluster
These are automatically updated from the database when you refresh the page. The key has changed a bit from previous iterations of this chart. I am working on a Time Approximation scheme to place on the page as well.
The pages list the runs in 4 categories:


  • Type A:
    run = N(x) + C(x) + P(x)
  • Type B:
    run = N(x) + C(2x) + P(2x)
  • Type C:
    run = N(x) + C(x) + P(2x)
    run = N(x) + C(x) + P(2x-1)
  • Type D:
    run = N(x) + C(2x) + P(4x)
    run = N(x) + C(2x) + P(4x-1)
    run = N(x) + C(2x) + P(4x-2)
    run = N(x) + C(2x) + P(4x-3)

Where:

  • N(x) is x number of Nodes
  • C(x) is x number of Cpus
  • P(x) is x number of Processes
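
The four types can be checked mechanically. The sketch below is only an illustration: classify_run is a made-up name, and since the definitions overlap when x = 1 (for example N(1) + C(1) + P(1) fits both Type A and Type C), the sketch assumes Types A and B take precedence.

```python
def classify_run(nodes, cpus, processes):
    """Return 'A', 'B', 'C' or 'D' for a run, or None if it fits no type.

    x is taken to be the number of nodes, as in the definitions above.
    """
    x = nodes
    if cpus == x:
        if processes == x:
            return "A"
        if processes in (2 * x, 2 * x - 1):
            return "C"
    elif cpus == 2 * x:
        if processes == 2 * x:
            return "B"
        if 4 * x - 3 <= processes <= 4 * x:
            return "D"
    return None


for run in [(4, 4, 4), (4, 8, 8), (4, 4, 7), (4, 8, 14)]:
    print(run, classify_run(*run))
```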

Posted by hursejo at 09:16 AM | Comments (1)

April 16, 2004

Villin with Urea Molecule Set

I placed the "villin and URE in 6 A cubic box in water" molecule set in the b-and-t-gromacs CVS repository with the other molecules. It is under the directory villin-urea or via the softlink 'urea'.

I have been doing some testing with this molecule, and cannot seem to get it to span more than 3 processes without major failure [i.e. Application death]. This is a very large molecule, and if we can only use at most 3 processes with it I am interested in finding out why. This is one of those questions that we need to answer in order to produce some stable F@C code. grompp is able to split it fine, but mdrun chokes. I am playing around with the other versions of GROMACS to see if there is any difference; specifically, I am interested in testing with 3.2.1.

Posted by hursejo at 12:26 PM | Comments (2)

GROMACS 3.1.4 & 3.2.1 Install

I have installed the latest stable release of GROMACS (3.2.1) on the clusters. I also installed GROMACS 3.1.4 on cairo. For both versions I installed a Baseline and an Optimal Config.

So on both clusters we have the following versions of GROMACS with both Baseline and Optimal configurations:


  • 3.1.4
  • 3.1.5_pre1
  • 3.2.0
  • 3.2.1

Posted by hursejo at 10:35 AM | Comments (0)

April 06, 2004

Future Runs

Here is a table of the Future runs that I would like to run to answer the question:

For a given molecule, what is the optimal Number of Processes, taking into consideration SMP vs Uni-processor machines running both x86 and PPC hardware [Bazaar and Cairo respectively]?

The file is here:
chart.html

Posted by hursejo at 10:22 AM | Comments (6)

Reading SMP

Here are some notes about reading SMP vs Uniprocessor runs in the Database.

SMP Collection


 cpus | nodes | processes |           label           | molecule | cluster_name |     finish_time
------+-------+-----------+---------------------------+----------+--------------+---------------------
    2 |     1 |         2 | Gromacs-SMP-Optimal-3.2.0 | villin   | bazaar       | 2004-02-24 21:52:37
    4 |     2 |         4 | Gromacs-SMP-Optimal-3.2.0 | villin   | bazaar       | 2004-03-02 16:24:51
    6 |     3 |         6 | Gromacs-SMP-Optimal-3.2.0 | villin   | bazaar       | 2004-03-02 16:10:04
    8 |     4 |         8 | Gromacs-SMP-Optimal-3.2.0 | villin   | bazaar       | 2004-03-02 15:58:00
    2 |     1 |         2 | Gromacs-SMP-Optimal-3.2.0 | villin   | cairo        | 2004-02-24 20:44:34
    4 |     2 |         4 | Gromacs-SMP-Optimal-3.2.0 | villin   | cairo        | 2004-03-02 20:59:57
    6 |     3 |         6 | Gromacs-SMP-Optimal-3.2.0 | villin   | cairo        | 2004-03-02 20:51:44
    8 |     4 |         8 | Gromacs-SMP-Optimal-3.2.0 | villin   | cairo        | 2004-03-02 20:45:20

Uni Processor Collection

 cpus | nodes | processes |                label                | molecule | cluster_name |     finish_time
------+-------+-----------+-------------------------------------+----------+--------------+---------------------
    2 |     2 |         2 | Gromacs-Optimal-Configuration-3.2.0 | villin   | bazaar       | 2004-01-20 16:28:06
    4 |     4 |         4 | Gromacs-Optimal-Configuration-3.2.0 | villin   | bazaar       | 2004-01-20 16:06:36
    6 |     6 |         6 | Gromacs-Optimal-Configuration-3.2.0 | villin   | bazaar       | 2004-01-20 15:50:56
    8 |     8 |         8 | Gromacs-Optimal-Configuration-3.2.0 | villin   | bazaar       | 2004-01-20 15:37:34
    2 |     2 |         2 | Gromacs-Optimal-Configuration-3.2.0 | villin   | cairo        | 2004-01-17 20:40:55
    4 |     4 |         4 | Gromacs-Optimal-Configuration-3.2.0 | villin   | cairo        | 2004-01-17 20:27:23
    6 |     6 |         6 | Gromacs-Optimal-Configuration-3.2.0 | villin   | cairo        | 2004-01-17 20:18:47
    8 |     8 |         8 | Gromacs-Optimal-Configuration-3.2.0 | villin   | cairo        | 2004-01-17 20:11:57

The difference is in the combination of nodes and processes. In the Uniprocessor runs nodes = processes; in SMP (Dual CPU) runs nodes = processes/(cpus per node), or nodes = processes/2.

Note that in these runs cpus = processes, but this may not be so in the future. This is only true because we only tested by running one process per cpu, but we may find that running more than one process on a cpu is the optimal configuration.
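
To make the reading rule concrete, here is a minimal Python sketch that takes the cpus, nodes and processes columns of a row and reports which collection it belongs to. The function name describe_run and the returned keys are made up for this note. It derives cpus per node from cpus/nodes rather than from processes, so it should still hold if a later run puts more than one process on a cpu.

```python
def describe_run(cpus, nodes, processes):
    """Classify a database row as a uniprocessor or SMP run."""
    if nodes <= 0 or cpus <= 0 or cpus % nodes != 0:
        raise ValueError("cpus must be a positive multiple of nodes")
    cpus_per_node = cpus // nodes
    return {
        "kind": "uniprocessor" if cpus_per_node == 1 else "smp",
        "cpus_per_node": cpus_per_node,
        # 1.0 in every run listed above; may change in future runs
        "processes_per_cpu": processes / cpus,
    }


print(describe_run(8, 4, 8))  # a row from the SMP collection
print(describe_run(8, 8, 8))  # a row from the Uni Processor collection
```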

I am going to setup some runs on Cairo and Bazaar to fill out our table.

Posted by hursejo at 07:59 AM | Comments (0)