Folding@Cluster Project

Description:
Folding@Cluster

January 14, 2005

Design Meeting - Jan. 14, 2005

  • TODO File: see file for updates.
  • protocol.txt: revisit next week
  • Molecule configuration file: (itp files must go in top/, only used by grompp/preprocessing phase) - JoshH
    • GRO_FILE (mother & child)
    • MDP_FILE (mother & child)
    • TOP_FILE (mother & child)
    • ITP_FILES (mother) CSV [Optional] (top)
    • NDX_FILES (mother) CSV [Optional]
    • MAX_NODES_CLUSTER [Optional] appschema
    • PROCESSES_PER_NODE_CLUSTER [Optional] -np
    • MAX_NODES_SMP [Optional] appschema
    • PROCESSES_PER_NODE_SMP [Optional] -np
  • CPU Count - Charlie
  • Progress Meter - joshh
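A rough sketch of how the mother might load these keys, assuming a one-entry-per-line KEY = VALUE syntax (the notes above list only the key names, not the file syntax). The struct layout and the names molecule_conf, molecule_conf_set and molecule_conf_is_complete are made up for illustration.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define FATC_VALUE_MAX 256

/* Hypothetical in-memory form of the molecule configuration file. */
struct molecule_conf {
    char gro_file[FATC_VALUE_MAX];   /* required */
    char mdp_file[FATC_VALUE_MAX];   /* required */
    char top_file[FATC_VALUE_MAX];   /* required */
    char itp_files[FATC_VALUE_MAX];  /* optional, comma separated */
    char ndx_files[FATC_VALUE_MAX];  /* optional, comma separated */
    int max_nodes_cluster;           /* optional, 0 means unset */
    int processes_per_node_cluster;  /* optional, 0 means unset */
    int max_nodes_smp;               /* optional, 0 means unset */
    int processes_per_node_smp;      /* optional, 0 means unset */
};

/* Apply one "KEY = VALUE" line (assumed syntax) to conf.
 * Returns 0 on success, -1 for a malformed line or an unknown key. */
int molecule_conf_set(struct molecule_conf *conf, const char *line)
{
    char key[64];
    char value[FATC_VALUE_MAX];
    char *text = NULL;   /* destination for string-valued keys */
    int *number = NULL;  /* destination for integer-valued keys */

    assert(conf != NULL && line != NULL);
    if (sscanf(line, " %63[A-Z_] = %255[^\n]", key, value) != 2)
        return -1;

    if (strcmp(key, "GRO_FILE") == 0) text = conf->gro_file;
    else if (strcmp(key, "MDP_FILE") == 0) text = conf->mdp_file;
    else if (strcmp(key, "TOP_FILE") == 0) text = conf->top_file;
    else if (strcmp(key, "ITP_FILES") == 0) text = conf->itp_files;
    else if (strcmp(key, "NDX_FILES") == 0) text = conf->ndx_files;
    else if (strcmp(key, "MAX_NODES_CLUSTER") == 0)
        number = &conf->max_nodes_cluster;
    else if (strcmp(key, "PROCESSES_PER_NODE_CLUSTER") == 0)
        number = &conf->processes_per_node_cluster;
    else if (strcmp(key, "MAX_NODES_SMP") == 0)
        number = &conf->max_nodes_smp;
    else if (strcmp(key, "PROCESSES_PER_NODE_SMP") == 0)
        number = &conf->processes_per_node_smp;
    else
        return -1;

    if (text != NULL)
        snprintf(text, FATC_VALUE_MAX, "%s", value);
    else
        *number = (int)strtol(value, NULL, 10);
    return 0;
}

/* The three files needed by both mother and child must all be present. */
int molecule_conf_is_complete(const struct molecule_conf *conf)
{
    assert(conf != NULL);
    return conf->gro_file[0] != '\0' &&
           conf->mdp_file[0] != '\0' &&
           conf->top_file[0] != '\0';
}
```

A caller would zero the struct, feed it each line read with fgets, and refuse to start grompp until molecule_conf_is_complete returns nonzero.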
Posted by hursejo at 12:45 PM | Comments (0)

January 02, 2005

Design Notes - process architecture

This is a description of the new process architecture developed after our experiences working with the a1 release.

Overall Plan:


    Mother MPI_Spawn()'s Nannies
    Mother MPI_Spawn()'s mdrun (we have removed the child, and just have
    mdrun)

Mother:


    1. Spawn Nannies, 1 per node in mother.conf
    2. Capability Discovery with Nannies
    3. Make mdrun<->Nanny assignments

      a) Get PID and Hostname information from mdrun (Init_FATC())
      b) Make nanny/mdrun assignments and distribute to nannies
      c) Reap any unused nannies
      d) Tell each nanny the # of children assigned to them

    4. Run grompp
    5. Spawn mdrun
    6. Collect periodic checkpoint files from nanny0
    7. When mdrun completes

      a) Completion of mdrun is indicated by mdrun0 sending a message to the mother. This message will pass the exit code (success or flavor of failure).
      b) Nanny0 will send all the necessary files to the mother

    8. Reap all nannies
    9. Report result to F@C server, get a new molecule, and restart with the new molecule

Nanny:


    1. Get # of children to look for with PID information from mother
    2. When the checkpoint file is updated nanny0 will send it to the mother
    3. When a nanny checkpoints/checks-in-with its mdrun process it will
    compare the cpu time from the last checkpoint with the cpu time from this
    checkpoint and

      a) if it has not changed then it will report the stale state to the
      mother
      b) if the process goes away then report that to the mother.

    *Still not sure how to do this in an elegant way.
    5. When mdrun finishes [mother tells all the nannies when this happens] nanny0 will send all of the files to the mother

mdrun:


    0. The mdrun binary will be renamed to fatc_child as part of "$ make release"
    1. No source code changes except:

    • stderr -> stdout
    • error codes instead of exit()
    • Init_FATC code for PID/hostname communication, and freopen. This is called just after MPI_Init by all mdrun processes.
    • Finalize_FATC code for "finished" message to the mother. This is called just before MPI_Finalize by all mdrun processes
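A minimal sketch of the non-MPI half of Init_FATC: gathering the PID and hostname that each mdrun process reports to the mother. The struct and function names are hypothetical, and the actual send over the mother's communicator is not shown.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <unistd.h>

#define FATC_HOST_MAX 256

/* What an mdrun process would report about itself (hypothetical layout). */
struct fatc_identity {
    long pid;
    char hostname[FATC_HOST_MAX];
};

/* Fill in the identity of the calling process.
 * Returns 0 on success, -1 if the hostname cannot be read. */
int fatc_collect_identity(struct fatc_identity *id)
{
    assert(id != NULL);
    id->pid = (long)getpid();
    if (gethostname(id->hostname, sizeof id->hostname) != 0)
        return -1;
    /* gethostname() may leave a truncated name unterminated. */
    id->hostname[sizeof id->hostname - 1] = '\0';
    return 0;
}
```

In Init_FATC this would run right after MPI_Init, with the filled struct sent to the mother so she can make the nanny/mdrun assignments.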

Notes:


    1. The child.[c,h] files will be moved to folding-at-clusters/source/old

    2. We want to limit the changes we make in mdrun, but script based changes that are easy to apply are ok.

    3. There are some kludges in the way that the nanny 'finds' the mdrun
    process it is matched with. There are better ways to do this, but for the
    moment the kludges allow for a proof of concept and quick solution.

    4. We are using MPI_Spawn() instead of system(mpirun ...) because the former
    allows a bit more control over the MPI_COMM group for the mdrun process(s),
    whereas the latter completely separates the processes and adds some more
    challenges that are harder to overcome.


Questions:


    1. Set Nice/ProcessPriority level
    Answer: mdrun already has a command line option for setting the nice level.

    2. Redirect stdout to a log file [via freopen].
    Answer: Init_FATC function in mdrun. nannies will transmit the logs back to the mother on completion.

    3. Get Work Unit from mother [tpr, gro(?) files]
    Answer: Mother pre-populates files on the nanny0 node as JoshH suggested.
    This works if nanny0 and the mother are on different nodes and on NFS and
    non-NFS systems.

    4. Notification to the mother that we are finished.
    Answer: Finalize_FATC sends message to mother from mdrun0

Posted by charliep at 06:01 PM | Comments (0)

December 28, 2004

Conversation with Vijay Pande

o The importance of SMP resources is increasing; make sure we install easily and scale well in this environment. What is the current level of support for threads in GROMACS? How does GROMACS running threaded compare in terms of performance to F@C? The CCG will locate SMP boxes that we can test on.

(Postscript: There is only partial support for threads in GROMACS currently (non-bonded interactions), F@C, ie MPI, scales pretty well on SMP boxes.)

o Ease of installation, what can we do to make it as simple as possible to install and use F@C? LAM is a hurdle in this respect. Simple how-to documents for ssh (assumed), key exchange, LAM, using shared file systems, other topics?

o Does hyperthreading do anything for F@C?

(Postscript: Not really. HT is duplicated processor state resources which facilitate very fast context switches between threads in a single process. Since GROMACS doesn't use threads F@C can't leverage this.)

o Windows port, Vijay has a person that can work on this with us.

Posted by charliep at 02:43 PM | Comments (0)

November 10, 2004

Code Review Notes

NOTES: 11/10/2004
  • separate mdrun from child (like grompp)
    • Start with latest version of GROMACS.
    • keep define f-at-c (for stderr->stdout, verbosity of prints...)
    • Patch for altivec
    • Change build script
    • Take out include of gromacs header for f-at-c
  • Remove HTTPD from mother
    • Turn this into MPI async communication.
    • Turn off Threads in COSM
    • No longer need to send the mother hostname and port to the nanny
  • CHILD_MULT should be dynamic (load balancing)
  • File System Resources needs to be altered for dynamic files per pande stuff
  • Documentation:
    • Header file has complete documentation for functions, and globals
    • Use special tags for start and stop of comments.
  • Print function: Reconcile N-different.
  • Remove debug counter [duplicate: debugCounter, debugTimes??]
  • Functional abstraction in main() - mother/nanny/child
  • MPI Send and Recv file function()
  • Remove OpenLog()
    • Now we can use Print everywhere and should!
  • Stdout redirection toggle [log | display]
  • Route progress of child
    • caught by nanny
    • sent to the mother async. [disjoint from checkpointing mech.]
  • Clear documentation of *all* variables.
  • Trap, check, & react - return codes from all functions when appropriate.
  • Use PrintHeader(x,x) in [-v] version return.
  • Write a ParseSpec function to get applicable file names from assignment server.
  • Pre-while loop set child_process
  • Where is nanny_host_conf_file set in mother.c
  • convert int -> %u to something reasonable (signed)
  • If conf file is not set properly, lose default values pre spawn.
  • When waiting for the child to return a result, should use Irecv instead of Recv
  • check_code() function may be redundant.
  • "Recover #" print messages, should be more meaningful.
  • "hold" variable may be lost since it doesn't *normally* do anything meaningful.
  • "flag" should be equal to some defines in the header. clarity issue
  • Testing Rubric Item: Test stopping mother with a signal
Posted by hursejo at 10:29 PM | Comments (0)

October 19, 2004

MP Lite Analysis

#include http://www.earlham.edu/~charliep/mt/archives/002762.html

There is value in converting to MP Lite since it is much easier to install than LAM/MPI.

FFTW uses MPI functions that MP Lite doesn't support. This would limit this port to a subset of the analysis methods currently used by GROMACS.

MP Lite is limited to a single communicator, MPI_COMM_WORLD. F@C currently uses three communicators.

MP Lite requires either a shared filesystem or installation of the application binaries on all the nodes. The latter is a problem we were able to solve in F@C by using LAM's ability to launch a binary from the rank 0 node on all the other nodes in an MPI world (it just ships the binary to each LAM daemon before startup). There is that chicken and egg problem though of having to install LAM on each node, or a shared filesystem with the LAM binaries.

Posted by charliep at 05:57 PM | Comments (0)

October 16, 2004

A story about stdio redirection in LAM-MPI

Once upon a time in the mythical land of red and white called Indiana University, there was a project called LAM-MPI. LAM-MPI does some magic with large collections of machines including letting them all talk from the same mouth, which is located at the head node. Here are some notes about this multi-mouthed daemon.
-------------
man mpirun
-------------
* LAM directs UNIX standard input to /dev/null on all remote nodes.
* LAM directs UNIX standard output and error to the LAM daemon on all
  remote nodes. LAM ships all captured output/error to the node that
  invoked mpirun and prints it on the standard output/error of mpirun.
  Local processes inherit the standard output/error of mpirun and transfer
  to it directly.

------------
/cluster/cairo/src/lam-7.0.6/HISTORY
------------
* stdout/stderr of the local lamd is left open so that tstdio(3) will work properly
  - tstdio -> trillium stdio file

------------
Useful files to look at
------------
share/kreq/clientio.c
otb/mpirun/mpirun.c
 - set_stdio()
   - lam_mktmpid
     - Create a temporary file name based on an id [/tmp/lam-12]
   - lam_lfopenfd
     - sfh_send_fd 
       - pass a single file descriptor over a stream
share/include/kio.h

Instead of looking into a LAM-MPI function to do stdio redirection, why not just remap the stdio stuff ourselves via dup or freopen?
The source below does what we want: /cluster/home/joshh/src/c/reopen.c
#include <stdio.h>

int main(){
    char str[256] = "This is a string of text\n";
    FILE *fp;

    // Print to stdout
    printf("%s",str);

    // redirect stdout
    if( (fp = freopen("file.txt","w",stdout)) == NULL){
        // perror appends ": <reason>" itself
        perror("Unable to open file.txt");
        return 1;
    }
    // try to print to stdout again
    // This goes into "file.txt" directly, and is NOT printed to the terminal
    printf("%s",str);

    return 0;
}

So the plan of action that I propose is:
  1. No children or nannies write to logs, they only use stdout, stderr
  2. Mother collects all of this via LAM-MPI magical pass through
  3. After MPI_INIT, redirect the stdout and stderr file descriptors to our COSM log file
  4. Now we have a central log with everything that would have been printed to the screen.
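A sketch of step 3 that sends both stdout and stderr to one log file and can switch back to the display afterwards. It uses dup/dup2 on the file descriptors rather than freopen so the original streams can be restored; stdio_backup, stdio_to_log and stdio_to_display are hypothetical names, not COSM or LAM functions.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Saved copies of the original descriptors, so we can switch back. */
struct stdio_backup {
    int out_fd;
    int err_fd;
};

/* Point stdout and stderr at the log file (appending).
 * Returns 0 on success, -1 on failure. */
int stdio_to_log(const char *path, struct stdio_backup *backup)
{
    int log_fd;

    assert(path != NULL && backup != NULL);
    fflush(stdout);  /* do not let buffered text leak into the log */
    fflush(stderr);

    log_fd = open(path, O_WRONLY | O_CREAT | O_APPEND, 0644);
    if (log_fd < 0)
        return -1;

    backup->out_fd = dup(STDOUT_FILENO);
    backup->err_fd = dup(STDERR_FILENO);
    if (backup->out_fd < 0 || backup->err_fd < 0 ||
        dup2(log_fd, STDOUT_FILENO) < 0 ||
        dup2(log_fd, STDERR_FILENO) < 0) {
        close(log_fd);
        return -1;
    }
    close(log_fd);  /* descriptors 1 and 2 keep the file open */
    return 0;
}

/* Undo stdio_to_log() using the saved descriptors. */
int stdio_to_display(const struct stdio_backup *backup)
{
    assert(backup != NULL);
    fflush(stdout);  /* push pending text into the log first */
    fflush(stderr);
    if (dup2(backup->out_fd, STDOUT_FILENO) < 0 ||
        dup2(backup->err_fd, STDERR_FILENO) < 0)
        return -1;
    close(backup->out_fd);
    close(backup->err_fd);
    return 0;
}
```

Because both streams share one open file in append mode, their output lands in a single log, though stdio buffering can still reorder lines between the two streams.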
Questions:
  • What does the COSM log file buy us?
  • Is it easier to just ditch the COSM log file stuff and do straight v3PrintA's and at the top of the mother redirect stdout/stderr to a file?
  • Does COSM do some magic on the backend that may invalidate this option?
Posted by hursejo at 10:56 AM | Comments (88)

October 12, 2004

Conversation with Vijay Pande

ACS Funding


    Vijay will look into the possibility of working remotely with a couple of on-site visits for next summer, with support from the ACS.

F@C


    When Vijay and his colleagues have done more testing we'll talk about the interface between F@C and F@H's assignment server. Likely to be a conf file that describes all the input files (including topology information) provided in the system and the output files the scientist wants. This could also be the way that specific command line options for grompp and mdrun/child are passed from scientists to F@C.

    Vijay is having lunch with Adam Begerg (now a grad school student in CS at Stanford); he's going to see if Adam has time/interest in working on COSM, etc.

    MP-Lite would greatly simplify the installation of F@C. We'll need to look at FFTW/MPI and see if we can build GROMACS with FFTW libraries that don't have MPI calls (it's also possible to not use FFTW, how much science would that exclude?), stdout/stderr mapping from mdrun/child to mother via LAM's filehandle mapping, and possibly other areas. See JoshH's MT entry for a start on this.

Posted by charliep at 10:52 AM | Comments (55)

September 22, 2004

Non-NFS notes

So I have tested with the Non-NFS nodes on cairo and compiled a few notes. Overall everything worked fine. I built on c1, and tested on c12-c15 with the release binaries.
  • On compute nodes we (by default) set our working directory to $HOME. So you get directories created like:
    Creating Directory /home/test1/work/nanny/
    Creating Directory /home/test1/work/nanny/
    We should allow the user to set this directory. Currently they can via mpirun -wd DIR, which sets the working directory on each remote node to the argument (DIR) before starting work.
    We could build in the capability to allow the user to set the working directory per node in the configuration.
    For the moment the mpirun option is the best choice, but we should consider this question a bit.
  • GROMPP Running problems/Fix:
    You must export the GMXLIB and GMXLIBDIR environment variables BEFORE starting lamd via lamboot.
    export GMXLIB=/full/path/to//release/top
    export GMXLIBDIR=/full/path/to//release/top
    lamboot -v
    This is because once lamboot is executed it sets the environment variables, and any changes or additions are not propagated until the lamd is restarted.
Posted by hursejo at 08:38 PM | Comments (131)

September 13, 2004

Building Folding-at-Clusters

There are now complete instructions for building Folding-at-Clusters in folding-at-clusters/source/README.

If you have any problems building with those get in touch with charliep.

Posted by charliep at 12:06 AM | Comments (45)

September 09, 2004

New F@C Repository Building Notes

Here are some notes that I had about building the F@C + COSM + gromacs-3.2.1:
I did all of my work on Cairo, there may be details that need to be changed for Bazaar/OSX.
  • gromacs-3.2.1 Repository
    • ./configure
      --prefix=/cluster/home/joshh/cvs/gromacs-cairo-bin/
      --enable-f_at_c
      --enable-mpi
      --enable-mpi-environment=GROMACS_MPI
      --enable-float
      --disable-software-recip
      --enable-software-sqrt
      --disable-x86-asm
      --enable-ppc-altivec
      --disable-cpu-optimization
      CPPFLAGS=-I/cluster/cairo/software/fftw-2.1.5-Baseline/include
      LDFLAGS=-L/cluster/cairo/software/fftw-2.1.5-Baseline/lib
      PATH=/sbin:/bin:/usr/sbin:/usr/bin:/usr/games:/usr/local/sbin:/usr/local/bin:/usr/X11R6/bin:/bin:/cluster/cairo/software/lam-7.0.2/bin:/cluster/cairo/software/lam-7.0.2/bin
    • Don't use make -j 2; it will not build out of the box (at least for me). Use the single-threaded make.
    • You will need to do a make install in order to get the grompp binary (and maybe others).
  • cosm repository
    • cd cosm/v3
    • ls make
    • ./build linux-ppc -DNO_THREADS
      Why NO_THREADS ??
  • folding-at-clusters repository
    • I fixed the /cluster/project/folding-at-clusters repository so it now represents the current structure of the folding-at-clusters repository.
    • I updated the graph in the documentation
    • I added the discovery code to the source tree
    • I changed the Makefile to build the child in place instead of in the gromacs directory.
Final Mark: Unable to link to the gromacs stuff, because it does not build properly. The problem is that we have the child now compiled in GROMACS. The child needs the defines from COSM. COSM is currently not linked into the GROMACS Makefiles (specifically the src/kernel/Makefile).
If we add the COSM stuff to GROMACS will it break GROMACS (requiring that we convert it to COSM)?
Does it break our model of separation between COSM, folding-at-clusters, and gromacs-3.2.1?
Is it easier to use the old model of compiling the child within folding-at-clusters?
Posted by hursejo at 08:32 PM | Comments (58)

August 31, 2004

Conversation with Vijay Pande

* What should we plan on sending back as a result?
science - trr, xtc (if present); last frame in trr used to generate next, xtc for analysis; mdp file controls whether an xtc file is generated.
log (unique extension) - CPU time, wall time, # restarts, cluster characteristics
8.3 file naming restrictions still apply

Distribution - package for Un*x systems? For beta do a tarball, later do packages.

Windows port - POSIX issues are handled by the C compiler

Future - can we use MPI-Lite or something similar so that we can embed it with F@C. The nanny could be a Windows Service.

Build all static binaries.

More large molecules? Ribosome? not needed now.

* Where we are:
Capability discovery
Startup - molecule preparation (grompp), lam startup, GROMACS startup
Progress monitoring
Checkpointing
Restarting
Results

* Tracked down 2 of the 4 compiler/optimizer errors we spoke about, and we have a fix for them.

* Load distribution with GROMACS and MPI coming soon. Don't let this hold-up the beta.

* How does hyper-threading affect us? Probably not at all.

* Using COSM for communication between mother and nannies with HTTP.

* Beta around the middle of the month?

Build a roadmap for the future.

How to test the quality of a GROMACS result? Not needed now.

How are we going to get molecules when in production? Folding@Home template core.

Posted by charliep at 02:54 PM | Comments (0)

August 23, 2004

restarting method for varying numbers of nodes

grompp -f mdout.mdp -c d.villin.tpr -p topol.top -e ener.edr -t d.villin.trr -np 8 -o new.villin.tpr



I believe I have found the proper gromacs tools and what to use them on for restarting a run with a different number of nodes.

tools used:

  • trjconv
  • grompp

files initially required:

  • original mdout.mdp
  • original .tpr
  • result or checkpoint file (.trr)

files created:

  • new .gro
  • new .tpr

Process:

  1. Create a .gro file that incorporates the result and the topology files.
    trjconv -s original.tpr -f checkpoint.trr -o new.gro
  2. Use the .gro as input to grompp to create a new .tpr file configured for the new number of nodes.
    grompp -f mdout.mdp -c new.gro -np 4 -o new.tpr
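The two steps above could be assembled in code roughly like this. It is a sketch: the helper names are hypothetical, only the flags shown in the commands above are used, and the caller still has to run the resulting strings (for example with system()) and check the exit status.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Reject empty names and shell metacharacters, since the command
 * line is meant to be handed to a shell. */
static int is_plain_name(const char *name)
{
    return name[0] != '\0' &&
           strpbrk(name, " \t\n'\"\\;&|$`<>") == NULL;
}

/* Step 1: trjconv turns the original .tpr plus a checkpoint .trr
 * into a new .gro. Returns 0 on success, -1 on bad input or overflow. */
int build_trjconv_cmd(char *out, size_t out_len, const char *tpr,
                      const char *trr, const char *gro)
{
    int n;

    assert(out != NULL && tpr != NULL && trr != NULL && gro != NULL);
    if (!is_plain_name(tpr) || !is_plain_name(trr) || !is_plain_name(gro))
        return -1;
    n = snprintf(out, out_len, "trjconv -s %s -f %s -o %s", tpr, trr, gro);
    return (n < 0 || (size_t)n >= out_len) ? -1 : 0;
}

/* Step 2: grompp builds a .tpr for the new number of nodes. */
int build_grompp_cmd(char *out, size_t out_len, const char *mdp,
                     const char *gro, int nodes, const char *new_tpr)
{
    int n;

    assert(out != NULL && mdp != NULL && gro != NULL && new_tpr != NULL);
    if (nodes < 1 || !is_plain_name(mdp) || !is_plain_name(gro) ||
        !is_plain_name(new_tpr))
        return -1;
    n = snprintf(out, out_len, "grompp -f %s -c %s -np %d -o %s",
                 mdp, gro, nodes, new_tpr);
    return (n < 0 || (size_t)n >= out_len) ? -1 : 0;
}
```

Building the strings separately from running them keeps the node count a plain parameter, which is the only thing that changes between restarts.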

Notes:

  • A simulation can only be run for a set amount of time without modification. Even after restarting, the simulation cannot continue past the simulation time specified when grompp was initially executed. To extend the simulation, tpbconv must be used with the -until or -extend option (each takes a number of picoseconds as its argument):
    tpbconv -s topol.tpr -f traj.trr -e ener.edr -o new.tpr -extend 10
  • If something is wrong with this method, I do not know enough chemistry to test the accuracy effectively. Tips leading to these results were taken from the GROMACS users and developers lists.

Posted by mccoyjo at 03:03 PM | Comments (85)

June 29, 2004

Meeting Notes

General Notes


  • JoshM: Should look into MPI's native abilities for Load Balancing, Checkpointing, Recovery. Reading MPI books and articles.
  • Load Balancing
    Are we able to specify exactly the number of processes per node in the configuration? For example:

    • Node 0: 2 processes
    • Node 1: 5 processes
    • Node 2: 3 processes
    • Node 3: 1 process
    • Node 4: 7 processes

    Control processes within the code, or control over the distribution pattern?
  • Checkpointing: Native to MPI or LAM specific or other?
  • Checkpointing interval should be adjustable. Able to flex from 1 hr to 5 min or something like that. Should be a signal between nanny and child.
    First-order solution: we have a frequency for the mdp file dump, which should appear on the mother. Nanny does all local file collections.
  • Minimal GROMACS with grompp and mdrun.
  • Add scientific core to Framework. Initially just frame, then GROMACS. Child merged with mdrun. Document changes.

Posted by hursejo at 03:11 PM | Comments (33)

May 19, 2004

Scheduler outline

Here is an outline of how the current scheduler works. First note that there are two versions of the scheduler [I have forgotten why exactly]: detailed-scheduler.pl, which is the current version and the one that should be used, and scheduler.pl, which is old and should not be used. The latter does not have the 'find the dominant inner loop' code.

  1. Grab Arguments. Most of these are files [these should always be the last set of arguments given to the program], which are placed into an array. Before the files there are some specialized flags that turn on things like switch monitoring and /proc changes.
  2. For Each File
    1. If the stopping flag has been set by the signal handler [SIGUSR1 or SIGUSR2 sent to the head process], then post a mail message on how to restart and the current state, then exit.
    2. Initialize Tests
      1. Parse Config File. Here we also make the working directory and make sure we have unique path and tag names. If this is a 'duplicate' test then attach -Run-# to the end of the tag and create the directory.
      2. Make the node list. This depends upon the cluster we are running on [see notes in program] and whether we are running the tests as node or cpu cyclic.
      3. Set Environment Variables.
      4. Prepare result and option_profile rows in the Database
      5. Generate Run script using node list.
    3. Launch the script via nohup so we can do...
    4. Checkpointing. Wait for finish_time field to obtain a value. If the value was 1900-01-01 then post an error and quit the scheduler.
    5. Analyse Run [mark ps_real, ps_node, dominant inner loop, etc.]
    6. Cleanup variables for next run. Mail successful completion of this configuration file.
  3. Mail a Scheduler Finished message

Some General Notes:


  • I use the 'usysv' ssi flag to mpirun by default because it provides the best all around performance.
  • There are some heavy-duty Perl regular expressions in the analyze routines, especially when finding the dominant inner loop. If these get too much to parse, use the commented-out print statements in the control statements to help.
  • The general format for a directory name is: [molecule]-[tag]-[processes]on[cpus]-[nodes]

Posted by hursejo at 03:40 PM | Comments (0)

May 12, 2004

LAM MPI and Signals

So I have been investigating why I am seeing some weird behaviour in the F@C framework when using POSIX signals. I found a couple of bits:
Signal Catching Changes in 6.5.9 Release


Signal catching

LAM MPI now catches the signals SEGV, BUS, FPE, and ILL. The signal handler terminates the application. This is useful in batch jobs to help ensure that mpirun returns if an application process dies. To disable the catching of signals use the -nsigs option to mpirun.

Internal signal

The signal used internally by LAM has been changed from SIGUSR1 to SIGUSR2 to reduce the chance of conflicts with the Linux pthreads library. The signal used is configurable. See the installation guide for the specific ./configure flag that can be used to change the internal signal.


and this bit from mpi-forum.org

2.9.2. Interaction with Signals
MPI does not specify the interaction of processes with signals and does not require that MPI be signal safe. The implementation may reserve some signals for its own use. It is required that the implementation document which signals it uses, and it is strongly recommended that it not use SIGALRM, SIGFPE, or SIGIO. Implementations may also prohibit the use of MPI calls from within signal handlers.

In multithreaded environments, users can avoid conflicts between signals and the MPI library by catching signals only on threads that do not execute MPI calls. High quality single-threaded implementations will be signal safe: an MPI call suspended by a signal will resume and complete normally after the signal is handled.

In short if we use LAM-MPI then we should stay away from the following signals:
SEGV,BUS,FPE,ILL,TERM,USR2

I have changed the code from using SIGUSR2 to using SIGCHLD (a signal that is currently ignored by default according to signal(7) manpage), and things are working much better.
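To keep this from biting us again, the framework could refuse to install handlers on the signals listed above. A sketch, assuming the list of six is complete for our LAM build; is_lam_reserved_signal and install_fatc_handler are hypothetical names.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <stddef.h>
#include <string.h>

/* Signals to stay away from under LAM-MPI, per the list above. */
static const int lam_reserved[] = {
    SIGSEGV, SIGBUS, SIGFPE, SIGILL, SIGTERM, SIGUSR2
};

/* Returns 1 if signo is on the reserved list, 0 otherwise. */
int is_lam_reserved_signal(int signo)
{
    size_t i;

    for (i = 0; i < sizeof lam_reserved / sizeof lam_reserved[0]; i++) {
        if (lam_reserved[i] == signo)
            return 1;
    }
    return 0;
}

/* Install handler for signo unless the signal is reserved.
 * Returns 0 on success, -1 if reserved or if sigaction() fails. */
int install_fatc_handler(int signo, void (*handler)(int))
{
    struct sigaction action;

    assert(handler != NULL);
    if (is_lam_reserved_signal(signo))
        return -1;

    memset(&action, 0, sizeof action);
    action.sa_handler = handler;
    sigemptyset(&action.sa_mask);
    action.sa_flags = SA_RESTART;  /* restart interrupted system calls */
    return sigaction(signo, &action, NULL);
}
```

Routing every handler installation through one function like this gives a single place to update if a later LAM release changes its internal signal again.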

Posted by hursejo at 01:01 PM | Comments (0)

MD Packages

  • Amber costs $400 for Academic use. It does come with the source code and demos, but is only shipped via CD.
  • Tinker does not seem to have a parallel implementation that I noted on the website.
  • NAMD uses Charm++ for parallelism. According to this post NAMD can be compiled with MPI support but runs a bit slower.
The primary goal of this project is: for a very large molecule set, which may or may not be able to fit in memory on a single machine, harness parallelism to diffuse the load and increase performance. So we have some prerequisites for any MD module that we consider.
  1. Must be MPI compatible to fit with our framework (What about MP_Lite?)
  2. Must have clearly specified, plain text input file formats that other modules can adapt to easily.

The goal of this post is to start the conversation about MD modules in F@C, and the requirements that new modules must adhere to in order to be classified as a potential module.

Posted by hursejo at 09:22 AM | Comments (1)

May 05, 2004

GROMACS 3.1.x and 3.2.x AltiVec Support

In order for GROMACS (3.1.x and 3.2.x) to build and use AltiVec instructions on PowerPC chips running Yellow Dog Linux/gcc 3.3.2 there are two files in the distribution which need a header file added to them.

In configure "#include <altivec.h>" should be added before main() in the generated C code in the AltiVec support test section. You can find this by searching for "supports altivec".

In include/ppc_altivec.h "#include <altivec.h>" should be added before the first function definition.

Posted by charliep at 06:28 PM | Comments (0)

May 04, 2004

MPI Error Handler

Here is a link to the MPI Forum's section on the error handler:
Communicator Error Handler
We need to create a function of the type:


typedef void MPI_Comm_errhandler_fn(MPI_Comm *, int *, ...);

I have not played with typedef'ed functions with variable arguments, so I am looking for suggestions on how to actually implement a function of the above typedef for input into this function:

int MPI_Comm_create_errhandler(MPI_Comm_errhandler_fn *function, MPI_Errhandler *errhandler)
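One way to approach the typedef question, sketched without mpi.h so it compiles anywhere: fake_comm, comm_errhandler_fn and register_errhandler are stand-ins I made up that mirror the shapes of MPI_Comm, MPI_Comm_errhandler_fn and MPI_Comm_create_errhandler. The point is that the typedef names a function type, so the handler is written as an ordinary function with the same parameter list (including the "..."), and its name is passed where a pointer to that type is expected.

```c
#include <assert.h>
#include <stdio.h>

/* Stand-in for MPI_Comm so the sketch does not need mpi.h. */
typedef int fake_comm;

/* Same shape as MPI_Comm_errhandler_fn: a function type, not a pointer. */
typedef void comm_errhandler_fn(fake_comm *, int *, ...);

/* Last error code seen by the handler, kept for inspection. */
static int last_error_code = 0;

/* A function of that type is defined by spelling out the matching
 * signature. The variadic tail is implementation specific and can
 * simply be ignored. */
void fatc_errhandler(fake_comm *comm, int *error_code, ...)
{
    (void)comm;
    last_error_code = *error_code;
    fprintf(stderr, "F@C: communicator error %d\n", *error_code);
}

/* Stand-in for MPI_Comm_create_errhandler: it takes a pointer to the
 * function type and hands back a handle (here, just the pointer). */
int register_errhandler(comm_errhandler_fn *function,
                        comm_errhandler_fn **errhandler)
{
    assert(function != NULL && errhandler != NULL);
    *errhandler = function;
    return 0;
}
```

With the real header the handler would take MPI_Comm * as its first parameter and be passed by name to MPI_Comm_create_errhandler, and the resulting MPI_Errhandler attached with MPI_Comm_set_errhandler.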

Posted by hursejo at 11:57 AM | Comments (3)

Chart of Runs

I have manually compiled a list of the molecule runs that [have | have not | will not be] completed for both bazaar and cairo.
Bazaar Cluster
Cairo Cluster
These are automatically updated from the database when you refresh the page. The key has changed a bit from previous iterations of this chart. I am working on a Time Approximation scheme to place on the page as well.
The pages list the runs in 4 categories:


  • Type A:
    run = N(x) + C(x) + P(x)
  • Type B:
    run = N(x) + C(2x) + P(2x)
  • Type C:
    run = N(x) + C(x) + P(2x)
    run = N(x) + C(x) + P(2x-1)
  • Type D:
    run = N(x) + C(2x) + P(4x)
    run = N(x) + C(2x) + P(4x-1)
    run = N(x) + C(2x) + P(4x-2)
    run = N(x) + C(2x) + P(4x-3)

Where:

  • N(x) is x number of Nodes
  • C(x) is x number of Cpus
  • P(x) is x number of Processes
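For what it is worth, the four categories can be written as a small classifier. This is my reading of the table, not code from the chart pages: classify_run is a hypothetical helper, and where the definitions overlap at x = 1 (for example P(2x-1) equals P(x)) it prefers Type A over C and Type B over D.

```c
#include <assert.h>

/* Map a (nodes, cpus, processes) triple onto run types 'A'..'D'.
 * Returns '?' when the triple fits none of the four patterns. */
char classify_run(int nodes, int cpus, int processes)
{
    assert(nodes > 0);

    if (cpus == nodes) {
        if (processes == nodes)
            return 'A';                      /* N(x) + C(x) + P(x) */
        if (processes == 2 * nodes || processes == 2 * nodes - 1)
            return 'C';                      /* P(2x) or P(2x-1) */
    } else if (cpus == 2 * nodes) {
        if (processes == 2 * nodes)
            return 'B';                      /* N(x) + C(2x) + P(2x) */
        if (processes >= 4 * nodes - 3 && processes <= 4 * nodes)
            return 'D';                      /* P(4x) down to P(4x-3) */
    }
    return '?';
}
```

For example, 4 nodes with 8 cpus and 13 processes falls under Type D (4x-3), while 12 processes on the same hardware matches none of the listed patterns.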

Posted by hursejo at 09:16 AM | Comments (1)

April 27, 2004

Added Project 1012 to CVS

I added the Project1012 Molecule to CVS. Here is the link if you want to view the files:
http://cluster.earlham.edu/project/b-and-t-gromacs/tests/etc/orignal-molecules/project1012/

It is a bit smaller than the last molecule that we received, but this will 'stretch' to 100 processes without error [in one set of testing].

Posted by hursejo at 05:38 PM | Comments (0)

April 19, 2004

Protocol for v0.5

So I have typed up a protocol outline for v0.5 which is a bit more detailed than what we have on the board now. There are some notes at the bottom which we should address when we meet next. Here is the link:
http://cluster.earlham.edu/home/joshh/dev/fac/protocal.txt

Here is a link to the current framework:
http://cluster.earlham.edu/home/joshh/dev/fac/

Posted by hursejo at 11:59 AM | Comments (0)

April 16, 2004

Villin with Urea Molecule Set

I placed the "villin and URE in 6 A cubic box in water" molecule set in the b-and-t-gromacs CVS repository with the other molecules. It is under the directory villin-urea or via the softlink 'urea'.

I have been doing some testing with this molecule, and cannot seem to get it to span more than 3 processes without major failure [i.e. Application death]. This is a very large molecule, and if we can only use at most 3 processes with it I am interested in finding out why. This is one of those questions that we need to answer in order to produce some stable F@C code. grompp is able to split it fine, but mdrun chokes. I am playing around with the other versions of GROMACS to see if there is any difference; specifically I am interested in testing with 3.2.1.

Posted by hursejo at 12:26 PM | Comments (2)

GROMACS 3.1.4 & 3.2.1 Install

I have installed the latest stable release of GROMACS (3.2.1) on the clusters. I also installed GROMACS 3.1.4 on cairo. For both versions I installed a Baseline and an Optimal Config.

So on both clusters we have the following versions of GROMACS with both Baseline and Optimal configurations:


  • 3.1.4
  • 3.1.5_pre1
  • 3.2.0
  • 3.2.1

Posted by hursejo at 10:35 AM | Comments (0)

April 14, 2004

MPI_Info

MPI_Comm_spawn sequentially gives processes to configured processors. For our application we would like the user to define exactly which machines this program will run on. MPI_Info can specify a file that will be used to specify such nodes. The file needs to look something like:


< MPI_Info File >
n1 -np 1 nanny
n4 -np 1 nanny
n5 -np 1 nanny
< /MPI_Info File >

Where n* is the node reference reported by lamnodes; this can also be c* to refer to specific cpus in the configuration. The -np * specifies the number of processes to start on this resource.

To abstract the user a bit from this level of detail I have created a function to generate the Nanny and Child MPI_Info files from a hostfile.conf that contains a comma separated list of nodes in the lamnodes format above. For Example:


< hostfile.conf >
n1,n4,n5
< /hostfile.conf >
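The conversion from the hostfile.conf line to the MPI_Info file body is simple enough to sketch. This is not the function from the framework, only an illustration of the mapping; hostlist_to_schema is a hypothetical name and it always emits -np 1, as in the nanny example above.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Turn "n1,n4,n5" into one "<node> -np 1 <binary>" line per node.
 * Returns the number of lines written, or -1 if out is too small. */
int hostlist_to_schema(const char *hostlist, const char *binary,
                       char *out, size_t out_len)
{
    const char *p = hostlist;
    size_t used = 0;
    int count = 0;

    assert(hostlist != NULL && binary != NULL);
    assert(out != NULL && out_len > 0);
    out[0] = '\0';

    while (*p != '\0') {
        size_t len;

        p += strspn(p, " \t");          /* skip blanks before a name */
        len = strcspn(p, ", \t\n");     /* the name ends at a separator */
        if (len > 0) {
            int n = snprintf(out + used, out_len - used,
                             "%.*s -np 1 %s\n", (int)len, p, binary);
            if (n < 0 || (size_t)n >= out_len - used)
                return -1;
            used += (size_t)n;
            count++;
        }
        p += len;
        p += strspn(p, " \t");          /* skip blanks after a name */
        if (*p == ',' || *p == '\n')
            p++;                        /* step over the separator */
    }
    return count;
}
```

Called with "n1,n4,n5" and "nanny" it produces the three lines shown in the MPI_Info file above, which can then be written to disk and named in the MPI_Info object passed to MPI_Comm_spawn.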

It is vital to note that MPI_Comm_spawn and MPI_Info are part of the MPI-2 standard, but many implementations do not support it fully. LAM-MPI is one of the few that support both MPI_Comm_spawn and MPI_Info. However, LAM-MPI does not currently have some additional functions implemented for intercommunicators. Some of these functions are MPI_Bcast and other functions that send/receive/reduce messages globally to a group. Also there is no function [that I have found thus far] to poll the size of a group via an intercommunicator. The workaround here is to have the first member of the Child group (rank == 0) send its size to the mother over the intercommunicator. Man pages provide useful information about each of these commands and their limitations.

Posted by hursejo at 10:12 AM | Comments (0)

April 09, 2004

MPI_Comm_Spawn

In order to have separate Mother, Nanny, and Child processes and play in the MPI sandbox we will need to use MPI_Comm_spawn to launch our Child and Nanny binaries from the mother. I have been playing around with this and produced a basic framework that uses this functionality. You can play with the files, which are located here:
http://cluster.earlham.edu/home/joshh/dev/discover/dev/spawn/

Some points to mention before running to make a huge sandcastle:
1. There is a difference between intracommunicators (standard usage of MPI_COMM_WORLD) and intercommunicators. Intracommunicators are used to speak with those members of your own tribe (the Children that are spawned are in a tribe all of their own, and the mother is in a separate tribe). Intercommunicators allow tribes to talk together. So the Children need to know the handle (MPI_Comm mother) to reach the mother, and the mother needs to know the handle (MPI_Comm everyone) to reach the children.
2. The Children are able to call MPI_COMM_WORLD directly and it will allow the children to talk amongst themselves without talking to the Mother. If they want to talk to the mother they need to use the mother MPI_Comm 'channel'.

We should be able to Spawn Nannies and Children as separate Tribes, and join them as necessary. After creating 1 or more intercommunicators, it is possible to join them into one big, happy intracommunicator.

To start the program you only need to start the mother, and pass it one node to start from. MPI_Comm_spawn wakes up the additional processors.
$ mpirun n0 mother

Posted by hursejo at 11:49 PM | Comments (0)

April 08, 2004

cflowd

The installation of cflowd requires the arts and GNU flex libraries. There seems to be some problem with arts communicating with flex at the moment. I'll take a closer look asap.

It seems that cflowd analyzes flow files from network communication developed by Cisco. Is that what we were looking for?

Posted by mccoyjo at 12:49 PM | Comments (1)

April 07, 2004

Capability Discovery

I cleaned up the Capability Discovery code a bit.


  • Passing structs as pointer reference arguments, reserving the return value for a status code.
  • Cleaned up some of the code (removing unused variables, and misrepresented print statements).
  • Ensured appropriate 3rd party Licences are at the top of their appropriate files.

These pieces of code can be found here:
MPI Version
Single Version

Posted by hursejo at 04:35 PM | Comments (1)

April 05, 2004

AltiVec error

As near as I can tell the SIGTRAP error is an interaction between MPI and the AltiVec code although that doesn’t sound reasonable on the surface of it. I made a copy of discovery.c sans all the MPI calls and it works fine.

I think we will have to trace down exactly where in inl1130() the trap is occurring in order to ultimately fix this. One approach would be to have our signal handler tell us where it was invoked from. Another would be to put some code in to pause stresscpu() long enough for us to attach with gdb before the offending call is made.
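For the first approach, a handler installed with SA_SIGINFO receives a siginfo_t whose si_addr field holds the faulting address for SIGTRAP. A sketch with hypothetical names; it only records what it sees, so mapping the address back into inl1130() would still be a job for gdb or the linker map.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <stddef.h>
#include <string.h>

/* Filled in by the handler; read them after the trap has fired. */
static volatile sig_atomic_t trap_signal = 0;
static void *volatile trap_address = NULL;

/* Three-argument handler form, selected by SA_SIGINFO below. */
static void record_trap(int signo, siginfo_t *info, void *context)
{
    (void)context;
    trap_signal = signo;
    trap_address = info->si_addr;  /* where the trap was raised */
}

/* Install the recorder for SIGTRAP. Returns 0 on success. */
int install_trap_recorder(void)
{
    struct sigaction action;

    memset(&action, 0, sizeof action);
    action.sa_sigaction = record_trap;
    action.sa_flags = SA_SIGINFO;
    sigemptyset(&action.sa_mask);
    return sigaction(SIGTRAP, &action, NULL);
}

/* Nonzero once a SIGTRAP has been recorded. */
int trap_was_seen(void)
{
    return trap_signal == SIGTRAP;
}

/* Address recorded by the handler; only valid after a trap. */
void *trap_fault_address(void)
{
    assert(trap_was_seen());
    return trap_address;
}
```

Note that the address is only meaningful for a trap raised by the hardware; a SIGTRAP sent with raise() or kill() does not carry a fault address.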

Posted by charliep at 11:09 PM | Comments (4)

Capability Discovery

I have finished the mother/nanny Capability Discovery Code. I have an MPI Version and a singleton version (separate mother and nanny programs).

These are located here:
MPI Version:
http://cluster.earlham.edu/home/joshh/dev/discover/mpi/
Singleton Version:
http://cluster.earlham.edu/home/joshh/dev/discover/single/

I have a version for the x86 SSE enabled Linux systems, PPC Altivec Linux systems, and a generic Linux version. Currently the Altivec MPI Version is broken due to a Trap/Breakpoint error that I am struggling to track down.

In the directories above there is a create.pl script, which will build the appropriate collection of source for your specified (on the command line) system. I need to make this into a configure script in the future.

Posted by hursejo at 01:06 PM | Comments (0)