October 29, 2004

Meeting Minutes - October 29, 2004

F@C


    Diagrams - Make three on one, with one legend. JoshM has notes with the details.

    Our poster was accepted at SIAM CS05.

    Testing


      John will start working with the A2 release (in binary form still) this weekend. Consider building from source and collecting more detailed data after SC04.

      Midwife design in Pittsburgh.

    Code inspection - print 4 copies (2-up, 2-sided) on Friday before leaving for SC04. Do the inspection in Pittsburgh over food and wine.

    A2 release - most items complete, we may get one or two more done, see source/TODO for the details.

Numerical Methods


    Testing with source builds of F@C (after SC04). See paper notes for the details. John.

    C code to PeterB. Charlie.

    Publish the material that PeterB collected. Charlie.

    Re-organizational meeting soon after SC04.

Papers and Presentations


    F@C, 1/sqrt(x), and the education program were all accepted at SIAM CS05

    Merck - November 16th in Noyes Hall. Presenting B-and-T-GROMACS, JoshH is going to re-print the poster. JoshM will submit the abstract to Nathan.

Plumbing


    Switch for cairo or a UPS? We need to make this decision soon. Start a large load on cairo with F@C (JoshM) and see if we can start making the switches crash again. If so, try upgrading the firmware to see if that stops it.

    Ordered a full-duplex speaker phone for the ReCompute/Cluster lab space.

    Set up fatc-dev and fatc-user @ cs.earlham.edu. Charlie will speak to Skylar.

    Is Bugzilla ready to go? Right list of platforms? JoshM

    Update the node usage list. JoshM

Posted by charliep at 10:45 AM

October 28, 2004

Update

Folding@Clusters

grompp error codes
Gave every fatal_error() call in GROMACS a unique error number > 0. All calls were numbered, not just those in grompp, because grompp includes most of the GROMACS libraries. This buys some flexibility with other error detection. A list of the error numbers, their text messages, and the files in which they reside can be found in folding-at-clusters/documentation/gromacs-errno.txt
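
A minimal sketch of how such a numbered list could be looked up from C. The struct, the function name, and all three table entries are hypothetical placeholders; the real numbers, messages, and file names are the ones in gromacs-errno.txt and are not reproduced here.

```c
#include <stddef.h>

/* Hypothetical layout: one row per fatal_error() call site. */
struct fatc_errno_entry {
    int         number;   /* unique error number, always > 0 */
    const char *message;  /* text passed to fatal_error() */
    const char *file;     /* source file containing the call */
};

/* Placeholder rows only; the real list is in gromacs-errno.txt. */
static const struct fatc_errno_entry fatc_errno_table[] = {
    { 1, "example: cannot open input file", "example_a.c" },
    { 2, "example: invalid topology entry", "example_b.c" },
    { 3, "example: out of memory",          "example_c.c" },
};

/* Return the table entry for an error number, or NULL if it is unknown. */
const struct fatc_errno_entry *fatc_errno_lookup(int number)
{
    size_t n = sizeof fatc_errno_table / sizeof fatc_errno_table[0];
    for (size_t i = 0; i < n; i++) {
        if (fatc_errno_table[i].number == number)
            return &fatc_errno_table[i];
    }
    return NULL;
}
```

A caller that receives a non-zero return code from grompp could pass it to fatc_errno_lookup() and log the message and file, falling back to a generic message when the lookup returns NULL.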

mdrun exit
First attempt:
Use #define F_at_C's to block out calls to fatal_error. Instead, use fatc_exit() to get the error code, log the message, shut down MPI, and exit cleanly. The fatc_exit() should be in child.{h,c}. This works because child and mdrun are the same binary.

Second attempt:
-Static linking problems: the CHILD symbol wasn't found. Now we will set the CHILD global to TRUE in mdrun.c and to FALSE in grompp. Build an exit.o, linked with both mdrun and grompp, that holds F_AT_C_exit(). Test by keeping the grompp executable and rebuilding child (and thus mdrun).
-Created exit.{h,c,o} to house F_AT_C_exit(). Added a target to the Makefile.
-exit.o needs to be linked with GROMACS via configure/make.
-After linking, F_AT_C_exit() needs to be put in fatal.c.
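
A rough sketch of what exit.c might contain, under several assumptions: the F_AT_C_exit() signature shown here, the two helper functions, and the message format are all invented for illustration and are not taken from the F@C source. One thing worth keeping in mind: a parent that collects the status with wait() only sees the low 8 bits, so error numbers used as exit codes need to stay in the range 1..255.

```c
#include <stdio.h>
#include <stdlib.h>

/* Set to 1 in mdrun.c (child and mdrun are the same binary), 0 in grompp. */
int CHILD = 0;

/* Map an error number (> 0) to a process exit status. Numbers outside
 * 1..255 collapse to a generic 255; that choice is an assumption here. */
int fatc_exit_status(int errnum)
{
    return (errnum >= 1 && errnum <= 255) ? errnum : 255;
}

/* Format the log line into buf; returns the length snprintf reports. */
int fatc_exit_message(char *buf, size_t len, int errnum, const char *text)
{
    return snprintf(buf, len, "%s: fatal error %d: %s",
                    CHILD ? "child" : "grompp", errnum, text);
}

/* Log the error, shut down, and exit with the error number as status. */
void F_AT_C_exit(int errnum, const char *text)
{
    char line[512];

    fatc_exit_message(line, sizeof line, errnum, text);
    fprintf(stderr, "%s\n", line);
    /* In the child, MPI would be shut down here before exiting (omitted
     * so the sketch builds without an MPI installation). */
    exit(fatc_exit_status(errnum));
}
```

Keeping the status mapping and the message formatting in separate functions makes both easy to check without actually terminating the process.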

Purdue FATC Image
folding-at-clusters/posters/fatc_diagram{,2}.{fig,png}.

GMXLIB -> FATCLIB
-changed getenv("GMXLIB") to getenv("FATCLIB") in gmxlib/futil.c and kernel/topio.c.
-it works and our documentation was altered to reflect the change.
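
For reference, a small sketch of the lookup pattern after the rename. Only getenv("FATCLIB") comes from the change above; the wrapper function and its fallback argument are assumptions added for illustration, and futil.c and topio.c may handle an unset variable differently.

```c
#include <stdlib.h>

/* Return the library directory: $FATCLIB if it is set and non-empty,
 * otherwise the caller-supplied fallback (hypothetical helper). */
const char *fatc_lib_dir(const char *fallback)
{
    const char *dir = getenv("FATCLIB");

    return (dir != NULL && dir[0] != '\0') ? dir : fallback;
}
```

With this shape, a user who still exports only GMXLIB gets the fallback path, which is one way to notice that the environment has not been updated to match the new documentation.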

Updated source/TODO.

General
-Registered for SC2004

Posted by mccoyjo at 06:26 PM

October 22, 2004

meeting notes

Folding@Clusters

Gromacs output is complete save a few stderr prints. John may need to change how he is testing to accommodate this.

Signal issues have been taken care of.

We would like some testing before releasing a2.

Grompp error return codes are coming along nicely.

Work still needs to be done on mdrun error return codes. Who is handling this?

TODO needs to be updated.

GMXLIB -> FATCLIB is claimed by JoshH.

Get F@C running under Gentoo. JoshH

Testing

John's testing has found no major issues. He still cannot run molecules with .ndx files. It seems that almost all of the molecules work on the vast majority of configurations.

Fixed villin-urea files (the itp files were missing).

Testing will continue after a2 source is finished and before a2 is released.

Posted by mccoyjo at 06:06 PM

October 19, 2004

MP Lite Analysis

#include http://www.earlham.edu/~charliep/mt/archives/002762.html

There is value in converting to MP Lite since it is much easier to install than LAM/MPI.

FFTW uses MPI functions that MP Lite doesn't support. This would limit this port to a subset of the analysis methods currently used by GROMACS.

MP Lite is limited to a single communicator, MPI_COMM_WORLD. F@C currently uses three communicators.

MP Lite requires either a shared filesystem or installation of the application binaries on all the nodes. The latter is a problem we were able to solve in F@C by using LAM's ability to launch a binary from the rank 0 node on all the other nodes in an MPI world (it just ships the binary to each LAM daemon before startup). There is still the chicken-and-egg problem, though, of having to install LAM on each node or provide a shared filesystem with the LAM binaries.

Posted by charliep at 05:57 PM | Comments (0)

October 16, 2004

A story about stdio redirection in LAM-MPI

Once upon a time in the mythical land of red and white called Indiana University, there was a project called LAM-MPI. LAM-MPI does some magic with large collections of machines including letting them all talk from the same mouth, which is located at the head node. Here are some notes about this multi-mouthed daemon.
-------------
man mpirun
-------------
* LAM directs UNIX standard input to /dev/null on all remote nodes.
* LAM directs UNIX standard output and error to the LAM daemon on all
  remote nodes. LAM ships all captured output/error to the node that
  invoked mpirun and prints it on the standard output/error of mpirun.
  Local processes inherit the standard output/error of mpirun and transfer
  to it directly.

------------
/cluster/cairo/src/lam-7.0.6/HISTORY
------------
* stdout/stderr of the local lamd is left open so that tstdio(3) will work properly
  - tstdio -> trillium stdio file

------------
Useful files to look at
------------
share/kreq/clientio.c
otb/mpirun/mpirun.c
 - set_stdio()
   - lam_mktmpid
     - Create a temporary file name based on an id [/tmp/lam-12]
   - lam_lfopenfd
     - sfh_send_fd 
       - pass a single file descriptor over a stream
share/include/kio.h

Instead of looking for a LAM-MPI function to do stdio redirection, why not just remap stdio ourselves via dup or freopen?
The source below does what we want: /cluster/home/joshh/src/c/reopen.c
#include <stdio.h>

int main(){
    char str[256] = "This is a string of text\n";
    FILE *fp;

    // Print to stdout                                                                    
    printf("%s",str);

    // redirect stdout                                                                    
    if( (fp = freopen("file.txt","w",stdout)) == NULL){
        perror("Unable to open file.txt");
        return 1;
    }
    // try to print to stdout again                                                       
    // This goes into "file.txt" directly, and is NOT printed to the terminal
    printf("%s",str);

    return 0;
}

So the plan of action that I propose is:
  1. No children or nannies write to logs, they only use stdout, stderr
  2. Mother collects all of this via LAM-MPI magical pass through
  3. After MPI_INIT, redirect the stdout and stderr file descriptors to our COSM log file
  4. Now we have a central log with everything that would have been printed to the screen.
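
Step 3 of the plan can be sketched with dup2 rather than freopen, which works at the file descriptor level and lets the original descriptor be restored. This is only an illustration: the function names are made up, the log path is whatever the caller passes, and nothing here touches COSM or LAM-MPI.

```c
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Point descriptor fd (1 = stdout, 2 = stderr) at log_path, appending.
 * Returns a duplicate of the original descriptor so it can be restored
 * later, or -1 on failure. */
int fatc_redirect_fd(int fd, const char *log_path)
{
    int log_fd, saved;

    fflush(NULL);                 /* flush stdio buffers before switching */
    log_fd = open(log_path, O_WRONLY | O_CREAT | O_APPEND, 0644);
    if (log_fd < 0)
        return -1;
    saved = dup(fd);
    if (saved < 0 || dup2(log_fd, fd) < 0) {
        if (saved >= 0)
            close(saved);
        close(log_fd);
        return -1;
    }
    close(log_fd);                /* fd itself now refers to the log file */
    return saved;
}

/* Undo fatc_redirect_fd(): put the saved descriptor back on fd. */
int fatc_restore_fd(int fd, int saved)
{
    fflush(NULL);                 /* push pending output into the log first */
    if (dup2(saved, fd) < 0)
        return -1;
    close(saved);
    return 0;
}
```

In the mother this would be called once for descriptor 1 and once for descriptor 2 after MPI_INIT, both with the same log path. Opening with O_APPEND means each write lands at the end of the file, so the two streams do not overwrite each other.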
Questions:
  • What does the COSM log file buy us?
  • Is it easier to just ditch the COSM log file stuff and do straight v3PrintA's and at the top of the mother redirect stdout/stderr to a file?
  • Does COSM do some magic on the backend that may invalidate this option?
Posted by hursejo at 10:56 AM | Comments (88)

October 15, 2004

Meeting Minutes - October 15, 2004

F@C


    Fixed the signal overlap problem between COSM, GROMACS, and F@C. JoshM.

    Lost signal problem between nannies and children, fix by changing to a polling architecture. JoshM.

    New exit() architecture and returned error codes for mdrun and grompp. JoshM.

    printf() statements in grompp and mdrun. JoshH will look at this and then get in touch with Charlie to discuss solutions. JoshM suggested we check out COSM's distributed file system as a potential solution. Would this be helpful for the logging? Maybe use mdrun/child and grompp logging to catch output; with useful return codes, we just need to get this into a log.

    Next release will be a2, probably early next week (after next round of bug fixes and testing). Charlie.

    Vijay call. Charlie will talk to him about specifying input and output files sometime next week. Probably using a conf file that describes what to look for and what to generate.

    New molecules, JoshM will get the ones in ~pande and set them up.
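
    One generic way to picture the polling fix for lost signals between nannies and children mentioned above. This sketch assumes the child is an ordinary process that the nanny forked and that polling means checking its exit status on a timer; the F@C source may well poll something different. The function name and return conventions are invented for illustration.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Poll a child process until it exits or max_polls checks have been made.
 * Returns the child's exit status (0..255), -1 on error or abnormal
 * termination, or -2 if the child is still running after max_polls. */
int nanny_poll_child(pid_t child, int max_polls, long interval_ms)
{
    struct timespec pause_time;

    pause_time.tv_sec  = interval_ms / 1000;
    pause_time.tv_nsec = (interval_ms % 1000) * 1000000L;

    for (int i = 0; i < max_polls; i++) {
        int status = 0;
        pid_t r = waitpid(child, &status, WNOHANG);

        if (r == child)
            return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
        if (r < 0)
            return -1;
        nanosleep(&pause_time, NULL);   /* r == 0: child still running */
    }
    return -2;
}
```

    Unlike a signal handler, the poll cannot miss an exit: the status stays available to waitpid until it is collected, so a nanny that checks late still sees it.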

SC04


    Hotel for Fri-Tue night, JoshM, JoshH, and Charlie.

    John is looking into travel.

    Take the projector, screen, CS department sab rep ads.

Numerical Methods


    Compare same molecule with GROMACS and F@C to see what the overhead is. John and JoshM.

    Record testing results in folding-at-clusters/testing/umm.html. This same directory should have any scripts, documentation, etc. related to testing as well. John.

    Check-out a particular tagged release of F@C and build from source to do the performance analysis (rather than using GROMACS directly as before). John.

    Generated 1/sqrt(x) C code for PeterB. Charlie.

Merck


    November 16th, 7p to 9p.

    Review B-and-T GROMACS poster, JoshH will re-print in Bloomington and bring to Richmond.

    Use F@C materials from SC04.

Plumbing


    Skylar will take b16 and make hopperprime.

Posted by charliep at 11:53 AM | Comments (42)

October 12, 2004

Conversation with Vijay Pande

ACS Funding


    Vijay will look into the possibility of working remotely with a couple of on-site visits for next summer, with support from the ACS.

F@C


    When Vijay and his colleagues have done more testing we'll talk about the interface between F@C and F@H's assignment server. Likely to be a conf file that describes all the input files (including topology information) provided in the system and the output files the scientist wants. This could also be the way that specific command line options for grompp and mdrun/child are passed from scientists to F@C.

    Vijay is having lunch with Adam Begerg (now a grad school student in CS at Stanford). He's going to see if Adam has time/interest in working on COSM, etc.

    MP-Lite would greatly simplify the installation of F@C. We'll need to look at FFTW/MPI and see if we can build GROMACS with FFTW libraries that don't have MPI calls (it's also possible to not use FFTW, how much science would that exclude?), stdout/stderr mapping from mdrun/child to mother via LAM's filehandle mapping, and possibly other areas. See JoshH's MT entry for a start on this.
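
    To make the conf file idea for the assignment-server interface a little more concrete, here is a sketch of a parser for one possible format. Everything about the format is an assumption: the "key = value" layout, the "#" comment character, and the four key names are placeholders, since the actual file has not been specified yet.

```c
#include <stdio.h>
#include <string.h>

/* Parse one "key = value" line. Returns 1 on success and 0 for blank
 * lines, comment lines starting with '#', or malformed lines.
 * key must hold at least 64 bytes and value at least 256 bytes. */
int fatc_conf_parse_line(const char *line, char *key, char *value)
{
    return sscanf(line, " %63[^= \t#] = %255[^\n]", key, value) == 2;
}

/* Return 1 if key is one of the hypothetical keys this sketch accepts. */
int fatc_conf_is_known_key(const char *key)
{
    static const char *known[] = {
        "input", "output", "grompp_args", "mdrun_args"
    };

    for (size_t i = 0; i < sizeof known / sizeof known[0]; i++) {
        if (strcmp(key, known[i]) == 0)
            return 1;
    }
    return 0;
}
```

    A file in this made-up format would carry one "input = ..." line per provided file, one "output = ..." line per file the scientist wants back, and optional grompp_args and mdrun_args lines for command line options.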

Posted by charliep at 10:52 AM | Comments (55)

October 08, 2004

Meeting Minutes - October 8, 2004

Today's meeting was cancelled due to the recent release (a1). We all have well defined tasks to work on so a meeting isn't necessary.

Posted by charliep at 11:56 AM | Comments (22)

October 01, 2004

Meeting Minutes - October 1, 2004

Charlie has the paper notes from this meeting.

Posted by charliep at 03:14 PM | Comments (11)