F@C
Our poster was accepted at SIAM CS05.
Testing
Midwife design in Pittsburgh.
Code inspection - print 4 copies (2-up, 2-sided) on Friday before leaving for SC04. Do the inspection in Pittsburgh over food and wine.
A2 release - most items complete, we may get one or two more done, see source/TODO for the details.
Numerical Methods
C code to PeterB. Charlie.
Publish the material that PeterB collected. Charlie.
Re-organizational meeting soon after SC04.
Papers and Presentations
Merck - November 16th in Noyes Hall. Presenting B-and-T GROMACS; JoshH is going to re-print the poster. JoshM will submit the abstract to Nathan.
Plumbing
Ordered a full-duplex speaker phone for the ReCompute/Cluster lab space.
Set up fatc-dev and fatc-user @ cs.earlham.edu. Charlie will speak to Skylar.
Is Bugzilla ready to go? Right list of platforms? JoshM
Update the node usage list. JoshM
Folding@Clusters
grompp error codes
Gave all fatal_error() calls in gromacs a unique error number > 0. All calls were given error numbers because grompp includes most of the libraries in gromacs. This buys some flexibility with other error detection. A list of the error numbers, their text messages, and the files in which they reside can be found in folding-at-clusters/documentation/gromacs-errno.txt
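A minimal sketch of how such a numbered list could be represented and searched in C. The struct, the function name lookup_errno(), and every row in the table are hypothetical placeholders for illustration; they are not the contents of gromacs-errno.txt.

```c
#include <stddef.h>

/* Hypothetical table entry: one row per fatal_error() call site. */
struct errno_entry {
    int         code;     /* unique error number, always > 0 */
    const char *file;     /* source file containing the call */
    const char *message;  /* text passed to fatal_error() */
};

/* Placeholder rows for illustration only; not the real list. */
static const struct errno_entry errno_table[] = {
    { 1, "example_a.c", "example message A" },
    { 2, "example_b.c", "example message B" },
    { 3, "example_b.c", "example message C" },
};

/* Return the entry for the given code, or NULL if the code is unknown. */
const struct errno_entry *lookup_errno(int code)
{
    size_t n = sizeof errno_table / sizeof errno_table[0];
    for (size_t i = 0; i < n; i++) {
        if (errno_table[i].code == code)
            return &errno_table[i];
    }
    return NULL;
}
```

Keeping 0 out of the table leaves it free to mean success when the number is later used as a process exit status.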
mdrun exit
first attempt:
use #define F_at_C's to block out calls to fatal_error. Instead, use fatc_exit() to get error code, log message, shutdown mpi, and cleanly exit. The fatc_exit() should be in child.{h,c}. This works because child and mdrun are the same binary.
second attempt:
-Static linking problems. CHILD symbol wasn't found. Now we will set the CHILD global to TRUE in mdrun.c and to FALSE in grompp. Make exit.o, which holds F_AT_C_exit(), and link it with both mdrun and grompp. Test by keeping the grompp executable and rebuilding child (and thus mdrun).
-Created exit.{h,c,o} to house F_AT_C_exit(). Added target to Makefile.
-exit.o needs to be linked with gromacs via configure/make
-after linking, F_AT_C_exit() needs to be put in fatal.c
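A rough sketch of what exit.c might look like under the second attempt. The signature of F_AT_C_exit(), the helper F_AT_C_format_error(), the log line format, and the shutdown_mpi() stand-in are all assumptions made for illustration; only the names F_AT_C_exit and CHILD come from the notes above.

```c
#include <stdio.h>
#include <stdlib.h>

/* Assumption: CHILD is a plain int flag, set to 1 in mdrun.c and 0 in grompp. */
int CHILD = 0;

/* Hypothetical stand-in for the MPI shutdown. A real build would call
 * MPI_Finalize() here; it is left out so the sketch compiles without MPI. */
static void shutdown_mpi(void)
{
    fprintf(stderr, "shutting down MPI\n");
}

/* Hypothetical helper: format the log line for an error number and message. */
int F_AT_C_format_error(char *buf, size_t len, int code, const char *msg)
{
    return snprintf(buf, len, "F@C fatal error %d: %s", code, msg);
}

/* Log the error, shut down MPI when running as child/mdrun, then exit
 * with the error number as the process exit status. */
void F_AT_C_exit(int code, const char *msg)
{
    char line[512];

    F_AT_C_format_error(line, sizeof line, code, msg);
    fprintf(stderr, "%s\n", line);
    if (CHILD)
        shutdown_mpi();
    exit(code);
}
```

One thing to keep in mind with this design: on POSIX systems the parent only sees the low 8 bits of the exit status, so error numbers above 255 would not survive the trip back to the caller unchanged.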
Purdue FATC Image
folding-at-clusters/posters/fatc_diagram{,2}.{fig,png}.
GMXLIB -> FATCLIB
-changed getenv("GMXLIB") to getenv("FATCLIB") in gmxlib/futil.c and kernel/topio.c.
-it works and our documentation was altered to reflect the change.
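For reference, the lookup itself is just a getenv() call. The sketch below wraps it in a hypothetical helper, fatc_lib_dir(), with a caller-supplied fallback; the helper and the fallback behavior are assumptions and are not what futil.c or topio.c actually do.

```c
#include <stdlib.h>

/* Return the library directory named by FATCLIB, or the given fallback
 * when the variable is unset or empty. Hypothetical helper for illustration. */
const char *fatc_lib_dir(const char *fallback)
{
    const char *dir = getenv("FATCLIB");

    return (dir != NULL && dir[0] != '\0') ? dir : fallback;
}
```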
Updated source/TODO.
General
-Registered for SC2004
Folding@Clusters
Gromacs output is complete save a few stderr prints. John may need to change how he is testing to accommodate.
Signal issues have been taken care of.
We would like some testing before releasing a2.
Grompp error return codes are coming along nicely.
Work still needs to be done on mdrun error return codes. Who is handling this?
TODO needs to be updated.
GMXLIB -> FATCLIB is claimed by JoshH.
Get F@C running under Gentoo. JoshH
Testing
John's testing has found no major issues. He still cannot run molecules with .ndx files. It seems that almost all of the molecules work on the vast majority of configurations.
Fixed villin-urea files (the itp files were missing).
Testing will continue after a2 source is finished and before a2 is released.
#include http://www.earlham.edu/~charliep/mt/archives/002762.html
There is value in converting to MP Lite since it is much easier to install than LAM/MPI.
FFTW uses MPI functions that MP Lite doesn't support. This would limit this port to a subset of the analysis methods currently used by GROMACS.
MP Lite is limited to a single communicator, MPI_COMM_WORLD. F@C currently uses three communicators.
MP Lite requires either a shared filesystem or installation of the application binaries on all the nodes. The latter is a problem we were able to solve in F@C by using LAM's ability to launch a binary from the rank 0 node on all the other nodes in an MPI world (it just ships the binary to each LAM daemon before startup). There is still the chicken-and-egg problem, though, of having to install LAM on each node or provide a shared filesystem with the LAM binaries.
-------------
man mpirun
-------------
* LAM directs UNIX standard input to /dev/null on all remote nodes.
* LAM directs UNIX standard output and error to the LAM daemon on all remote nodes. LAM ships all captured output/error to the node that invoked mpirun and prints it on the standard output/error of mpirun. Local processes inherit the standard output/error of mpirun and transfer to it directly.
------------
/cluster/cairo/src/lam-7.0.6/HISTORY
------------
* stdout/stderr of the local lamd is left open so that tstdio(3) will work properly
- tstdio -> trillium stdio file
------------
Useful files to look at
------------
share/kreq/clientio.c
otb/mpirun/mpirun.c
- set_stdio()
- lam_mktmpid
- Create a temporary file name based on an id [/tmp/lam-12]
- lam_lfopenfd
- sfh_send_fd
- pass a single file descriptor over a stream
share/include/kio.h
#include <stdio.h>

int main(void)
{
    char str[256] = "This is a string of text\n";
    FILE *fp;

    // Print to stdout
    printf("%s", str);

    // Redirect stdout to file.txt
    if ((fp = freopen("file.txt", "w", stdout)) == NULL) {
        perror("Unable to open file.txt");
        return 1;
    }

    // Try to print to stdout again.
    // This goes into "file.txt" directly, and is NOT printed to the terminal.
    printf("%s", str);
    return 0;
}
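The freopen() test above redirects stdout for the rest of the process. If we need to capture output for a while and then go back to the original stdout, a variation using the POSIX dup()/dup2() calls might look like the sketch below. The function names redirect_stdout() and restore_stdout() are made up for this note, and this is not how LAM does its own mapping.

```c
#include <stdio.h>
#include <fcntl.h>
#include <unistd.h>

/* Point stdout at the given file. Returns a saved copy of the old
 * stdout descriptor (to pass to restore_stdout), or -1 on failure. */
int redirect_stdout(const char *path)
{
    int saved, fd;

    fflush(stdout);                      /* push out anything still buffered */
    saved = dup(STDOUT_FILENO);
    if (saved < 0)
        return -1;
    fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0) {
        close(saved);
        return -1;
    }
    if (dup2(fd, STDOUT_FILENO) < 0) {
        close(fd);
        close(saved);
        return -1;
    }
    close(fd);                           /* descriptor 1 now refers to the file */
    return saved;
}

/* Put the original stdout back. Returns 0 on success, -1 on failure. */
int restore_stdout(int saved)
{
    int rc;

    fflush(stdout);                      /* flush captured output to the file */
    rc = dup2(saved, STDOUT_FILENO);
    close(saved);
    return rc < 0 ? -1 : 0;
}
```

Working at the descriptor level means output from library code that writes to descriptor 1 directly is captured too, which freopen() on the stdout stream would also give us, but here the original destination is kept and can be restored.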
F@C
Lost signal problem between nannies and children, fix by changing to a polling architecture. JoshM.
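A minimal sketch of the polling idea, assuming the nanny is the parent process of the child and can use waitpid() with WNOHANG instead of waiting on a signal. The function names and the fixed sleep interval are hypothetical; the real nanny/child code may be organized differently.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Check once whether the child has finished, without blocking.
 * Returns 1 and fills *exit_code if it has, 0 if it is still running,
 * -1 on error. *exit_code is -1 if the child did not exit normally. */
int poll_child(pid_t pid, int *exit_code)
{
    int status;
    pid_t r = waitpid(pid, &status, WNOHANG);

    if (r == 0)
        return 0;                        /* still running */
    if (r < 0)
        return -1;                       /* no such child, or other error */
    *exit_code = WIFEXITED(status) ? WEXITSTATUS(status) : -1;
    return 1;
}

/* Poll until the child finishes, sleeping interval_us between checks.
 * A real nanny would do its other work inside this loop. */
int wait_child_by_polling(pid_t pid, unsigned interval_us, int *exit_code)
{
    int done;

    while ((done = poll_child(pid, exit_code)) == 0)
        usleep(interval_us);
    return done;
}
```

Because the child's status stays available until it is collected, a check that comes late still sees the exit, which is the property a signal-only design does not give us.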
New exit() architecture and returned error codes for mdrun and grompp. JoshM.
printf() statements in grompp and mdrun. JoshH will look at this and then get in touch with Charlie to discuss solutions. JoshM suggested we check out COSM's distributed file system as a potential solution to this. Would this be helpful for the logging? Maybe use mdrun/child and grompp logging to catch output; with useful return codes, we just need to put this in a log.
Next release will be a2, probably early next week (after next round of bug fixes and testing). Charlie.
Vijay call. Charlie will talk to him about specifying input and output files sometime next week. Probably using a conf file that describes what to look for and what to generate.
New molecules, JoshM will get the ones in ~pande and set them up.
SC04
John is looking into travel.
Take the projector, screen, CS department sab rep ads.
Numerical Methods
Record testing results in folding-at-clusters/testing/umm.html. This same directory should have any scripts, documentation, etc. related to testing as well. John.
Check out a particular tagged release of F@C and build from source to do the performance analysis (rather than using GROMACS directly as before). John.
Generated 1/sqrt(x) C code for PeterB. Charlie.
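The notes do not say which method the generated code uses. Purely as an illustration of what a software 1/sqrt(x) routine can look like, the sketch below takes the widely known bit-level initial guess (magic constant 0x5f3759df) and refines it with two Newton-Raphson steps. The function name inv_sqrt() is made up here, and this is not the code that was sent to PeterB.

```c
#include <stdint.h>
#include <string.h>

/* Approximate 1/sqrt(x) for finite, normal x > 0 (single precision). */
float inv_sqrt(float x)
{
    uint32_t bits;
    float y;

    /* Reinterpret the float's bits to build a rough initial guess. */
    memcpy(&bits, &x, sizeof bits);
    bits = 0x5f3759dfu - (bits >> 1);
    memcpy(&y, &bits, sizeof y);

    /* Newton-Raphson on f(y) = 1/y^2 - x: y <- y * (1.5 - 0.5 * x * y^2). */
    y = y * (1.5f - 0.5f * x * y * y);
    y = y * (1.5f - 0.5f * x * y * y);
    return y;
}
```

Each Newton-Raphson step roughly doubles the number of correct digits, so the step count is the knob for trading speed against accuracy.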
Merck
Review B-and-T GROMACS poster, JoshH will re-print in Bloomington and bring to Richmond.
Use F@C materials from SC04.
Plumbing
ACS Funding
F@C
Vijay is having lunch with Adam Beberg (now a grad student in CS at Stanford); he's going to see if Adam has the time and interest to work on COSM, etc.
MP-Lite would greatly simplify the installation of F@C. We'll need to look at FFTW/MPI and see if we can build GROMACS with FFTW libraries that don't have MPI calls (it's also possible to not use FFTW, how much science would that exclude?), stdout/stderr mapping from mdrun/child to mother via LAM's filehandle mapping, and possibly other areas. See JoshH's MT entry for a start on this.
Today's meeting was cancelled due to the recent release (a1). We all have well defined tasks to work on so a meeting isn't necessary.
Charlie has the paper notes from this meeting.