Reading HPC.
Worked on one of the quarkprime machines installing FreeBSD.
Athena image is ready, but the problem with SystemImager is still a mystery. When I try to make a bootable diskette I get a choice of flavors between ATHENA and standard, so I think the problem has to be in the boot loader files, since that is what came up last time. I will try to work with Skylar to resolve the issue, and then the athena cluster should be ready.
Working on/Worked on
To Do list
Ava Willis and I have been researching various grant opportunities; we'll need to prepare one or more of these this summer in light of the HHMI rejection. I have a folder with the Foundation Directory entries for each of these if anyone is interested (J^2 mostly, I suspect).
o Sun Microsystems
There are two categories that I'm thinking about:
Higher Education - Scientific and Engineering Computing
Primary and Secondary - University Outreach to K-12 Schools
For the first one we would ask for a hardware donation of Sun SPARC workstations to use for our work in molecular dynamics. For the second one we would ask for money to support a project that Ray Ontko (EC '84) and I have been noodling around with for a while, which involves giving presentations to area high schools with a couple of EC CS students and faculty.
o Research Corporation
I'm not sure about this one, but I'm leaning towards no. They seem to be most interested in Physics, Astronomy, and Chemistry. Even though the basis of our work is Chemistry and Physics, I suspect it will look like CS to these folks. I suspect the lack of a PhD would be an issue as well.
o Keck Foundation
I don't remember the details of our last effort (Fall, 2000) with Keck, but it may be time to revisit them. PaulO and I have talked about a particular project involving chemistry and CS which I think would be very attractive to them. I'm not sure who you should speak with to get the background here if you don't already know it. If you think we're ready to approach them again, then the three of us (Ava, Charlie, and Paul) should set up a time to meet sometime in the next couple of weeks.
o Intel
I think this is worth considering. It looks fairly simple and there is no deadline. A proposal used here could be recycled (at least in part) for some of the others that we are considering. I'm not sure which project would be best suited yet.
o Arnold and Mabel Beckman Foundation
I think this is worth considering. They require a pre-proposal letter which I can begin to craft. The deadline is October 1. A pre-proposal letter used here could be recycled (at least in part) for some of the others that we are considering. I'm not sure which project would be best suited yet.
o American Honda Foundation
I think this is worth considering. They require either an initial phone call or letter. The next deadline is August 1. A proposal used here could be recycled (at least in part) for some of the others that we are considering. I'm not sure which project would be best suited yet.
o Eastman Kodak Company Contributions Program
I don't think this is worth our time. While there is some fit between our work and their giving areas, it's not that great. The deadline was April 30.
o Apple
I don't think this is worth considering. We don't really need the hardware they are offering (we already have a bunch of it), and they are looking for people who are further up the chain than we are.
Worked on a new athena image with JoshM; ran into problems with NIS, will work on it more together.
Made floppies and did a network install of FreeBSD on one of the quarkprime machines.
Learning useful command tools from man pages. Reading HPC.
Worked on
To Do list
CCG Meeting - cp, jh, jm, db
Bi-weekly updates (so far) are working well. Remember to make these entries every Sunday and Wednesday evening. Use the General category with a title of "Update - Name - Date". All of us should read these updates before our group meetings on Mon and Thu.
WordPress - All reviews look good. Added install/configure to the plumbing list.
The new system/network/database administration list for this summer is now available at /cluster/project/sna/plumbing-summer-2004.html. All of us should use log.html (in that same directory) to note things we fix/install/change on the clusters.
The first cut of the summer reading list is available at /cluster/project/generic/doc/readings-summer-2004.html. Check it out and read!
The first cut of the summer plan is available at /cluster/project/generic/doc/plan-summer-2004.html. Check it out and get your feedback to Charlie.
Everyone except Charlie updated their software overview chunks. The Scheduler chunk, as it currently stands, will be nuked, and new documentation will be created when the new scheduler is built.
Plumbing - After a bunch of futzing the new drive is in hopper. JoshM and Dawit are still working on the athena cluster. JoshH is trying to build a SystemImager disk for cairo.
Support for threads (-nt) in GROMACS - No news from JoshH yet. If anything it may work without MPI enabled.
Communication document - Charlie is working on a document describing guidelines for using MT, email, WiKi, etc.
Displayer - JoshM has some good ideas for how to have preset queries, graphing, etc. We drew a picture of a layout and discussed data model changes. Make a new form to display, edit, insert, etc. preset queries. Open enhancements and bugs: order of data points, consistent color/molecule mapping, grid lines on the graph option. Currently it uses the same predicate (with a different select list and order by) for the tabular data and the graph. The short description is used as the graph title and the long description as its legend.
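A minimal C sketch of the "same predicate, different select list and order by" idea, in case it helps when the preset-query form gets built. Table and column names follow the `result` query used elsewhere in these notes; `ps_per_day` is a hypothetical column name, and the function names are made up for illustration:

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical sketch: the table and the graph share one WHERE predicate
 * and differ only in select list and ORDER BY, so both views always cover
 * exactly the same rows. Returns snprintf's result, or -1 on bad input. */
int build_table_query(char *buf, size_t len, const char *predicate)
{
    if (predicate == NULL || strlen(predicate) == 0)
        return -1;
    return snprintf(buf, len,
        "SELECT molecule, label, processes, cpus, nodes, finish_time "
        "FROM result WHERE %s ORDER BY molecule, nodes, cpus, processes",
        predicate);
}

int build_graph_query(char *buf, size_t len, const char *predicate)
{
    if (predicate == NULL || strlen(predicate) == 0)
        return -1;
    /* ps_per_day is a hypothetical column; ordering by processes within
     * each molecule keeps the data points in order on the graph. */
    return snprintf(buf, len,
        "SELECT molecule, processes, ps_per_day "
        "FROM result WHERE %s ORDER BY molecule, processes",
        predicate);
}
```

Because the predicate is pasted in verbatim, it should only ever come from the stored preset queries, never straight from a form field.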
Conference and paper tour - No word from LinuxFest yet.
Future Meeting
Conference and paper tour
Scheduler presentation and conversation, read JoshH's entry in preparation
FFTW/GROMACS/MPI diagram update, use GROMACS manual as source?
Modular F@C - make a plan
Worked On:
To Do List:
CCG Meeting - cp, jh, jm, db
Software overview - This needs to be updated so that it can serve as a resource for the new folks. JoshH - LAM-MPI, CVS. JoshM - FFTW, GROMACS. Charlie - C3, PVM.
Plumbing - The new drive, cable, adapter, etc. for /cluster are in. Charlie will work on installing it Friday morning. Dawit is working on building a new image for athena; ultimately athena will be our sandbox. JoshM will be working with him on this in preparation for the new images that need to be built for bazaar and cairo in the next couple of weeks.
Support for threads (-nt) in GROMACS - No news from JoshH yet. If anything it may work without MPI enabled.
WordPress rather than MT? - WordPress seems to get good reviews as an open source alternative to MT. JoshM and JoshH will check it out. If we are going to change we should do it soon while the body of stuff that needs to be migrated from MT is still modest in volume.
Bi-Weekly MT updates - Everyone should start (this week!) making bi-weekly entries in MT (by Sun at midnight and Wed at midnight). These should summarize what you are working on, progress made, and challenges to be overcome. Use the General category with a title of "Update - Name - Date". All of us should read these updates before our group meetings on Mon and Thu.
Communication document - Charlie is working on a document describing guidelines for using MT, email, WiKi, etc.
Conference and paper tour - JoshH submitted our entry for LinuxFest 2004, no reply yet. We need to make a decision about where to submit the B and T GROMACS paper. Other deadlines are coming up soon; more conversation next week.
Displayer - We discussed the functionality and organization of the tool JoshM has built. He has fixed almost all of the outstanding functionality problems and it's looking pretty good. JoshM will prototype a couple of different ways to address stored queries/descriptions and dynamic column display for Monday's meeting. Open enhancements and bugs: order of data points, using the same SQL to generate tables and graphs, consistent color/molecule mapping, and grid lines on graph option.
Logistics - Work on-campus on the West end of the 2nd floor during the day on Mondays and Thursdays. Charlie will move back into his office sometime during early June.
Reading list and the plan - Charlie is still working on these. Dawit has the first two AltiVec readings and is waiting for more. Charlie hasn't gotten back to John yet. JoshM is going to re-read Leach before handing it off.
Next meeting
FFTW/GROMACS/MPI diagram update, use GROMACS manual as source?
Scheduler presentation and conversation, read JoshH's entry in preparation
Modular F@C - make a plan
Here is an outline of how the current scheduler works. First, note that there are two versions of the scheduler (I have forgotten exactly why): detailed-scheduler.pl, which is the current version and the one that should be used, and scheduler.pl, which is old and should not be used. The latter does not have the 'find the dominant inner loop' code.
Some General Notes:
CCG Meeting - cp, jh, jm, db
Meeting times - We'll be gathering for regular meetings at 1p on Mondays and Thursdays. During most of the weeks that Charlie is out of town we'll need to move those to 7p.
Working on campus - All of us are going to make it a point to work in D129 or nearby all day on Mondays and Thursdays. The rest of the week people are free to work wherever they feel most comfortable.
Modular F@C - Charlie will contact Vijay WRT which molecular dynamics packages we should be considering. Once we have a sense of the breadth, we can examine each in turn to see if our overall model will work, and if not, what we can do to accommodate particular packages.
Graphing tool - JoshM is making some progress, between now and Thursday he is going to finish the remaining details. On Thursday we'll step back and take a high level view of this tool and see how best to package the functionality. This includes a review of the data model (succ || fail, how fail, etc.)
B and T GROMACS paper - We need to decide soon which journal(s) we are going to aim for and lay out the plan.
GROMACS and MPI research - On-hold until we have a better sense of where F@C is going WRT treating MD software packages as opaque blocks.
Threads and GROMACS - JoshH reports that this may be wild-west ware. He documented known problems between threads and LAM-MPI, so this may be a large hurdle to get over. He is going to post a message to gmx-developers to see what's up. If integrating threads and MPI is not possible, then we may want to a) evaluate threads without MPI on a single dual CPU node and b) start our consideration of other (thread aware/compatible) message passing libraries.
/cluster space - The new drive is here, on Thursday we'll look at pausing the current batch of tests and installing it. We should check it in b16 first to make sure it made the trip ok.
Reading - Charlie gave two AltiVec articles to Dawit. JoshM has Leach. Charlie is going to keep working on a first cut of the reading list, stay tuned to /cluster/project/generic/doc.
Plumbing - JoshM is going to start working on SystemImager with Dawit. We need to build new cairo, bazaar, c0, and b0 images before the cluster move in June.
Items for our next meeting:
Conference calendar
Review graphing/test result tool - packaging and data model
Modular F@C
Weekly email updates
I submitted the following to the Ohio Linux Fest 2004:
http://cluster.earlham.edu/detail/project/b-and-t-gromacs/presentations/linux-fest-2004.html
I may try to convert this to actual HTML in the near future, but I may wait until we start pounding out some prose/presentation materials for it first, so we can define its structure.
CCG Meeting - cp, jh, jm
(Some of this is from Tuesday's meeting.)
Apple Grant - We're going to apply to Apple's cluster award program (www.apple.com/science/clusteraward). Apple is going to donate several 5-node G5 Bioinformatics clusters. Erik Lindahl is on the panel. We will probably propose a modular F@C. Charlie will work on this during late May (the deadline is June 13). All of us need to read Apple's fine print and make sure we will be comfortable with their terms.
Molecule Errors - Proteasome is failing about 2/3 of the time between 1 and 34 processes on Cairo. Charlie is tracking the various failure modes and putting together a message to Erik and Vijay. Bazaar annex runs for this molecule will start sometime after the power outage on Saturday.
Molecule Organization - JoshH moved copies of all the source files for the molecules we are currently using to /cluster/project/molecules. Charlie will find descriptive text, results of 1-34 process runs, and any notes and organize them with the molecule source files.
For the next batch of simulations (on Cairo) we'll run them for 2 picoseconds rather than 10 picoseconds as previously. While this is less than the 5 picoseconds Vijay and Michael thought was needed for accurate prediction of the overall simulation rate, to date 2 picoseconds has yielded reasonably accurate predictions for us. Shortening the duration of the runs greatly increases the breadth and depth of testing we can do. JoshH will be starting this next batch after the power outage.
Charlie is keeping a list of items that need to be addressed when we rebuild the Cairo and bazaar images this summer.
Graphing Tool - It's looking better, but a couple of key items are still needed: graphing data sets with an unequal number of elements, and a couple of other items from JoshH's message. Set queries: n-n-n, 2n-n-n, 4n-2n-n, 6n-2n-n, 8n-2n-n; cairo, bazaar, checkboxes for each molecule. The cgi-bin should be in CVS. JoshM will work on this.
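One way the set queries might be computed, sketched in C. This rests on an ASSUMPTION not stated in the notes: that a pattern like 4n-2n-n reads as processes-cpus-nodes, each written as a multiple of the node count n (the same order as the PROCESSESonCPUS-NODES run names). The function names are hypothetical:

```c
#include <stdio.h>
#include <string.h>

/* Write k*n as "n" when k == 1, otherwise as "<k>n". */
static void format_multiple(char *out, size_t len, int k)
{
    if (k == 1)
        snprintf(out, len, "n");
    else
        snprintf(out, len, "%dn", k);
}

/* Hypothetical: label a (processes, cpus, nodes) triple with its set-query
 * pattern, e.g. 40 processes on 20 CPUs across 10 nodes -> "4n-2n-n".
 * Returns 1 and fills buf if the run fits a pattern, 0 otherwise. */
int set_query_pattern(int processes, int cpus, int nodes, char *buf, size_t len)
{
    char p[16], c[16];

    if (nodes <= 0 || processes <= 0 || cpus <= 0)
        return 0;
    if (processes % nodes != 0 || cpus % nodes != 0)
        return 0;   /* e.g. 37on20-10 is not a multiple of the node count */
    format_multiple(p, sizeof p, processes / nodes);
    format_multiple(c, sizeof c, cpus / nodes);
    snprintf(buf, len, "%s-%s-n", p, c);
    return 1;
}
```

With a helper like this, the checkboxes would just filter rows whose pattern matches the selected set.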
mdrun Option -nt (nubile Tanzainians) - Does this work with MPI? Is it a more or less effecient way to map execution contexts to processors than MPI's processes? JoshH will look into this.
LinuxFest Submission - JoshH sent us a first cut; JoshM and CharlieP need to review it and write bios. This must be done by Friday so that JoshH can submit it Saturday.
Charlie is ordering a new drive to use as /cluster on hopper.
Next meeting: Monday May 17, 2004 @ 12:00EST
Per our conversation on Monday, I changed where our domain points so it is more intuitive. Now http://cluster.earlham.edu/ points to the HTML pages that we have created; to reach the details, click on the Browse Files link. The old link to http://cluster.earlham.edu/html still works just as it has before.
In the process I had to move /cluster/icons to /cluster/html/icons and they are not in CVS. Should they be?
I also noticed that cvsweb was not working so I fixed that as well.
I fixed the links in the HEADER file as well.
So I have been investigating why I am seeing some weird behaviour in the F@C framework when using POSIX signals. I found a couple of bits:
Signal Catching Changes in 6.5.9 Release
Signal catching
LAM MPI now catches the signals SEGV, BUS, FPE, and ILL. The signal handler terminates the application. This is useful in batch jobs to help ensure that mpirun returns if an application process dies. To disable the catching of signals use the -nsigs option to mpirun.
Internal signal
The signal used internally by LAM has been changed from SIGUSR1 to SIGUSR2 to reduce the chance of conflicts with the Linux pthreads library. The signal used is configurable. See the installation guide for the specific ./configure flag that can be used to change the internal signal.
2.9.2. Interaction with Signals
MPI does not specify the interaction of processes with signals and does not require that MPI be signal safe. The implementation may reserve some signals for its own use. It is required that the implementation document which signals it uses, and it is strongly recommended that it not use SIGALRM, SIGFPE, or SIGIO. Implementations may also prohibit the use of MPI calls from within signal handlers. In multithreaded environments, users can avoid conflicts between signals and the MPI library by catching signals only on threads that do not execute MPI calls. High quality single-threaded implementations will be signal safe: an MPI call suspended by a signal will resume and complete normally after the signal is handled.
In short if we use LAM-MPI then we should stay away from the following signals:
SEGV,BUS,FPE,ILL,TERM,USR2
I have changed the code from using SIGUSR2 to using SIGCHLD (a signal that is ignored by default according to the signal(7) man page), and things are working much better.
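For reference, here is a minimal POSIX C sketch of installing a SIGCHLD handler with sigaction while refusing anything on the LAM-MPI avoid list above. It is an illustration, not the F@C change itself; the fork/reap in `spawn_and_reap` exists only to trigger the handler, and all function names are hypothetical:

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Signals to stay away from under LAM-MPI, per the list above. */
static const int lam_reserved[] = {
    SIGSEGV, SIGBUS, SIGFPE, SIGILL, SIGTERM, SIGUSR2
};

int lam_reserved_signal(int sig)
{
    size_t i;
    for (i = 0; i < sizeof lam_reserved / sizeof lam_reserved[0]; i++)
        if (lam_reserved[i] == sig)
            return 1;
    return 0;
}

static volatile sig_atomic_t child_exits = 0;

/* Keep the handler async-signal-safe: just bump a counter. */
static void on_sigchld(int sig)
{
    (void)sig;
    child_exits++;
}

/* Install the SIGCHLD handler; returns 0 on success, -1 on failure. */
int install_child_handler(void)
{
    struct sigaction sa;

    if (lam_reserved_signal(SIGCHLD))
        return -1;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_sigchld;
    sigemptyset(&sa.sa_mask);
    /* SA_RESTART: restart interrupted syscalls; SA_NOCLDSTOP: no signal
     * when a child merely stops. */
    sa.sa_flags = SA_RESTART | SA_NOCLDSTOP;
    return sigaction(SIGCHLD, &sa, NULL);
}

/* Fork a child that exits at once, reap it, and return how many SIGCHLD
 * deliveries the handler has counted so far (-1 on error). */
int spawn_and_reap(void)
{
    int status;
    pid_t pid = fork();

    if (pid < 0)
        return -1;
    if (pid == 0)
        _exit(0);
    while (waitpid(pid, &status, 0) < 0)
        if (errno != EINTR)
            return -1;
    return (int)child_exits;
}
```

Using sigaction rather than signal() gives well-defined semantics across platforms, which matters when an MPI library is also installing handlers in the same process.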
The goal of this post is to start the conversation about MD modules in F@C, and the requirements that new modules must adhere to in order to be classified as a potential module.
In order for GROMACS (3.1.x and 3.2.x) to build and use AltiVec instructions on PowerPC chips running Yellow Dog Linux/gcc 3.3.2 there are two files in the distribution which need a header file added to them.
In configure "#include <altivec.h>" should be added before main() in the generated C code in the AltiVec support test section. You can find this by searching for "supports altivec".
In include/ppc_altivec.h "#include <altivec.h>" should be added before the first function definition.
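To see why the include matters, here is a small C illustration (not GROMACS source). With gcc 3.x the vec_* operations and the bare `vector` keyword come from <altivec.h>, so code like the AltiVec path below won't compile without it. The scalar fallback is only there so the sketch also builds on non-PowerPC machines; `add4` is a made-up name:

```c
#ifdef __ALTIVEC__
#include <altivec.h>
#endif

/* Add two 4-float vectors element-wise. On the AltiVec path a, b, and out
 * must be 16-byte aligned, because vec_ld/vec_st ignore the low 4 bits of
 * the address. */
void add4(const float *a, const float *b, float *out)
{
#ifdef __ALTIVEC__
    vector float va = vec_ld(0, a);
    vector float vb = vec_ld(0, b);
    vec_st(vec_add(va, vb), 0, out);
#else
    int i;
    for (i = 0; i < 4; i++)
        out[i] = a[i] + b[i];
#endif
}
```

On Yellow Dog this would be built with `-maltivec`, which is what defines `__ALTIVEC__`.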
So here is a short tutorial on how I currently reconcile the b-and-t-g database with the disk. First, be familiar with two items:
SELECT molecule, label, processes, cpus, nodes, finish_time FROM result WHERE cluster_name = 'cairo' AND molecule = 'villin' ORDER BY molecule, nodes, cpus, processes
MOLECULE-LABEL-PROCESSESonCPUS-NODES
villin-Gromacs-Optimal-3.2.0-37on20-10
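If you want to match directory names against database rows mechanically, a C sketch of splitting a name in the format above might look like the following. ASSUMPTIONS: molecule names contain no '-', while labels may (as in "Gromacs-Optimal-3.2.0"), so the label is everything between the first '-' and the second-to-last '-'. The struct and function names are hypothetical:

```c
#include <stdio.h>
#include <string.h>

struct run_name {
    char molecule[64];
    char label[128];
    int processes, cpus, nodes;
};

/* Parse MOLECULE-LABEL-PROCESSESonCPUS-NODES; returns 0 on success. */
int parse_run_name(const char *name, struct run_name *out)
{
    const char *first = strchr(name, '-');
    const char *last = strrchr(name, '-');
    const char *mid;   /* the '-' in front of PROCESSESonCPUS */
    size_t n;

    if (first == NULL || last == NULL || last == first)
        return -1;
    for (mid = last - 1; mid > first && *mid != '-'; mid--)
        ;
    if (mid == first)
        return -1;   /* no label between molecule and counts */

    n = (size_t)(first - name);
    if (n == 0 || n >= sizeof out->molecule)
        return -1;
    memcpy(out->molecule, name, n);
    out->molecule[n] = '\0';

    n = (size_t)(mid - first - 1);
    if (n == 0 || n >= sizeof out->label)
        return -1;
    memcpy(out->label, first + 1, n);
    out->label[n] = '\0';

    /* "%don%d-%d" matches e.g. "37on20-10" */
    if (sscanf(mid + 1, "%don%d-%d", &out->processes, &out->cpus, &out->nodes) != 3)
        return -1;
    return 0;
}
```

For the example above this yields molecule "villin", label "Gromacs-Optimal-3.2.0", and 37 processes on 20 CPUs across 10 nodes, which lines up with the columns in the SELECT.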
In the finish_time field you are looking for the rows that you can clean: rows with a null finish_time or a finish_time of 1900-01-01 00:00:00. The latter indicates that the scheduler found an error and reported it; the former means that the run is still in progress or never finished. As you look through the database you need to know which runs are currently running, because they will have nothing in their finish_time field, and if you delete such a row the scheduler will fail. If the run never finished, it can be removed from the database.
When you delete a row from the database, delete the associated directory in /cluster/project/b-and-t-g/results/CLUSTER/raw-data/
Once you weed out all of the runs with a finish_time of null or 1900-01-01 00:00:00 then you are finished.
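The cleanup rule boils down to a small decision table. Here it is as a C sketch (not the scheduler's code). Whether a run is currently executing isn't in the database, so it's passed in as a flag; the raw-data directory naming (run name under CLUSTER/raw-data/) is an ASSUMPTION, and the function names are hypothetical:

```c
#include <stdio.h>
#include <string.h>

#define SCHEDULER_ERROR_TIME "1900-01-01 00:00:00"

enum row_action { KEEP_ROW, DELETE_ROW };

/* finish_time is NULL for a run that is still going or never finished,
 * and SCHEDULER_ERROR_TIME when the scheduler reported an error. */
enum row_action classify_row(const char *finish_time, int currently_running)
{
    if (finish_time == NULL)
        /* Deleting a running job's row would make the scheduler fail. */
        return currently_running ? KEEP_ROW : DELETE_ROW;
    if (strcmp(finish_time, SCHEDULER_ERROR_TIME) == 0)
        return DELETE_ROW;
    return KEEP_ROW;   /* a real finish time: the run completed */
}

/* Build the raw-data directory to remove along with a deleted row. */
int raw_data_dir(char *buf, size_t len, const char *cluster, const char *run)
{
    return snprintf(buf, len,
                    "/cluster/project/b-and-t-g/results/%s/raw-data/%s",
                    cluster, run);
}
```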
So here are some of my notes from imaging the Cairo nodes that were running OSX.
Here is a link to the MPI Forum's section on the error handler:
Communicator Error Handler
We need to create a function that is using the type:
typedef void MPI_Comm_errhandler_fn(MPI_Comm *, int *, ...);
int MPI_Comm_create_errhandler(MPI_Comm_errhandler_fn *function, MPI_Errhandler *errhandler)
I have manually compiled a list of the molecule runs that [have | have not | will not be] completed for both bazaar and cairo.
Bazaar Cluster
Cairo Cluster
These are automatically updated from the database when you refresh the page. The key has changed a bit from previous iterations of this chart. I am working on a Time Approximation scheme to place on the page as well.
The pages list the runs in 4 categories:
May 3, 2004 CCG Meeting - cp, jh, jm
Charlie still working on reading lists for 1/x, sqrt(x), and F@C.
Graphing tool. JoshM is making some progress, but the same script fails when run under /cluster/cgi-bin while it works fine in ~mccoyjo/cgi-bin. JoshH will take a look with him. If there is no solution by Tuesday, send email to Charlie so he can look at it too. Presentation on Thursday to Vijay's group.
Specifications for the new graphing tool. X axis parallel structure, Y axis ps real/day, one color per molecule, data point at each parallel architecture. One graph per parallel structure: one process per cpu, two processes per cpu, one process per node, etc. JoshH will flesh this out.
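For the Y axis, a C sketch of the conversion, reading "ps real/day" as picoseconds of simulated time per real (wall-clock) day. That reading, and the assumption that each run's simulated length and wall-clock seconds are available, are ours; the function name is hypothetical:

```c
#define SECONDS_PER_DAY 86400.0

/* Simulated picoseconds per wall-clock day for one run.
 * Returns 0 for a run with no usable wall-clock time. */
double ps_per_day(double simulated_ps, double wall_seconds)
{
    if (wall_seconds <= 0.0)
        return 0.0;
    return simulated_ps * SECONDS_PER_DAY / wall_seconds;
}
```

For example, a 2 ps run that takes one hour of wall-clock time works out to 2 * 86400 / 3600 = 48 ps/day, so 2 ps and 10 ps runs land on the same scale.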
Molecule runs are moving along. Bazaar still chugging, cairo idle now. JoshH will bring the chart up-to-date and split it into bazaar and cairo. JoshH will talk to Seth about recovering the cairo nodes.
mdrun/MPI, JoshM is making some progress. He will write-up a description and post it to MT.
This is a good time for Charlie to get questions answered by Vijay, et al. Send them along post haste.
Villin/Urea molecule failure. Some progress, no real news yet. Charlie.
Implement the logic and code in v0.5 of F-at-C to catch a child failure, recover, and restart with one fewer process. Making progress; figured out how to keep the group from dying (MPI flag). Typedef'd functions and error handler: Charlie will look at this. JoshH.
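A C sketch of the restart-with-one-fewer-process loop, just to pin down the control flow. The launcher is a hypothetical callback standing in for starting the MD code under MPI (returning 0 on success, nonzero if a process died). `toy_launch` is a stand-in for testing, not anything in F@C:

```c
/* Hypothetical launcher: run with nprocs processes, 0 on success. */
typedef int (*launch_fn)(int nprocs, void *ctx);

/* Try nprocs, then nprocs-1, ... down to min_procs. Returns the process
 * count that finally succeeded, or -1 if every attempt failed. */
int run_with_fallback(int nprocs, int min_procs, launch_fn launch, void *ctx)
{
    int n;
    for (n = nprocs; n >= min_procs && n > 0; n--) {
        if (launch(n, ctx) == 0)
            return n;
        /* failure caught: recover, then retry with one fewer process */
    }
    return -1;
}

/* Toy launcher: pretend the molecule only runs on up to *limit processes. */
int toy_launch(int nprocs, void *ctx)
{
    int limit = *(const int *)ctx;
    return nprocs > limit ? 1 : 0;
}
```

Keeping the retry loop separate from the launcher means the MPI-specific recovery (error handler, keeping the group alive) can change without touching the fallback policy.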
Grant opportunity, Sun for hardware and money to port GROMACS to SPARC. Charlie.