May 31, 2004

Update

Reading HPC.
Worked on one of the quarkprime machines installing FreeBSD.
The Athena image is ready, but the problem with SystemImager is still a mystery. When trying to make a bootable diskette I get a choice of flavors between ATHENA and standard, so I am thinking the problem has to be in the boot loader files, since that is what came up last time. I will try to work with Skylar to resolve the issue, and then the athena cluster should be ready.

Posted by bekelda at 11:23 PM | Comments (67)

May 30, 2004

Update - Josh H - May 30, 2004

Working on/Worked on


  • Readings: Altivec Documents
  • Fixed /cluster mounting issue on admin, commenting out the mount in rc.local
  • Installed a CD-R drive in admin and created an i386 boot CD. Also installed cdrecord. See the sna log for more details.
  • Figured out that systemimager does not currently support creating a boot disk for PPC. So we must use the yellowdog install, systemimager client install, update method until someone figures out how to make a bootable CD for the PPC. I put the instructions in the sna Systemimager HowTo document.
  • Found BootCD, which is able to make a bootable CD for Mac OS X. Maybe we can whittle it down to something that we can use with systemimager.
  • Fixed the permissions for CVS directories by making them all group writable.
  • Porting GROMACS to:

    • PVM: Going well. I am still trying to figure out how to run the version from the website on cairo and bazaar. This is what crashed cairo this week. I am making progress.
    • MPICH: Installed MPICH on cairo and will install the beta release of MPICH2 on cairo which implements the MPI-2 standard, or hopes to.
    • MP-Lite: should be installed on cairo, but I am focusing on the PVM and MPICH implementations before I get to this.

    All implementations will be posted to the Earlham Cluster Computing site when they are ready, and then contributed to the GROMACS site. I want to develop them for Cairo since I will have to run many tests and it is the fastest cluster we have. This way, when we are ready to test, it will be a relatively quick process.

To Do list


  1. Fix Arthur Vining Davis logo in Poster
  2. B-and-T-GROMACS paper
  3. Port GROMACS to:

    • MPICH
    • PVM
    • MP-Lite

  4. Modify DB schema and scheduler to contain [success|fail], where did it fail, how did it fail.

Posted by hursejo at 11:31 PM | Comments (65)

May 29, 2004

Grant Opportunities

Ava Willis and I have been researching various grant opportunities; we'll need to prepare one or more of these this summer in light of the HHMI rejection. I have a folder with the Foundation Directory entries for each of these if anyone is interested (J^2 mostly I suspect).

o Sun Microsystems

There are two categories that I'm thinking about:

Higher Education - Scientific and Engineering Computing
Primary and Secondary - University Outreach to K-12 Schools

For the first one we would ask for a hardware donation of Sun SPARC workstations to use for our work in molecular dynamics. For the second one we would ask for money to support a project which Ray Ontko (EC '84) and I have been noodling around with for a while, which involves giving presentations at area high schools with a couple of EC CS students and faculty.

o Research Corporation

I'm not sure about this one but I'm leaning towards no. They seem to be most interested in Physics, Astronomy, and Chemistry. Even though the basis of our work is Chemistry and Physics I suspect to these folks it will look like CS. I suspect the lack of a PhD would be an issue as well.

o Keck Foundation

I don't remember the details of our last effort (Fall, 2000) with Keck but it may be time to re-visit them. PaulO and I have talked about a particular project involving chemistry and CS which I think would be very attractive to them. I'm not sure who you should speak with to get the background here if you don't already know it. If you think we're ready to approach them again then the three of us (Ava, Charlie, and Paul) should set-up a time to meet sometime in the next couple of weeks.

o Intel

I think this is worth considering. It looks fairly simple and there is no deadline. A proposal used here could be recycled (at least in part) for some of the others that we are considering. I'm not sure which project would be best suited yet.

o Arnold and Mabel Beckman Foundation

I think this is worth considering. They require a pre-proposal letter which I can begin to craft. The deadline is October 1. A pre-proposal letter used here could be recycled (at least in part) for some of the others that we are considering. I'm not sure which project would be best suited yet.

o American Honda Foundation

I think this is worth considering. They require either an initial phone call or letter. The next deadline is August 1. A proposal used here could be recycled (at least in part) for some of the others that we are considering. I'm not sure which project would be best suited yet.

o Eastman Kodak Company Contributions Program

I don't think this is worth our time. While there is some fit between our work and their giving areas it's not that great. The deadline was April 30.

o Apple

I don't think this is worth considering. The hardware they are offering we don't really need (we have a bunch of it already) and they are looking for people that are further up the chain than we are.

Posted by charliep at 12:35 PM | Comments (14)

May 27, 2004

update

Worked on a new athena image with JoshM; ran into problems with NIS and will work on it more together.
Made floppies and did a network install of FreeBSD on one of the quarkprime machines.
Learning useful command tools from man pages. Reading HPC.

Posted by bekelda at 12:48 AM | Comments (0)

May 26, 2004

Update - Josh H - May 26, 2004

Worked on


  • Spent a load of time working on bringing up c1. Still have not figured out how to make a CD that is bootable on cairo for systemimager. I want to try installing the CD-R/CD-RW in admin and burning the disk there, since I read a post or two saying that any disk burnt under OS X will not work as a bootable disk for YellowDog. I believe this to be superstition, but I am about to that point in the process.
    All said and done cairo is back up and running properly.
  • Fixed the double mount of /cluster on cairo which made cairo nearly unusable. This involved commenting out a line in /etc/rc.local. It is working now after a reboot.
  • I rebuilt b19, and after a DHCP shuffle on hopper it is running again with bazaar annex.
  • Sent a chaser e-mail to LinuxFest to confirm our submission. No word back yet.
  • Nosed around in Charlie's environment on Bazaar, and switched lamhalt to wipe in his run.pl script [renamed the new version to joshh-run.pl], and was able to run the tests just fine on bazaar. Hopefully this will fix the problem.
  • Cairo reinstall Note: if you want the ethernet to work after a fresh yellowdog install on cairo, hard code the IP/GATEWAY/Primary DNS and disable eth0 on startup. All within the graphical setup.
  • Sent a message to the GROMACS developers list regarding thread support with and without MPI.
  • Looked over Summer Plan

To Do list


  1. Fix Arthur Vining Davis logo in Poster
  2. Readings
  3. B-and-T-GROMACS paper
  4. Port GROMACS to:

    • MPICH
    • PVM
    • MP-Lite

  5. Modify DB schema and scheduler to contain [success|fail], where did it fail, how did it fail.

Posted by hursejo at 08:00 PM | Comments (0)

May 25, 2004

Meeting Notes - May 24, 2004

CCG Meeting - cp, jh, jm, db

Bi-weekly updates (so far) are working well. Remember to make these entries each Sunday and Wednesday evenings. Use the General category with a title of "Update - Name - Date". All of us should read these updates before our group meetings on Mon and Thu.

WordPress - All reviews look good. Added install/configure to the plumbing list.

The new system/network/database administration list for this summer is now available at /cluster/project/sna/plumbing-summer-2004.html. All of us should use log.html (in that same directory) to note things we fix/install/change on the clusters.

The first cut of the summer reading list is available at /cluster/project/generic/doc/readings-summer-2004.html. Check it out and read!

The first cut of the summer plan is available at /cluster/project/generic/doc/plan-summer-2004.html. Check it out and get your feedback to Charlie.

Everyone except Charlie updated the software overview chunks. Scheduler as it is currently will be nuked and new documentation created when the new scheduler is built.

Plumbing - After a bunch of futzing the new drive is in hopper. JoshM and Dawit are still working on the athena cluster. JoshH is trying to build a SystemImager disk for cairo.

Support for threads (-nt) in GROMACS - No news from JoshH yet. If anything it may work without MPI enabled.

Communication document - Charlie is working on a document describing guidelines for using MT, email, WiKi, etc.

Displayer - JoshM has some good ideas for how to have preset queries, graphing, etc. We drew a picture of a layout and discussed data model changes. Make a new form to display, edit, insert, etc. preset queries. Open enhancements and bugs: order of data points, consistent color/molecule mapping, grid lines on the graph option. Currently it uses the same predicate (with a different select list and order by) for the tabular data and the graph. Short description is used as the title of the graph, long description as the legend.

Conference and paper tour - No word from LinuxFest yet.

Future Meeting
Conference and paper tour
Scheduler presentation and conversation, read JoshH's entry in preparation
FFTW/GROMACS/MPI diagram update, use GROMACS manual as source?
Modular F@C - make a plan

Posted by charliep at 02:30 AM | Comments (0)

May 24, 2004

update Josh McCoy 23 May 2004

  • The GROMACS and FFTW software overview pages have been updated. FFTW info changes consisted of updating directories. The GROMACS info seemed to be fine save the info about the included scheduler documentation. Maybe the scheduler should have its own software overview page?

  • WordPress seems very nice. It seems to have a large set of useful features. It is PHP based (which is good) and MySQL based (which I'm not so crazy about). From looking at the forum, FAQ, and screenshots, WordPress seems to have a good method of having multiple categories per post and support for subcategories.

  • Work on the graphing utility mockups is going nicely. I've found methods of accomplishing most of what each method would entail. A point of contention that exists is to either augment our existing tables with the preset query data or to add a new table to hold the information.

  • A quick tour through the HTML 4.01 specifications yielded the code for using a combo box to choose the fields to be used in the select statement of our sql queries.

  • Reading Leach is still as exciting as ever. I'll read a bit more of it tomorrow morning and hand it off if some one else wants to peruse its contents.

Posted by mccoyjo at 12:31 AM | Comments (0)

May 23, 2004

Update - Josh H - May 23, 2004

Worked On:


  • After having it bug me for a while I put "finger to keyboard" and wrote v0.1 of the Benchmarking and Tuning GROMACS paper. You all should read over it and we should talk about it. We are close, but there are a few bits that need cleaning up, and some graphs that we will need to make. Here is the link (I will put it in CVS if you want, but currently it is just a txt file) B-and-T-Gromacs-Draft.txt
  • Where to send paper for publication:

  • WordPress Evaluation:

    • The subcategories feature may be useful. MT currently cannot do this.
    • No rebuilding necessary. It is PHP based, not Perl, so posting is instant with no need to hit the rebuild button. Actually a minor point.
    • Able to import entries from MT to WordPress! Very nice!
    • Text formatting seems to be a bit more advanced.
    • Has RSS Feed capability!
    • Looks like they are thinking about supporting a spell checker. MT has a plug-in but it does not look like it is that good.
    • Looks like only MySQL is supported, and the database is necessary. In MT it is optional. I'm sure we could port it, but I think MySQL is installed on hopper already.
    • Conclusion: I say we download a copy and try it out. Not for production, but to see if it works as well as, if not better than, MT. The site is fairly light on the features and how it works, so this might be the best way to learn.

  • Fixed a permission problem with cvsroot. It was not group writable so many features were not working.
  • Software Overview

To Do List:


  1. Threads investigation. Does it work? Mail the GROMACS List.
  2. Send mail to MCURCSM 2004 if I want to try to go. I think I might pass, we can talk about it on Monday.
  3. Port GROMACS to:

    • MPICH
    • MP_LITE
    • PVM

  4. Modify DB schema and scheduler to contain [success|fail], where did it fail, how did it fail.

Posted by hursejo at 05:20 PM | Comments (0)

May 21, 2004

Meeting Notes - May 20, 2004

CCG Meeting - cp, jh, jm, db

Software overview - This needs to be updated so that it can serve as a resource for the new folks. JoshH - LAM-MPI, CVS. JoshM - FFTW, GROMACS. Charlie - C3, PVM.

Plumbing - The new drive, cable, adapter, etc. for /cluster are in. Charlie will work on installing it Friday morning. Dawit is working on building a new image for athena, ultimately athena will be our sand-box. JoshM will be working with him on this in preparation for the new images that need to be built for bazaar and cairo in the next couple of weeks.

Support for threads (-nt) in GROMACS - No news from JoshH yet. If anything it may work without MPI enabled.

WordPress rather than MT? - WordPress seems to get good reviews as an open source alternative to MT. JoshM and JoshH will check it out. If we are going to change we should do it soon while the body of stuff that needs to be migrated from MT is still modest in volume.

Bi-Weekly MT updates - Everyone should start (this week!) making bi-weekly entries in MT (by Sun at midnight and Wed at midnight). These would be a summary of what you are working on, progress made, and challenges to be overcome. Use the General category with a title of "Update - Name - Date". All of us should read these updates before our group meetings on Mon and Thu.

Communication document - Charlie is working on a document describing guidelines for using MT, email, WiKi, etc.

Conference and paper tour - JoshH submitted our entry for LinuxFest 2004, no reply yet. We need to make a decision about where to submit the B and T GROMACS paper. Other deadlines are coming-up soon, more conversation next week.

Displayer - We discussed the functionality and organization of the tool JoshM has built. He has fixed almost all of the outstanding functionality problems and it's looking pretty good. JoshM will prototype a couple of different ways to address stored queries/descriptions and dynamic column display for Monday's meeting. Open enhancements and bugs: order of data points, using the same SQL to generate tables and graphs, consistent color/molecule mapping, and grid lines on graph option.

Logistics - Work on-campus on the West end of the 2nd floor during the day on Mondays and Thursdays. Charlie will move back into his office sometime during early June.

Reading list and the plan - Charlie is still working on these. Dawit has the first two AltiVec readings and is waiting for more. Charlie hasn't gotten back to John yet. JoshM is going to re-read Leach before handing it off.

Next meeting
FFTW/GROMACS/MPI diagram update, use GROMACS manual as source?
Scheduler presentation and conversation, read JoshH's entry in preparation
Modular F@C - make a plan

Posted by charliep at 11:23 AM | Comments (0)

May 19, 2004

Scheduler outline

Here is an outline of how the current scheduler works. First note that there are two versions of the scheduler [I have forgotten why exactly]: detailed-scheduler.pl, which is the current version and the one that should be used, and scheduler.pl, which is old and should not be used. The latter does not have the 'find the dominant inner loop' code.

  1. Grab arguments. Most of these are files [these should always be the last set of arguments given to the program], which are placed into an array. Before the files there are some specialized flags that turn on things like switch monitoring and /proc changes.
  2. For each file:
    1. If the stopping flag has been set by the signal handler [SIGUSR1 or SIGUSR2 sent to the head process], then post a mail message on how to restart and the current state, then exit.
    2. Initialize Tests
      1. Parse Config File. Here we also make the working directory and make sure we have unique path and tag names. If this is a 'duplicate' test then attach -Run-# to the end of the tag and create the directory.
      2. Make the node list. This depends upon the cluster we are running on [see notes in program] and whether we are running the tests as node or cpu cyclic.
      3. Set Environment Variables.
      4. Prepare result and option_profile rows in the Database
      5. Generate Run script using node list.
    3. Launch the script via nohup so we can do...
    4. Checkpointing. Wait for finish_time field to obtain a value. If the value was 1900-01-01 then post an error and quit the scheduler.
    5. Analyse Run [mark ps_real, ps_node, dominant inner loop, etc.]
    6. Cleanup variables for next run. Mail successful completion of this configuration file.
  3. Mail a Scheduler Finished message
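
The control flow in the outline above can be sketched roughly as follows. This is only an illustration in Python, not the code in detailed-scheduler.pl: the names run_scheduler, run_one, notify, and request_stop are hypothetical, run_one stands in for steps 2.2 through 2.4 (initialize, launch, wait for finish_time), and the analysis step is left out.

```python
import signal

# finish_time value that the outline treats as "the run reported an error"
ERROR_SENTINEL = "1900-01-01"

# Set by the signal handler, checked at the top of each iteration.
stop_requested = False


def request_stop(signum, frame):
    """Signal handler: only record the request; the main loop acts on it."""
    global stop_requested
    stop_requested = True


def install_stop_handlers():
    """Assumes a Unix host: SIGUSR1 or SIGUSR2 asks the scheduler to stop."""
    signal.signal(signal.SIGUSR1, request_stop)
    signal.signal(signal.SIGUSR2, request_stop)


def run_scheduler(config_files, run_one, notify):
    """Process config files in order and return the ones that completed.

    run_one(path) is a hypothetical callable that runs one configuration
    and returns its finish_time as a string.
    notify(message) is a hypothetical stand-in for the mail messages.
    """
    completed = []
    for index, path in enumerate(config_files):
        if stop_requested:
            # Report the current state and how to restart, then exit.
            remaining = config_files[index:]
            notify("stopped; restart with: " + " ".join(remaining))
            return completed
        finish_time = run_one(path)
        if finish_time.startswith(ERROR_SENTINEL):
            # The sentinel date means the run failed: quit the scheduler.
            notify("error in " + path)
            return completed
        completed.append(path)
        notify("completed " + path)
    notify("scheduler finished")
    return completed
```

Keeping the handler down to setting a flag means a run that is already in progress is never interrupted halfway; the stop only takes effect between configuration files, which matches where the outline checks the flag.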

Some General Notes:


  • I use the 'usysv' ssi flag to mpirun by default because it provides the best all around performance.
  • There are some heavy duty Perl regular expressions in the analyze routines, especially when finding the dominant inner loop. If these get to be too much to parse, use the commented-out print statements in the control statements to help.
  • The general format for a directory name is: [molecule]-[tag]-[processes]on[cpus]-[nodes]
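
As a small illustration of the directory naming format, here is a Python sketch. The function name run_directory_name and its duplicate_run parameter are hypothetical; the -Run-# suffix follows the description of duplicate tests in the outline, and the scheduler itself is written in Perl.

```python
def run_directory_name(molecule, tag, processes, cpus, nodes, duplicate_run=None):
    """Build [molecule]-[tag]-[processes]on[cpus]-[nodes].

    duplicate_run is a hypothetical parameter: when given, "-Run-#" is
    appended to the tag, as the outline describes for duplicate tests.
    """
    if duplicate_run is not None:
        tag = f"{tag}-Run-{duplicate_run}"
    return f"{molecule}-{tag}-{processes}on{cpus}-{nodes}"


if __name__ == "__main__":
    print(run_directory_name("villin", "Gromacs-Optimal-3.2.0", 37, 20, 10))
```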

Posted by hursejo at 03:40 PM | Comments (0)

May 17, 2004

Meeting Notes - May 17, 2004

CCG Meeting - cp, jh, jm, db

Meeting times - We'll be gathering for regular meetings at 1p on Mondays and Thursdays. During most of the weeks that Charlie is out of town we'll need to move those to 7p.

Working on campus - All of us are going to make it a point to work in D129 or nearby all day on Mondays and Thursdays. The rest of the week people are free to work wherever they feel most comfortable.

Modular F@C - Charlie will contact Vijay WRT which molecular dynamics packages we should be considering. Once we have a sense of the breadth we can examine each in turn to see if our overall model will work and if not what we can do to accommodate particular packages.

Graphing tool - JoshM is making some progress, between now and Thursday he is going to finish the remaining details. On Thursday we'll step back and take a high level view of this tool and see how best to package the functionality. This includes a review of the data model (succ || fail, how fail, etc.)

B and T GROMACS paper - We need to decide soon which journal(s) we are going to aim for and lay out the plan.

GROMACS and MPI research - On-hold until we have a better sense of where F@C is going WRT treating MD software packages as opaque blocks.

Threads and GROMACS - JoshH reports that this may be wild-west ware. He documented known problems between threads and LAM-MPI so this may be a large hurdle to get over. He is going to post a message to gmx-developers to see what's up. If integrating threads and MPI is not possible then we may want to a) evaluate threads without MPI on a single dual CPU node and b) start our consideration of other (thread aware/compatible) message passing libraries.

/cluster space - The new drive is here, on Thursday we'll look at pausing the current batch of tests and installing it. We should check it in b16 first to make sure it made the trip ok.

Reading - Charlie gave two AltiVec articles to Dawit. JoshM has Leach. Charlie is going to keep working on a first cut of the reading list, stay tuned to /cluster/project/generic/doc.

Plumbing - JoshM is going to start working on SystemImager with Dawit. We need to build new cairo, bazaar, c0, and b0 images before the cluster move in June.

Items for our next meeting:
  Conference calendar
  Review graphing/test result tool - packaging and data model
  Modular F@C
  Weekly email updates
Posted by charliep at 04:54 PM | Comments (1)

May 14, 2004

Ohio Linux Fest 2004 Submission

I submitted the following to the Ohio Linux Fest 2004:
http://cluster.earlham.edu/detail/project/b-and-t-gromacs/presentations/linux-fest-2004.html

I may try to convert this to actual HTML in the near future, but I may wait until we start pounding out some prose/presentation materials for it first, so we can define its structure.

Posted by hursejo at 07:49 PM | Comments (0)

May 13, 2004

Meeting Notes - May 13, 2004

CCG Meeting - cp, jh, jm

(Some of this is from Tuesday's meeting.)

Apple Grant - We're going to apply to this www.apple.com/science/clusteraward program. Apple is going to donate some 5 node G5 Bioinformatics clusters. Erik Lindahl is on the panel. Probably propose a modular F@C. Charlie will work on this during late May (the deadline is June 13). All of us need to read Apple's fine print and make sure we will be comfortable with their terms.

Molecule Errors - Proteasome is failing about 2/3 of the time between 1 and 34 processes on Cairo. Charlie is tracking the various failure modes and putting together a message to Erik and Vijay. Bazaar annex runs for this molecule will start sometime after the power outage on Saturday.

Molecule Organization - JoshH moved copies of all the source files for the molecules we are currently using to /cluster/project/molecules. Charlie will find descriptive text, results of 1-34 process runs, and any notes and organize them with the molecule source files.

For the next batch of simulations (on Cairo) we'll run them for 2 picoseconds rather than 10 picoseconds as previously. While this is less than Vijay and Michael thought was needed (5 picoseconds) for accurate prediction of overall simulation rate, to-date 2 picoseconds has yielded reasonably accurate predictions for us. Shortening the duration of the runs greatly increases the breadth and depth of testing we can do. JoshH will be starting this next batch after the power outage.
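
The arithmetic behind comparing short and long runs can be written out as a minimal Python sketch. It assumes the overall simulation rate is expressed as simulated picoseconds per wall-clock day; the function names are hypothetical and the example figures are made up for illustration, not taken from our results.

```python
SECONDS_PER_DAY = 86400


def ps_per_day(simulated_ps, wall_seconds):
    """Simulation rate: simulated picoseconds per wall-clock day."""
    if wall_seconds <= 0:
        raise ValueError("wall_seconds must be positive")
    return simulated_ps * SECONDS_PER_DAY / wall_seconds


def relative_difference(short_rate, long_rate):
    """How far the short-run estimate is from the long-run rate (0.0 = identical)."""
    return abs(short_rate - long_rate) / long_rate


if __name__ == "__main__":
    # Made-up wall-clock times, for illustration only.
    short = ps_per_day(2.0, 3600.0)     # a 2 ps run that took one hour
    long = ps_per_day(10.0, 18750.0)    # a 10 ps run that took a bit over 5 hours
    print(short, long, relative_difference(short, long))
```

If the relative difference stays small across molecules and process counts, the 2 picosecond runs are a reasonable stand-in for the longer ones.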

Charlie is keeping a list of items that need to be addressed when we rebuild the Cairo and bazaar images this summer.

Graphing Tool - It's looking better but there are still a couple of key items needed. Graphing data sets with an unequal number of elements and a couple of other items from JoshH's message. Set queries: n-n-n, 2n-n-n, 4n-2n-n, 6n-2n-n, 8n-2n-n; cairo, bazaar, checkboxes for each molecule. cgi-bin should be in CVS. JoshM will work on this.

mdrun Option -nt (nubile Tanzanians) - Does this work with MPI? Is it a more or less efficient way to map execution contexts to processors than MPI's processes? JoshH will look into this.

LinuxFest Submission - JoshH sent us a first cut, JoshM and CharlieP need to review and write bios. This must be done by Friday so that JoshH can submit it Saturday.

Charlie is ordering a new drive to use as /cluster on hopper.

Next meeting: Monday May 17, 2004 @ 12:00EST


  • Code review
  • Modular MD approach
  • Test Result Tool
  • Where to publish B and T GROMACS paper?
  • Conference calendar checkin
  • Report from Charlie from Pande Lab visits

Posted by charliep at 07:19 PM | Comments (0)

May 12, 2004

Swap domain pointing

Per our conversation on Monday I changed how our domain points so it is more intuitive. So now http://cluster.earlham.edu/ will point to the html pages that we have created, and to reach the details click on the Browse Files link. The old link to http://cluster.earlham.edu/html still works just as it has before.
In the process I had to move /cluster/icons to /cluster/html/icons and they are not in CVS. Should they be?
I also noticed that cvsweb was not working so I fixed that as well.
I fixed the links in the HEADER file as well.

Posted by hursejo at 05:33 PM | Comments (0)

LAM MPI and Signals

So I have been investigating why I am seeing some weird behaviour in the F@C framework when using POSIX signals. I found a couple of bits:
Signal Catching Changes in 6.5.9 Release


Signal catching

LAM MPI now catches the signals SEGV, BUS, FPE, and ILL. The signal handler terminates the application. This is useful in batch jobs to help ensure that mpirun returns if an application process dies. To disable the catching of signals use the -nsigs option to mpirun.

Internal signal

The signal used internally by LAM has been changed from SIGUSR1 to SIGUSR2 to reduce the chance of conflicts with the Linux pthreads library. The signal used is configurable. See the installation guide for the specific ./configure flag that can be used to change the internal signal.


and this bit from mpi-forum.org

2.9.2. Interaction with Signals
MPI does not specify the interaction of processes with signals and does not require that MPI be signal safe. The implementation may reserve some signals for its own use. It is required that the implementation document which signals it uses, and it is strongly recommended that it not use SIGALRM, SIGFPE, or SIGIO. Implementations may also prohibit the use of MPI calls from within signal handlers.

In multithreaded environments, users can avoid conflicts between signals and the MPI library by catching signals only on threads that do not execute MPI calls. High quality single-threaded implementations will be signal safe: an MPI call suspended by a signal will resume and complete normally after the signal is handled.

In short if we use LAM-MPI then we should stay away from the following signals:
SEGV,BUS,FPE,ILL,TERM,USR2

I have changed the code from using SIGUSR2 to using SIGCHLD (a signal that is currently ignored by default according to signal(7) manpage), and things are working much better.
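
One way to keep from accidentally reusing a reserved signal is a small guard around handler installation. The sketch below is in Python for illustration only (the F@C code is not Python), assumes a Unix host, and uses hypothetical names; the set of signals is exactly the list given above.

```python
import signal

# Signals this post concludes should be left alone when running under LAM-MPI.
LAM_RESERVED = frozenset({
    signal.SIGSEGV,
    signal.SIGBUS,
    signal.SIGFPE,
    signal.SIGILL,
    signal.SIGTERM,
    signal.SIGUSR2,
})


def is_safe_for_application(sig):
    """True if the signal is not on the reserved list above."""
    return sig not in LAM_RESERVED


def install_handler(sig, handler):
    """Install handler for sig, refusing signals on the reserved list.

    Returns the previous handler, as signal.signal does.
    """
    if not is_safe_for_application(sig):
        name = signal.Signals(sig).name
        raise ValueError(name + " is reserved under LAM-MPI; pick another signal")
    return signal.signal(sig, handler)
```

Note that SIGCHLD passes this check only because it is not on the list; a process that starts children of its own will also receive SIGCHLD when they exit, so the handler has to tolerate that.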

Posted by hursejo at 01:01 PM | Comments (0)

MD Packages

  • Amber costs $400 for academic use. It does come with the source code and demos, but is only shipped via CD.
  • Tinker does not seem to have a parallel implementation that I noted on the website.
  • NAMD uses Charm++ for parallelism. According to this post NAMD can be compiled with MPI support but runs a bit slower.
The primary goal of this project is: for a very large molecule set, which may or may not fit in memory on a single machine, harness parallelism to diffuse the load and increase performance. Given that, we have some prerequisites for any MD module that we consider:
  1. Must be MPI compatible to fit with our framework (What about MP_Lite?)
  2. Must have clearly specified, plain text input file formats that other modules can adapt to easily.

The goal of this post is to start the conversation about MD modules in F@C, and the requirements that new modules must adhere to in order to be classified as a potential module.

Posted by hursejo at 09:22 AM | Comments (1)

May 05, 2004

GROMACS 3.1.x and 3.2.x AltiVec Support

In order for GROMACS (3.1.x and 3.2.x) to build and use AltiVec instructions on PowerPC chips running Yellow Dog Linux/gcc 3.3.2 there are two files in the distribution which need a header file added to them.

In configure "#include <altivec.h>" should be added before main() in the generated C code in the AltiVec support test section. You can find this by searching for "supports altivec".

In include/ppc_altivec.h "#include <altivec.h>" should be added before the first function definition.

Posted by charliep at 06:28 PM | Comments (0)

May 04, 2004

Reconcile Database with Directory

So here is a short tutorial on how I currently reconcile the b-and-t-g database with the disk. First be familiar with two items:


  • b-and-t-g database
  • /cluster/project/b-and-t-g/results/

Open the Database and do a select in the form:
SELECT molecule,label,processes,cpus,nodes,finish_time from result where cluster_name = 'cairo' and molecule = 'villin' ORDER BY molecule, nodes, cpus, processes

for each cluster [cairo | bazaar] and molecule [villin | cut | pme | dppc].
In another shell you want to cd to /cluster/project/b-and-t-g/results/cairo/raw-data/.
Each folder is titled as:
MOLECULE-LABEL-PROCESSESonCPUS-NODES

For example:
villin-Gromacs-Optimal-3.2.0-37on20-10

is molecule villin, label Gromacs-Optimal-3.2.0, 37 processes, 20 CPUs, 10 nodes.
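
Going the other way, a folder name can be split back into its fields. This is a minimal Python sketch with a hypothetical function name; it assumes the molecule name itself contains no hyphen, since the label (as in the example) may contain several.

```python
import re

# MOLECULE-LABEL-PROCESSESonCPUS-NODES; the label may itself contain hyphens.
_RUN_DIR = re.compile(
    r"^(?P<molecule>[^-]+)-(?P<label>.+)"
    r"-(?P<processes>\d+)on(?P<cpus>\d+)-(?P<nodes>\d+)$"
)


def parse_run_directory(name):
    """Split a raw-data folder name into its five fields."""
    match = _RUN_DIR.match(name)
    if match is None:
        raise ValueError("not a run directory name: " + name)
    return {
        "molecule": match.group("molecule"),
        "label": match.group("label"),
        "processes": int(match.group("processes")),
        "cpus": int(match.group("cpus")),
        "nodes": int(match.group("nodes")),
    }


if __name__ == "__main__":
    print(parse_run_directory("villin-Gromacs-Optimal-3.2.0-37on20-10"))
```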

You are looking at the finish_time field to find the rows that you can clean: rows that have a null finish_time or a finish_time of 1900-01-01 00:00:00. The latter indicates that the scheduler found an error and reported it; the former means that the run is still in progress or never finished. As you look through the database you need to know which runs are currently running, because they will have nothing in their finish_time field, and if you delete such a row the scheduler will fail. If the run never finished then it can be removed from the database.
When you delete a row from the database, also delete the associated directory in /cluster/project/b-and-t-g/results/CLUSTER/raw-data/.
Once you have weeded out all of the runs with a finish_time of null or 1900-01-01 00:00:00 you are finished.
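
The selection rule above could be automated along these lines. This is a sketch, not the procedure I actually use: it is written in Python against the standard sqlite3 module purely so that it is self-contained (the `?` placeholders and connection object would differ for the real b-and-t-g database), the function name and running_dirs parameter are hypothetical, and it only lists candidates rather than deleting anything.

```python
import sqlite3

ERROR_FINISH_TIME = "1900-01-01 00:00:00"


def cleanable_runs(conn, cluster, molecule, running_dirs=()):
    """Return directory names of rows that can be cleaned.

    running_dirs is a hypothetical parameter: the directory names of runs
    known to be in progress, which must be left alone.
    """
    rows = conn.execute(
        "SELECT molecule, label, processes, cpus, nodes, finish_time "
        "FROM result WHERE cluster_name = ? AND molecule = ? "
        "ORDER BY molecule, nodes, cpus, processes",
        (cluster, molecule),
    ).fetchall()
    candidates = []
    for mol, label, processes, cpus, nodes, finish_time in rows:
        directory = f"{mol}-{label}-{processes}on{cpus}-{nodes}"
        if finish_time is not None and finish_time != ERROR_FINISH_TIME:
            continue  # finished normally: keep
        if finish_time is None and directory in running_dirs:
            continue  # still in progress: deleting it would break the scheduler
        candidates.append(directory)
    return candidates
```

Each returned name corresponds to both a row to delete and a directory to remove under the cluster's raw-data folder, so the two stay in step.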

Posted by hursejo at 11:19 PM | Comments (0)

Imaging Cairo from OSX

So here are some of my notes from imaging the Cairo nodes that were running OSX.


  1. Install a Minimal Yellow Dog Linux from a boot CD
  2. Copy a tar'ed version of SystemImager Client and the force-update.sh script in /root [located on all nodes] to the node you want to image.
  3. cd to systemimager-client directory
  4. ./installclient
  5. Don't run the prepareclient when asked
  6. cd ..
  7. ./force-update.sh
  8. Allow it to reboot
  9. To get ypbind to work you must set the NISDOMAIN=cairo.cluster.earlham.edu in /etc/sysconfig/network
  10. Then you must restart the network, and ypbind.
  11. Do a reboot and everything should be running.

Posted by hursejo at 04:26 PM | Comments (0)

MPI Error Handler

Here is a link to the MPI Forum's section on the error handler:
Communicator Error Handler
We need to create a function that is using the type:


typedef void MPI_Comm_errhandler_fn(MPI_Comm *, int *, ...);

I have not played with typedef'ed functions with variable arguments, so I am looking for suggestions on how to actually implement a function of the above typedef for input into this function:

int MPI_Comm_create_errhandler(MPI_Comm_errhandler_fn *function, MPI_Errhandler *errhandler)

Posted by hursejo at 11:57 AM | Comments (3)

Chart of Runs

I have manually compiled a list of the molecule runs that [have | have not | will not be] completed for both bazaar and cairo.
Bazaar Cluster
Cairo Cluster
These are automatically updated from the database when you refresh the page. The key has changed a bit from previous iterations of this chart. I am working on a Time Approximation scheme to place on the page as well.
The pages list the runs in 4 categories:


  • Type A:
    run = N(x) + C(x) + P(x)
  • Type B:
    run = N(x) + C(2x) + P(2x)
  • Type C:
    run = N(x) + C(x) + P(2x)
    run = N(x) + C(x) + P(2x-1)
  • Type D:
    run = N(x) + C(2x) + P(4x)
    run = N(x) + C(2x) + P(4x-1)
    run = N(x) + C(2x) + P(4x-2)
    run = N(x) + C(2x) + P(4x-3)

Where:

  • N(x) is x number of Nodes
  • C(x) is x number of Cpus
  • P(x) is x number of Processes
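
The four categories can be expressed as a short classification function. This is a Python sketch with a hypothetical name, not the code behind the chart pages. For very small x the definitions overlap (for example, with one node and one CPU, P(2x-1) equals P(x)), so the sketch checks Type A before C and Type B before D; that ordering is my assumption.

```python
def classify_run(nodes, cpus, processes):
    """Return "A", "B", "C", or "D" for a run, or None if it fits no type.

    x is taken to be the number of nodes, as in the definitions above.
    """
    x = nodes
    if cpus == x:
        if processes == x:
            return "A"
        if processes in (2 * x, 2 * x - 1):
            return "C"
    elif cpus == 2 * x:
        if processes == 2 * x:
            return "B"
        if 4 * x - 3 <= processes <= 4 * x:
            return "D"
    return None


if __name__ == "__main__":
    # 37 processes on 20 CPUs across 10 nodes is P(4x-3) with x = 10.
    print(classify_run(10, 20, 37))
```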

Posted by hursejo at 09:16 AM | Comments (1)

May 03, 2004

Meeting Notes - May 3, 2004

May 3, 2004 CCG Meeting - cp, jh, jm

Charlie still working on reading lists for 1/x, sqrt(x), and F@C.

Graphing tool. JoshM is making some progress, but the same script fails when run under /cluster/cgi-bin and works fine in ~mccoyjo/cgi-bin. JoshH will take a look with him. If there is no solution by Tuesday, send email to Charlie so he can look at it too. Presentation on Thursday to Vijay's group.

Specifications for the new graphing tool. X axis parallel structure, Y axis ps real/day, one color per molecule, data point at each parallel architecture. One graph per parallel structure: one process per cpu, two processes per cpu, one process per node, etc. JoshH will flesh this out.

Molecule runs are moving along. Bazaar still chugging, cairo idle now. JoshH will bring the chart up-to-date and split it into bazaar and cairo. JoshH will talk to Seth about recovering the cairo nodes.

mdrun/MPI, JoshM is making some progress. He will write-up a description and post it to MT.

This is a good time for Charlie to get questions answered by Vijay, et al. Send them along post haste.

Villin/Urea molecule failure. Some progress, no real news yet. Charlie.

Implement the "catch the child failure, recover, and re-start with one less process" logic in v0.5 of F-at-C. Making progress; figured out how to keep the group from dying (MPI flag). Typedef'd functions and error handler: Charlie will look at this. JoshH.

Grant opportunity, Sun for hardware and money to port GROMACS to SPARC. Charlie.

Posted by charliep at 11:10 PM | Comments (1)