General Project

Description:

June 23, 2005

Moving to wiki

This MT blog is no longer active. We are now using a wiki for to-do lists and documentation. This blog still has good information that should be harvested and utilized in the future.

Posted by mccoyjo at 10:35 AM | Comments (0)

April 02, 2005

Running Molden and Molekel

Running Molden and Molekel on bazaar using X and X forwarding
  1. Add molden and molekel to your $PATH environment variable:
    • Add the paths in your ~/.bashrc file:
      export PATH=$PATH:/cluster/bazaar/software/molekel/bin/:/cluster/bazaar/software/molden/
  2. You need an X server running on your local machine.
  3. Because the rendering applications run on bazaar, X forwarding is needed to send the application's graphical output to your local machine, which runs the X server. X forwarding is set up automatically by the -X option to ssh when connecting to a bazaar node (example: ssh -X b0.cluster.earlham.edu).
  4. Run molden or molekel in your ssh terminal. The graphical output of the application should be mapped to the X server on your local machine.
An alternative to using -X in ssh: Using $DISPLAY and xhost.
  1. On your local machine, allow an X connection from a bazaar node:
    xhost +b0.cluster.earlham.edu
    (or xhost + to allow connections from any host)
  2. On the bazaar node, set your $DISPLAY environment variable to the machine name (or IP address) and display number of your local X server:
    export DISPLAY=159.28.1.1:0.0
    note: The :0.0 is the display and screen number (display 0, screen 0). This value is typically :0.0 or :1.0.
  3. Run the application.
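Before launching either viewer, it can help to confirm that the session is ready. The sketch below is a hypothetical helper (check_x_env is not part of molden or molekel); it only checks whether $DISPLAY is set and whether both programs are on $PATH.

```shell
# check_x_env: hypothetical helper, not shipped with molden or molekel.
# Reports whether DISPLAY is set and whether the two viewers are on PATH.
# Returns 0 when everything looks ready, 1 otherwise.
check_x_env() {
    rc=0
    if [ -z "${DISPLAY:-}" ]; then
        echo "DISPLAY is not set: reconnect with 'ssh -X' or export DISPLAY"
        rc=1
    else
        echo "DISPLAY is $DISPLAY"
    fi
    for prog in molden molekel; do
        if command -v "$prog" >/dev/null 2>&1; then
            echo "$prog found"
        else
            echo "$prog not found on PATH"
            rc=1
        fi
    done
    return $rc
}
```

Run check_x_env in the ssh terminal on the bazaar node; a non-zero return means at least one of the steps above still needs attention.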
Posted by mccoyjo at 01:20 AM | Comments (0)

February 28, 2005

Gaussian Notes

Our installation of gaussian resides in /cluster/bazaar/software/g03.
A typical gaussian command line: nohup g03l md.com &
This runs gaussian with linda (g03l; g03 is the uniprocessor binary) using the input file md.com. Running g03l with nohup allows gaussian to continue running even if you log off the machine. The stdout and stderr will go to nohup.out. The g03l output will be placed in md.log. If you execute the top command and watch gaussian run, you will see it call specific binaries to perform the molecular dynamics. These binaries are named lxxx.exe where xxx is a number. A list of the calculations each binary performs can be found here.

Gaussian relies on environment variables for most configuration options. Gaussian is packaged with environment-setting scripts that should be sourced before running g03l (g03/bsd/g03.profile; it is handy to put this in your .profile). Environment variables are set with the export command in bash (export VARIABLE_NAME="value") and setenv in csh (setenv VARIABLE_NAME 'value'). Here are several environment variables that are handy:

  • GAUSS_SCRDIR="/tmp"
    • directory used for scratch files. Using /tmp is a good thing.
  • g03root="dir"
    • The base gaussian directory. Our base directory is /cluster/bazaar/software.
  • LD_LIBRARY_PATH="dir"
    • The path to linda shared libs (just points to $g03root in our setup). This is set in the g03.profile script provided by gaussian.
  • GAUSS_LFLAGS='-option1 "value1" -option2 "value2"'
    • '-nodelist "b1 b2 b3 b4"' - Set the node list.
    • '-opt "Tsnet.Node.lindarsharg: ssh"' - Set linda to use ssh for authentication (the default is rsh).
    • '-mp 2' - The number of workers per node. [Note: this applies to all nodes in the nodelist. Since this method does not allow much flexibility, we may want to look for a better way to configure this.]
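Put together, the variables above might look like this in a bash startup file. This is a sketch that assumes the paths and node names given in this post; the three Linda options are simply joined into one GAUSS_LFLAGS string, following the '-option1 "value1" -option2 "value2"' pattern shown above.

```shell
# Values taken from this post; adjust them for your own setup.
export g03root="/cluster/bazaar/software"
export GAUSS_SCRDIR="/tmp"

# Source Gaussian's own profile script when it is present.
# It sets LD_LIBRARY_PATH and related variables.
if [ -r "$g03root/g03/bsd/g03.profile" ]; then
    . "$g03root/g03/bsd/g03.profile"
fi

# Node list, ssh instead of rsh, and two workers per node,
# combined into a single option string for linda.
GAUSS_LFLAGS='-nodelist "b1 b2 b3 b4" -opt "Tsnet.Node.lindarsharg: ssh" -mp 2'
export GAUSS_LFLAGS
```

Keeping the whole option string in single quotes preserves the inner double quotes that group the node list and the -opt value.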

A suite of tests is included with gaussian (g03/tests/com). Running a set of these tests is a good way to test your gaussian environment. Keep in mind that the log file generated by gaussian is going to be placed in the same directory as the run configuration file (e.g. if you run "g03l $g03root/tests/com/test354.com" from your home directory, the test354.log file will be in $g03root/tests/com/).
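The log-file location described above can be worked out ahead of time. A minimal sketch, assuming the input file name ends in .com; g03_log_path is a hypothetical helper name, not a Gaussian tool.

```shell
# g03_log_path: hypothetical helper. Prints where the .log file is expected
# to land for a given input file: same directory, .com replaced by .log.
g03_log_path() {
    input=$1
    dir=$(dirname "$input")
    base=$(basename "$input" .com)
    echo "$dir/$base.log"
}

# Example with a test input; the path here is only illustrative.
g03_log_path /cluster/bazaar/software/g03/tests/com/test354.com
```

If the resulting directory is not writable by your user, copying the input file to your own directory before running is one way around it.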

Posted by mccoyjo at 01:05 PM | Comments (0)

February 25, 2005

Meeting Minutes - February 25th, 2005

Plumbing
Load scheduling with Lori
Documentation on Gaussian installation and use
Lori's sample job
hopper upgrade
Upgrade firmware on switches
Warewulf rather than ganglia (wulfstat or wulflogger) http://www.phy.duke.edu/~rgb/Beowulf/wulfware.php

Folding@Clusters
Non-nfs testing
a2 list

Posted by charliep at 08:23 PM | Comments (0)

February 18, 2005

Meeting Minutes - February 18th, 2005

non-nfs testing
  • temp user (home dir /tmp)
  • no release cut for first round of tests (take binaries from ~charliep/cvs-hopper/{cairo,bazaar}-a2rc/folding-at-clusters/release/bin/ )
  • /cluster/generic/bin/runfatc.pl (will be in cvs)
  • driver-fatc scripts
    • path
    • number of runs
    • min nodes
    • max nodes
    • molecule name
    • don't record in DB: yes = no record, no = record
  • non-nfs nodes; 3 tests 1 node, 3 tests 2 nodes, 3 tests 3 nodes,... walk up non-nfs config to more nodes
  • test set labels : start with a2 then move to rc, rc1, rc2, ...
  • put a2/rc test rows in results table
  • continually run tests on bazaar and cairo. Kill tests when we get a new rc. start tests with new rc when available.
  • detect stalled runs by comparing time stamps to wall clock time
  • change to runfatc before we start tests: charlie will make a change (we're not harvesting the mass points; put attribute to store mass points in db table).
  • change dvc to use mass points in select as a select statement. could be a potential x-axis. we want to see how things scale by mass points instead of by nodes (ps nodes/day vs mass points).
  • put calc-speedup.pl in /generic/bin/
  • use c1 and b1 as mother nodes when testing
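The stalled-run check mentioned in the list above (compare time stamps to wall clock time) could be sketched as follows. is_stalled is a hypothetical helper, not part of runfatc.pl, and the 10 minute limit in the example is an assumed value.

```shell
# is_stalled: hypothetical helper, not part of runfatc.pl.
# Arguments: last update time and current time (both in seconds since the
# epoch), then the allowed gap in seconds.
# Succeeds (exit status 0) when the gap has been exceeded.
is_stalled() {
    last=$1
    now=$2
    limit=$3
    [ $((now - last)) -gt "$limit" ]
}

# Example: newest output is 20 minutes old, limit is 10 minutes.
if is_stalled 1000 2200 600; then
    echo "run looks stalled"
fi
```

In a driver script the first argument would come from the newest checkpoint or log time stamp and the second from the current clock (for example date +%s where that is available).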
folding@clusters
  • a2 and rc; check binaries to tell which version (rc, a1, etc.) we're using
Plumbing
  • make c15 smp again
  • /etc/rc.conf - use -h option when restarting portmap (look for portmap flags; bind to internal interface only)
Posted by charliep at 02:11 PM | Comments (0)

February 11, 2005

Meeting Minutes - February 11th, 2005

We're at SIAM this week finishing-up and presenting two posters, Folding@Clusters and Calculating 1/sqrt(x) for molecular dynamics
packages on commodity vector architectures. We'll post pictures,
etc. when we're done.

Posted by charliep at 05:31 PM | Comments (0)

February 04, 2005

Meeting Minutes - February 4th, 2005

  • General
    • Poster
    • Travel: 11a Thursday the plane leaves; leave Richmond 9ish. Return 5:30p on Tuesday.
    • Food and work Wednesday night at the ranch.
    • Stuff to take
      • projector
      • printer? (blow off if the resort has color printing)
      • gear bag (not whole thing; access point)
      • screen for projector
      • Poster tube (link in clustcomp archives)
      • Long jumper
  • Numerical Methods
    • converting GROMACS files to use with NAMD. Will document in cvs when done. see clustcomp message. John and Josh
    • Look at dihedrals (what is it and how does it manifest in our configuration files). do we have any? John
    • John's literature search is going well. More goodness to come.
    • Literature search - JohnS
      • inverse square root, molecular dynamics, vector MT Link
        Create a Literature Notes file in CVS (numerical-methods/doc/literature-search-notes.txt)
    • get BibTeX from john's IEEE articles. John & Charlie
    • Charlie has info on 1/x^2 via Peter. He'll share soon.
  • Folding@Clusters
    • testing: good db of runs (platform, molecule, and node combinations). next is non-NFS testing in /tmp/
    • always use .gro files for input so we don't have to deal with converting the pdb files
    • 3 flavors of failing. Let's start to think about how to catch these errors.
      • die due to LINCS error
      • fail a bunch of times and max out restarts
      • activity stops, we use cpu, no new checkpoints being received, stalls in simulation
      • stalls during capability discovery
    • bug in capability discovery (netpipe); look at joshh's kludge
    • to catch failures, mother forks a process outside of the mpi world for the netpipe bandwidth test, mother has timeout feature on nannies and when triggered resets world.
    • list of molecules that consistently fail. charlie
    • wrap mpi calls for value or kill to avoid race condition created by listen for message with certain flags.
    • DVC <-> results conversation. charlie & joshm
    • get ps/day information from mdrun output. get from log file on child 0
    • preserve log in a mother.conf or molecule.conf option
    • how many mass points in a system. can we tell how many mass points we have given our config files? wc -l on top file is a possible solution. John
    • package up a2 after siam for pande
    • 1-8 nodes on bazaar old image, 1-4 on new images testing done.
    • two types of testing a2 needs before release: Failure testing and Non-NFS testing.
    • F@C vs F@H - use b19 for x86 single cpu run
    • Failure and Recovery (with fault tolerance checking) - not done.
      • Look at folding-at-clusters/documentation/index.html
      • Killing lam on non n0 nodes.
    • SMP testing - Charlie will contact Henry Neeman and get the details for access to OSCER's SMP box to JoshH who will do the testing.
    • Review/Update protocol.txt - Revisit after SIAM. Make a phase diagram so we can figure-out how to test checkpointing, failure, and recovery completely.
    • Cleanup of files on non-NFS systems - Revisit after SIAM.
  • Plumbing
    • reimage all cluster nodes save 0th nodes.
    • cluster names in cexec
    • Gaussian working in the near future (next week)
    • hopper kernel parameters
    • move hdd out of bazaar golden client (b19) to another machine.
    • usb <> ps2 for cairo part of kvm
    • How to decide if /cluster/... vs locally on each node for a particular package? Let's discuss this in more detail after SIAM.
  • Other
    • Instrument NAMD and GROMACS to print out the x to find its range. Hard part is finding where to put the instrumentation in the code. Maybe GNUplot it.
  • Post SIAM
    • midwife - see jan 28th meeting minutes
    • Ganglia
    • JoshM will post a software list for us to review and update.
    • codeviz
    • cruise last 3-4 weeks of meetings for items lost in the pre-SIAM madness.
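For the mass-point question raised under Folding@Clusters above, wc -l on the top file counts every line, including comments and other sections. A slightly narrower sketch is to count only the entries in the [ atoms ] section of the GROMACS topology. count_atoms is a hypothetical helper; it assumes the atoms are listed directly in the file, and it does not follow #include lines or multiply by the counts in [ molecules ].

```shell
# count_atoms: hypothetical helper. Counts the non-comment, non-blank lines
# inside the [ atoms ] section(s) of the topology file given as $1.
count_atoms() {
    awk '
        # A section header: remember whether it is the atoms section.
        /^[ \t]*\[/ {
            in_atoms = ($0 ~ /^[ \t]*\[[ \t]*atoms[ \t]*\]/)
            next
        }
        # Inside atoms: skip comments (;), preprocessor lines (#), blanks.
        in_atoms && $0 !~ /^[ \t]*[;#]/ && NF > 0 { n++ }
        END { print n + 0 }
    ' "$1"
}
```

Usage would be count_atoms followed by the path to a .top file; the result is an approximation of mass points under the assumptions stated above.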
Posted by charliep at 09:50 AM | Comments (0)

January 28, 2005

Meeting Minutes - January 28, 2005

Folding@Clusters


  • Poster outline. Hopefully this weekend - Charlie

  • Molecular systems in /cluster/project/molecules are ready to use. nsteps
    = 500, checkpointing, Readme. Still to be done is to remove the other options
    that are causing lots of extra files to be created. Charlie.

  • Major Bugs. See source/TODO for these (review weekly).

  • A2 Release Testing - Running tests on 1 through 4 nodes for about 12 molecular
    systems on bazaar and cairo. So far about 500 or so runs.

    • NFS/Non-NFS - not done.
    • Different Mol types: - done.
    • Platforms: x86/Linux, PPC/Linux, PPC/OSX - done.
    • Number of nodes [2-16] - 1 through 4 nodes currently.
    • Failure and Recovery (with fault tolerance checking) - not done.

      • Look at folding-at-clusters/documentation/index.html
      • Killing lam on non n0 nodes.

    • Matrices

      1. NFS/Non-NFS Platform
      2. Mol. Types + Number of nodes + Platform
      3. Failure/Recovery + Platform (high number of nodes)



  • SMP testing - Charlie will contact Henry Neeman and get the details
    for access to OSCER's SMP box to JoshH who will do the testing.

  • Less focus on code for the next 2 weeks, work on poster.

  • Review/Update protocol.txt - Revisit after SIAM. Make a phase diagram
    so we can figure-out how to test checkpointing, failure, and recovery
    completely.

  • Cleanup of files on non-NFS systems - Revisit after SIAM.


Numerical Methods


  • Nota bene - This section is unchanged from last week, which was unchanged
    from the week before that, ...

  • Basic MidWife - JoshM

    JohnS will send pointer to JoshM about what is there so far, and what
    needs to happen. [use the folding-at-cluster/testing directory]

  • Extensions to MidWife concept (new program and schema?) that supports
    building FFTW, GROMACS, etc. with particular configure and compiler/linker
    options. Table until next meeting...

  • Literature search - JohnS

    • inverse square root, molecular dynamics, vector
      MT
      Link


      Create a Literature Notes file in CVS
      (numerical-methods/doc/literature-search-notes.txt)

  • Poster outline - Charlie (this weekend)

  • NAMD Report: looking for benchmark molecules, Keep working on how to get
    Charm working - JoshM

Plumbing


  • Cairo image - F@C build problem, F@C run unknown, NAMD no, ganglia no

  • Bazaar image - F@C builds, F@C runs, NAMD no, GAUSSIAN no, ganglia no

  • DVC - results integration - JoshM and Charlie

  • New image notes

    • How to decide if /cluster/... vs locally on each node for a particular
      package? Let's discuss this in more detail after SIAM.

    • Postgres client libraries, binaries, DBI, and DBD::Pg on all client nodes.

    • JoshM will post a software list for us to review and update.

  • NAMD/CHARM - can run binary under OSX but can't install source and build
    under PPC-Linux. JoshM will try building on bazaar.

  • CodeViz stuff is installed on Bazaar /cluster/, JoshM will post instructions.
    Incompatibility between compilers, //tag and graphing. After SIAM.

  • Charlie has had two odd CVS experiences recently, TODO and gromacs. He'll
    look and see what if anything is going wrong here.

  • Cleaned-up DVC file spamming - JoshM.

  • distcc doesn't work with //tag option that GROMACS uses. Don't worry about
    it, distcc is really only useful for BCCD. JoshM.

Other


  • New copy of B-and-T GROMACS poster. After SIAM.

  • Mary Lou changed the plane ticket to have Josh's correct last name.

Posted by charliep at 09:49 AM | Comments (0)

January 21, 2005

Meeting Minutes - January 21, 2005

Folding@Clusters


  • Poster outline. Hopefully this weekend - Charlie

  • Review/Update protocol.txt - Revisit next week. Make a phase diagram
    so we can figure-out how to test checkpointing, failure, and recovery
    completely.

  • Major Bugs. See source/TODO for these (review weekly).

  • A2 Release Testing - Charlie has a simple framework and is running
    tests on 1-4 nodes for about 8 molecular systems on bazaar and cairo. So
    far about 200 or so runs. The only problems that have come-up are either
    known or fixed, at least so far.

    • NFS/Non-NFS
    • Different Mol types: All...
    • Platforms: x86/Linux, PPC/Linux, PPC/OSX
    • Number of nodes [2-16]
    • Failure and Recovery - with fault tolerance checking.
    • Matrices

      1. NFS/Non-NFS Platform
      2. Mol. Types + Number of nodes + Platform
      3. Failure/Recovery + Platform (high number of nodes)



  • SMP testing - Charlie will contact Henry Neeman and get the details
    for access to OSCER's SMP box to JoshH who will do the testing.

  • Checkpointing - nstxout (in number of steps) is the mdp file option
    that controls checkpointing. Charlie will update documentation/README.

  • Cleanup of files on non-NFS systems - Revisit next week.

Numerical Methods


  • Nota bene - This section is unchanged from last week.

  • Basic MidWife - JoshM

    JohnS will send pointer to JoshM about what is there so far, and what
    needs to happen. [use the folding-at-cluster/testing directory]

  • Extensions to MidWife concept (new program and schema?) that supports
    building FFTW, GROMACS, etc. with particular configure and compiler/linker
    options. Table until next meeting...

  • Literature search - JohnS

    • inverse square root, molecular dynamics, vector
      MT
      Link


      Create a Literature Notes file in CVS
      (numerical-methods/doc/literature-search-notes.txt)

  • Poster outline - Charlie (this weekend)

  • NAMD Report: look for benchmark molecules, Keep working on how to get
    Charm working - JoshM

Plumbing


  • Cairo image - can't compile GROMACS, haven't tried running F@C. Re-imaging
    is working and is easy. distcc works. ntp working. updated c3 doc with
    version 4 syntax.

  • Bazaar image - isn't imaging yet. JoshM will contact Skylar.

  • Keep Non-NFS nodes [Cairo c13-15, Bazaar will be b13-b15]

    New images on Cairo c9-12, and Bazaar b9-12

    First test of new Images:

    • Bazaar - F@C, GAUSSIAN, NAMD, ganglia
    • Cairo - F@C, NAMD, ganglia


  • CHARM - can run binary under OSX but can't install source and build
    under PPC-Linux. JoshM will try building on bazaar.

  • CodeViz stuff is installed on Bazaar /cluster/, Josh M will post instructions.

  • Humidity solution on its way from John Walker. Not sure if it will
    be mounted in the ceiling with the HVAC or in the cluster closet.

  • GAUSSIAN on bazaar add to /cluster/

Other


  • Poster design meeting - 4p Wednesday

  • The hotel reservations are correct for SIAM CSE 05 now, checkin on Thursday
    and checkout on Tuesday (5 nights). According to Mary Lou the misspelling on
    the airline ticket can only be corrected when we get to the airport. They
    have put a note in the file about it, and we shouldn't have any problems, but
    I think we'll make sure to leave with plenty of time for the airport that day.
    JoshH, if you have a passport bring it.

  • IU tour - Yes, JoshH and Charlie will work this out later in the spring.

Posted by charliep at 02:32 PM | Comments (0)

January 14, 2005

Meeting Minutes - January 14, 2005

Folding@Clusters
  • Poster outline. Will progress this weekend - Charlie
  • Go through TODO file -- See next entry on F@C Design
  • Progress meter added via MPI between mother/child - joshh
  • Review/Update protocol.txt -- Revisit next week
  • Major Bugs:
    • Handle strings properly.
    • Command line option dropping - COSM kludge "-c" thing.
    • Fault Tolerance checking
    • Cairo villin bug. -- joshh can demo
    • Check lamnodes for correct string (n0,n1,n2 vs n3.b32,n10) - strip whitespace, strip n, run atoi.
    • mdrun should have as an argument the LamHosts list (n1,n2,n5,n6) and command line generation.
    • CPU count in COSM for Linux
  • A2 Release To Do:
    • Testing
      • NFS/Non-NFS
      • Different Mol types: All...
      • Platforms: x86/Linux, PPC/Linux, PPC/OSX
      • Number of nodes [2-16]
      • Failure and Recovery
      • Matrices
        1. NFS/Non-NFS Platform
        2. Mol. Types + Number of nodes + Platform
        3. Failure/Recovery + Platform (high number of nodes)
    • FATC.conf file for molecule names. Basic stuff, and Other files -- see F@C Design for details
    • See Bugs above...
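The lamnodes item under Major Bugs above (strip whitespace, strip n, run atoi) can be sketched in shell as follows. lam_node_index is a hypothetical name, and the token shapes are the ones quoted in the note (n0, n10, n3.b32).

```shell
# lam_node_index: hypothetical helper. Turns one node token such as
# "n3.b32" or " n10" into its integer index.
lam_node_index() {
    # Strip spaces and tabs.
    token=$(printf '%s' "$1" | tr -d ' \t')
    # Strip the leading "n".
    token=${token#n}
    # Keep only the leading digits, which is what atoi would read.
    digits=${token%%[!0-9]*}
    # Like atoi, fall back to 0 when there are no leading digits.
    echo "${digits:-0}"
}

lam_node_index "n3.b32"
lam_node_index " n10"
```

The two example calls print 3 and 10.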
Numerical Methods
  • Basic MidWife - JoshM
    JohnS will send pointer to JoshM about what is there so far, and what needs to happen. [use the folding-at-cluster/testing directory]
  • Extensions to MidWife concept (new program and schema?) that supports building FFTW, GROMACS, etc. with particular configure and compiler/linker options. Table until next meeting...
  • Literature search - JohnS
    • inverse square root, molecular dynamics, vector MT Link
      Create a Literature Notes file in CVS (numerical-methods/doc/literature-search-notes.txt)
  • Poster outline - Charlie (this weekend)
  • NAMD Report: look for benchmark molecules, Keep working on how to get Charm working - JoshM
Plumbing
  • New images for bazaar and Cairo, leave some subset of each functional until the new images are capable of running F@C.
    Cairo image setup is working. Hassan is working on Bazaar.
    Keep Non-NFS nodes [Cairo c13-15, Bazaar will be b13-b15]
    Image Cairo c9-12, and Bazaar b9-12
    First test of new Images:
    • Bazaar - F@C, GAUSSIAN, NAMD, ganglia
    • Cairo - F@C, NAMD, ganglia
  • c3 tools on all nodes working. send pointer to where these live for path - Josh M
  • Make b13-b15 Non-NFS - JoshM
  • GAUSSIAN on bazaar add to /cluster/
  • Send John a access point/NAT router. - Charlie
  • CodeViz stuff is installed on Bazaar /cluster/, Josh M will post instructions
Other
  • Review SIAM travel details:
    • JohnS is coming [up|down] Feb. 3rd
    • Charlie will fix date problems with Hotel reservation.
  • IU tour - yes JoshH and Charlie will work this out.
Posted by mccoyjo at 12:04 PM | Comments (0)

January 07, 2005

Meeting Minutes - January 7, 2005

  • General
    • Meeting next Friday 10a-1p.
    • Joshm will follow up with John to make sure he has a working environment.
    • New camera setup.
    • SIAM accommodations are good, save for fixing a typo.
    • Life will be good for all involved if we finish the poster before we get to Florida. Money for food; not poster.
  • Plumbing
    • Systemimager: Joshm will email Hassan asking for some help in learning systemimager.
    • Work on general and 0th node docs.
    • Get 4 nodes with new images on each cluster for testing FATC. Joshm
    • Update node list. Joshm
    • Checkout CVS email notification on commit. Look at for node list and FATC. Joshh
  • FATC
    • Time on SMP box via Henry.
    • non-NFS testing on c13-15. Joshh
    • Poster roundup. Charliep
    • No threads in the near future.
    • Making MPI easy to install is important in the near term and until we can remove MPI.
    • Signals for checkpointing. Hook in mdrun. Check out old cvs for info. Joshh
    • Add heartbeat protocol between mother and nannies to see if the nannies are still functional. 5 minute timer in mother. When time has expired, we check nannies.
    • GROMACS 3.3 beta fixed errno problem.
  • Numerical Methods
    • Charmm installation.
    • CodeViz on mdrun.
    • NAMD doc keep moving. Joshm
    • John literature search. John and Charliep
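The heartbeat idea in the FATC list above (a 5 minute timer in the mother, then check the nannies) can be illustrated outside MPI with a small filter. This is only a sketch of the timing logic, not F@C code: overdue_nannies is a hypothetical name, and the "name last_seen" input format is an assumption made for the example.

```shell
# overdue_nannies: hypothetical helper, not F@C code.
# Reads "name last_seen_epoch" lines on stdin and prints the names that
# have been silent for longer than the timeout.
# $1 = current time in seconds since the epoch
# $2 = timeout in seconds (defaults to 300, the 5 minute timer)
overdue_nannies() {
    now=$1
    timeout=${2:-300}
    awk -v now="$now" -v limit="$timeout" '(now - $2) > limit { print $1 }'
}

# Example: n1 was last heard 400 s ago, n2 110 s ago.
printf 'n1 1000\nn2 1290\n' | overdue_nannies 1400
```

In this example only n1 is reported, since 400 s exceeds the 300 s timer.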
Posted by mccoyjo at 11:33 AM | Comments (0)

December 17, 2004

Meeting Minutes - December 17, 2004

F@C

  • JoshM will setup an environment for Charlie to do testing on integration
    bug. When this is fixed and checked-in a number of todo items come into
    play.

    Plumbing

  • In addition to Gaussian/Linda Lori also needs the following:

  • Super (and you should hear with that word, "wow, things are easier
    here")--please install Molden, version 4.2 (instructions at
    http://cmbi1.cmbi.kun.nl/~schaft/molden/howtoget.html--they are for
    version 4.0 (just change the 4.x in the ftp site) but I think 4.2 is out;
    if not, 4.0 or 4.1 is fine) and Moldenogl which goes along with it (at
    http://cmbi1.cmbi.kun.nl/~schaft/molden/opengl.html). The link to
    download it for Linux is about halfway down the page. You might need MESA
    too if it's not already on the system--there's a link for that at the top
    of the opengl page.

  • Molekel (current version 4.3) at http://www.cscs.ch/molekel/ (the
    website has annoying frames, but if you click on distributions you see the
    downloadable executables and libraries. There are several platforms--I'm
    guessing you want the Linux one. Brief instructions are under FAQ; longer
    ones under Installation.

  • JoshM reports that the images for bazaar and Cairo are making progress.

  • John needs to do a serious disk cleaning of scheajo and test*. We need
    disk space.

  • Everyone else needs to go cleaning as well (see previous meeting notes).

    Numerical Methods

  • NAMD is making progress, see JoshM's MT entry. He will put together
    documentation on building and running (in software/doc with an MT pointer).

    Other

  • Recruiting, Eric may be interested, will get back to us in January. We
    need two or possibly three depending on their backgrounds.

    Posted by charliep at 01:33 PM | Comments (0)
  • update

    • Plumbing
      • Gaussian requires TCP Linda. Although this is payware, it is included in the price of Gaussian. I could see no other dependencies in their docs.
      • I spent a large chunk of yesterday making coasters out of cdrs. Could someone else try burning the yellowdog images? I've been using the burner on admin and on my machine with no luck even with md5 verified downloads.
      • NAMD source installation is progressing well on cairo. I'm currently installing the prereqs (charm++/converse, VMD molfile plugins, TCL). The user documentation was informative for basic running info. Still looking into converting gmxbench to a format NAMD can understand.
      • Cleaned admin's hdd up a bit. /cluster is 93% full and could use some love.
    • General
      • Mary Lou caught me and I helped her for a bit with SIAM information. Signs point to a stay in a Disney resort.
    Posted by mccoyjo at 12:22 PM | Comments (0)

    December 14, 2004

    Meeting Minutes - December 14, 2004

    Next meeting Friday at 1p, JoshM and Charlie in Richmond, John in Chicago, JoshH?

    NAMD - Found source, haven't found molecular systems yet. JoshM will try building on bazaar and Cairo. We need to find either molecular systems for NAMD that are comparable to the ones from GMXBENCH or figure-out how to adapt GMXBENCH systems for use with NAMD.

    Imaging - New SystemImager installed, pulled one image from existing server

    Plumbing - Disk cleaning on admin or possibly a new bigger disk. Maybe the last of the HHMI money?

    Plumbing - /cluster needs to be cleaned as well. All of us should take a tour and clean old runs, etc. where possible.

    Lori's software - JoshM will track-down Gaussian/Linda (from Lori) for the new bazaar image.

    Toby's software - JoshM will track-down distcc and ccache for the new cairo image.

    Midwife - No report.

    F@C - Charlie needs to look at the error JoshM forwarded to him. Charlie will contact Vijay and see how it's going.

    SIAM - Charlie will make conference and hotel reservations next week for all of us.

    Recruiting - Application due-date is tomorrow.

    Posted by charliep at 11:57 AM | Comments (0)

    December 09, 2004

    update

    • Folding@Clusters
      • Have been answering John's questions about the midwife.
      • Problems with making mdrun a stand-alone binary. Charlie is helping out.
    • Plumbing
      • Successfully downloaded SUSE 9.1 and Yellowdog 4.0. We need more CD-Rs to use the iso's.
      • Installed the latest systemimager on admin and successfully pulled an image to it. Need to re-read and understand the problem with systemimager and dhcp 3.
    • Numerical Methods
      • Installed NAMD on bazaar in /cluster/bazaar/software/NAMD_2.5_Linux-i686/ (need to find the source; I could only find a precompiled linux-i686 version). There was no linux-ppc version, but there was one for OSX.
    Posted by mccoyjo at 07:15 PM | Comments (0)

    December 03, 2004

    Meeting Minutes - December 3, 2004

    Scheduling


      Busy two months coming-up. JoshH loose between Dec 18 - late January.
      Charlie gone Dec 8-12 and 14-16, working on this stuff otherwise until
      classes start. John working remotely starting after exams.

      JoshM busy this week and part of next on graduate school stuff. Couple
      more coming up in Jan/Feb.

      John is behind on everything (Charlie is too but doesn't admit it).

      Towards late January/early February we all (four) need to gather for a
      couple of days.

    SIAM CSE05


      Hotel, airplane, conference registration. (charlie)

      Two posters and one educational session to design and assemble.

    Numerical Methods


      NAMD - nothing yet (joshm). Once it's installed John and Charlie will
      figure-out the recipe for running our molecules with it (or theirs).

      Charlie has a paper that he will copy and distribute.

    Folding@Clusters


      New testing recipe, John will start writing mother.conf and using the ability
      to specify nodes on the mpirun command line. lam still needs to start
      and stop between each run.

      a1 release fails miserably with poly-ch2. Try straight-up GROMACS before
      we spend a lot of time on this. (john)

      testing results for release a1 can be found here:
      http://cluster.earlham.edu/detail/home/test2/results-cairo.html
      http://cluster.earlham.edu/detail/home/test2/results-bazaar.html

      Code changes - divorcing mdrun in test now. New developers documentation
      and changes to makefile.am, tabs and 80
      columns still need to be completed. (joshm)

      The To-Do list needs freshening. (charlie)

      When to do our major re-write? Before or after beta, may depend on the
      nature of the feedback we get.

    Plumbing


      Bazaar upgrade - SuSe 9.1 (including ccache and distcc). Maybe some
      help from Dawit.

      Cairo upgrade - YellowDog 4.0 (including ccache and distcc).

      SystemImager upgrade.

      As part of all this upgrading we should be able to finally have a complete,
      accurate list of all the 0th node issues.

      Node list - Toby is using c4, JoshM using c2 for the golden client, John using
      c5-c15 for testing. John using b5-b15 for testing.

    General


      Lunch today for the applied and research groups, need 2? Update - looks like
      we will have funding for 2 students next summer. Should we try to do
      recruiting now or wait until classes start in January?

      Gaussian - Lori Watson is ordering the right release, we'll run it on Bazaar.

    Posted by charliep at 10:17 AM | Comments (0)

    November 30, 2004

    Brief Meeting Minutes - November 30, 2004

    JoshM and Charlie met briefly, the current list looks like:

    JoshM


    • Install and try-out NAMD on bazaar and cairo.
    • Build a new bazaar image (with Dawit) using SuSe 9.1
    • Build a new cairo image using Yellow Dog 4.0 (including ccache and distcc).
    • Continue working on F@C code items.
    • Work with John on testing and the MidWife.

    John


    • Complete testing matrix using a1 for all molecules (GMXBench plus others), bazaar and cairo, 2-12 processors.
    • The MidWife.

    Charlie


    • Update prose for two SIAM CSE poster submissions. (done)
    • SIAM CSE conference registration for JM, JS, CP; JH are you going?
    • Hotel and air reservations for SIAM CSE

    • Organize F@C a2 release.

    Posted by charliep at 02:26 PM | Comments (0)

    November 14, 2004

    Work Log

    Folding@Clusters
    • The MPI_Recv to MPI_Irecv problem highlighted in the mother is a non-problem. This is expected behavior between the mother and child. This can be taken off the ToDo list.
    • I added the ability to correctly detect the number of CPUs on a Linux machine in COSM. I tested on Bazaar and Cairo.
    • I cleaned up the "step" messages generated by mdrun. These should be collected by the nanny and transmitted to the mother for a status display.
      Finished Step 10 of 1000, remaining runtime: 90 s
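A status display could turn a step message like the one above into a percentage. The sketch below assumes the exact wording shown in this post; step_percent is a hypothetical helper, not part of mdrun or the nanny.

```shell
# step_percent: hypothetical helper. Extracts "N of M" from a step message
# and prints the integer percentage complete. Returns 1 if the line does
# not match the expected wording.
step_percent() {
    pair=$(printf '%s\n' "$1" |
        sed -n 's/^ *Finished Step \([0-9][0-9]*\) of \([0-9][0-9]*\).*/\1 \2/p')
    [ -n "$pair" ] || return 1
    done_steps=${pair% *}
    total_steps=${pair#* }
    [ "$total_steps" -gt 0 ] || return 1
    echo $(( done_steps * 100 / total_steps ))
}

step_percent "Finished Step 10 of 1000, remaining runtime: 90 s"
```

For the sample message this prints 1, since 10 of 1000 steps is 1 percent.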
    General
    • Created a new feel for the website. Let me know what you think, and if I should push it.
      See the link in the email, let me know if you need it again [I don't publish that private work space to keep it from being sniffed by spiders...]
    Posted by hursejo at 09:48 PM | Comments (0)

    November 12, 2004

    Meeting Notes

    Folding@Clusters
    • Short term code fixes (joshm)
      • Divorcing child and mdrun.
      • diff the configure.ac, makefile.am, and (?) to see changes and to preserve our FATC flag.
      • See if altivec issue is patched in the latest version of gromacs. If not, fix it.
      • gromacs version - is the highest production 3.2.1?
      • Don't do another code root; just roll back changes in CVS.
      • Changes to make in gromacs: remove new main in mdrun, linking instructions, base level Makefile.am will have to change (has dependencies on the child binary).
      • We need new build instructions.
      • Tabs and 80 cols.
    • Run multi-node tests on all platforms. (john)
    • Update node usage doc by giving all non-NFS nodes to John.
    Posters
    • Mount on board like the physics posters?
    • Print poster content at appropriate size.
    • Option: Laminate the poster?
    • Two new generic frameworks need to be created: FATC, Numerical Methods for SIAM
    • MERCK - Tuesday the 16th at 6-9p(?)
    • Dr. Dobb's Nov 2005 issue - write article about FATC. Due in mid July. We need to start writing very soon.
    Numerical Methods
    • Test a1 binary on c12-c15. Run as a test user.
    • Results table.
    Plumbing
    • Making the cluster space usable:
      • Ask John Howell about shelves.
      • Mount screen and projector.
      • Vacuum
    • Data Visualization Console: temp file flooding and errors when running.
    • Midwife write-up (Charlie)
    General
    • HHMI annual accounting and report (Charlie)
    • Talk to Mary Lou about travel logistics for SIAM. Start a calendar, get details down, talk to Joshh.
    Posted by mccoyjo at 11:13 AM | Comments (0)

    November 04, 2004

    Meeting Minutes - November 5, 2004

    Folding@Clusters
    Design handout for SC2004

    Discuss legends for clusters on the diagram

    Numerical Methods

    Plumbing

    Papers and Presentations

    Items from JoshM's email of Wed

    Posted by charliep at 05:25 AM | Comments (0)

    October 29, 2004

    Meeting Minutes - October 29, 2004

    F@C


      Diagrams - Make three on one, with one legend. JoshM has notes with the details.

      Our poster was accepted at SIAM CS05.

      Testing


        John will start working with the A2 release (in binary form still) this weekend. Consider building from source and collecting more detailed data after SC04.

        Midwife design in Pittsburgh.

      Code inspection - print 2 up 2 sided 4 copies on Friday before leaving for SC04. Do the inspection in Pittsburgh over food and wine.

      A2 release - most items complete, we may get one or two more done, see source/TODO for the details.

    Numerical Methods


      Testing with source builds of F@C (after SC04). See paper notes for the details. John.

      C code to PeterB. Charlie.

      Publish the material that PeterB collected. Charlie.

      Re-organizational meeting soon after SC04.

    Papers and Presentations


      F@C, 1/sqrt(x), and the education program were all accepted at SIAM CS05

      Merck - November 16th in Noyes Hall. Presenting B-and-T-GROMACS, JoshH is going to re-print the poster. JoshM will submit the abstract to Nathan.

    Plumbing


      Switch for cairo or a UPS? We need to make this decision soon. Start a large load on cairo with F@C (JoshM) and see if we can start making the switches crash again. If so, try upgrading the firmware to see if that stops it.

      Ordered a full-duplex speaker phone for the ReCompute/Cluster lab space.

      Setup fatc-dev and fatc-user @ cs.earlham.edu. Charlie will speak to Skylar.

      Is Bugzilla ready to go? Right list of platforms? JoshM

      Update the node usage list. JoshM

    Posted by charliep at 10:45 AM

    October 28, 2004

    Update

    Folding@Clusters

    grompp error codes
    Gave all fatal_error() calls in gromacs a unique error number > 0. All calls were given error numbers because grompp includes most of the libraries in gromacs. This buys some flexibility with other error detection. A list of the error numbers, their text message, and the file in which they reside can be found in folding-at-clusters/documentation/gromacs-errno.txt
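
    One way to use such a list is to translate a non-zero exit code back into its message. The sketch below is Python for illustration only; the tab-separated number/message/file layout is an assumption (the real layout of gromacs-errno.txt is not described here), and load_errno_table and describe_exit are hypothetical helpers:

```python
def load_errno_table(lines):
    """Build {error_number: (message, source_file)} from tab-separated lines.

    Raises ValueError if a number is not > 0 or appears twice, since the
    scheme only works when every fatal_error() call has its own number.
    """
    table = {}
    for raw in lines:
        line = raw.rstrip("\n")
        if not line.strip() or line.lstrip().startswith("#"):
            continue  # skip blank lines and comment lines
        number_text, message, source_file = line.split("\t")
        number = int(number_text)
        if number <= 0:
            raise ValueError("error numbers must be > 0: %r" % line)
        if number in table:
            raise ValueError("duplicate error number %d" % number)
        table[number] = (message, source_file)
    return table


def describe_exit(table, exit_code):
    """Turn a process exit code into a log-friendly string."""
    if exit_code == 0:
        return "success"
    message, source_file = table.get(exit_code, ("unknown error", "?"))
    return "error %d: %s (%s)" % (exit_code, message, source_file)
```

    Note that a POSIX exit status only carries 8 bits, so a lookup like this works directly only for error numbers up to 255.
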

    mdrun exit
    first attempt:
    use #define F_at_C's to block out calls to fatal_error. Instead, use fatc_exit() to get the error code, log the message, shut down MPI, and exit cleanly. The fatc_exit() should be in child.{h,c}. This works because child and mdrun are the same binary.

    second attempt:
    -Static linking problems. CHILD symbol wasn't found. Now we will set the CHILD global to TRUE in mdrun.c and to FALSE in grompp. Make exit.o, which holds F_AT_C_exit(), to link with mdrun and grompp. Test by keeping the grompp executable and rebuilding child (and thus mdrun).
    -Created exit.{h,c,o} to house F_AT_C_exit(). Added target to Makefile.
    -exit.o needs to be linked with gromacs via configure/make
    -after linking, F_AT_C_exit() needs to be put in fatal.c

    Purdue FATC Image
    folding-at-clusters/posters/fatc_diagram{,2}.{fig,png}.

    GMXLIB -> FATCLIB
    -changed getenv("GMXLIB") to getenv("FATCLIB") in gmxlib/futil.c and kernel/topio.c.
    -it works and our documentation was altered to reflect the change.

    Updated source/TODO.

    General
    -Registered for SC2004

    Posted by mccoyjo at 06:26 PM

    October 22, 2004

    meeting notes

    Folding@Clusters

    Gromacs output is complete save a few stderr prints. John may need to change how he is testing to accommodate.

    Signal issues have been taken care of.

    We would like some testing before releasing a2.

    Grompp error return codes are coming along nicely.

    Work still needs to be done on mdrun error return codes. Who is handling this?

    TODO needs to be updated.

    GMXLIB -> FATCLIB is claimed by Joshh.

    Get F@C running under gentoo. Joshh

    Testing

    John's testing has found no major issues. He still cannot run molecules with .ndx files. It seems that almost all of the molecules work on the vast majority of configurations.

    Fixed villin-urea files (the itp files were missing).

    Testing will continue after a2 source is finished and before a2 is released.

    Posted by mccoyjo at 06:06 PM

    October 15, 2004

    Meeting Minutes - October 15, 2004

    F@C


      Fixed the signal overlap problem between COSM, GROMACS, and F@C. JoshM.

      Lost signal problem between nannies and children, fix by changing to a polling architecture. JoshM.

      New exit() architecture and returned error codes for mdrun and grompp. JoshM.

      printf() statements in grompp and mdrun. JoshH will look at this and then get in touch with Charlie to discuss solutions. JoshM suggested we check out COSM's distributed file system as a potential solution. Would this be helpful for the logging? Maybe use mdrun/child and grompp logging to catch output; with useful return codes we just need to put this in a log.

      Next release will be a2, probably early next week (after next round of bug fixes and testing). Charlie.

      Vijay call. Charlie will talk to him about specifying input and output files sometime next week. Probably using a conf file that describes what to look for and what to generate.

      New molecules, JoshM will get the ones in ~pande and set them up.
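
      The polling fix mentioned above replaces "wait for a signal" with "ask periodically". A minimal sketch of that idea, in Python for illustration only: a plain child process stands in for the F@C child (the real nanny and child are C programs), and wait_by_polling and its interval and timeout parameters are hypothetical:

```python
import subprocess
import time


def wait_by_polling(proc, interval_s=0.05, timeout_s=10.0):
    """Poll a subprocess.Popen child until it exits; return its exit code.

    Returns None if the child is still running after timeout_s seconds.
    """
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        code = proc.poll()  # non-blocking: None while the child is running
        if code is not None:
            return code
        time.sleep(interval_s)
    return proc.poll()  # one last check at the deadline
```

      Because the parent asks for the state each time, a notification that arrives while it is busy cannot be lost; the cost is up to one polling interval of latency.
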

    SC04


      Hotel for Fri-Tue night, JoshM, JoshH, and Charlie.

      John is looking into travel.

      Take the projector, screen, CS department sab rep ads.

    Numerical Methods


      Compare same molecule with GROMACS and F@C to see what the overhead is. John and JoshM.

      Record testing results in folding-at-clusters/testing/umm.html. This same directory should have any scripts, documentation, etc. related to testing as well. John.

      Check-out a particular tagged release of F@C and build from source to do the performance analysis (rather than using GROMACS directly as before). John.

      Generated 1/sqrt(x) C code for PeterB. Charlie.

    Merck


      November 16th, 7p to 9p.

      Review B-and-T GROMACS poster, JoshH will re-print in Bloomington and bring to Richmond.

      Use F@C materials from SC04.

    Plumbing


      Skylar will take b16 and make hopperprime.

    Posted by charliep at 11:53 AM | Comments (42)

    October 08, 2004

    Meeting Minutes - October 8, 2004

    Today's meeting was cancelled due to the recent release (a1). We all have well defined tasks to work on so a meeting isn't necessary.

    Posted by charliep at 11:56 AM | Comments (22)

    October 01, 2004

    Meeting Minutes - October 1, 2004

    Charlie has the paper notes from this meeting.

    Posted by charliep at 03:14 PM | Comments (11)

    September 30, 2004

    Update - Josh H - Sept. 30, 2004

    Regard this entry as fluid if accessed before Sept. 30, 2004
    Worked on: Folding@Clusters
    • Mother/Child [Rank 0]/Nanny [Rank 0] now print a banner when they start which indicates the version of the code they are running, both Tag and Revision from CVS.
    • Implemented a command line option that just prints the tag and revision from cvs [in a pretty format] to the command line, then exits. This is implemented for mother/child/nanny binaries.
      ./mother --version
      ./mother -v
    • I cleaned up the chattiness of the print statements by adding a logging level argument to the Print command, and taking a tour through the code to put each print statement in the right log level.
    • Nanny no longer sends the same checkpoint file multiple times. The decision is based on the size of the file. If the file did not change size since the last time I sent it, then don't send it again. Since the file always grows (accumulates data), this is a good metric to use.
    • Implemented switched sections for using assignment server OR local testing directory.
      Uncomment the tag USE_ASSIGNMENT_SERVER in mother.h to toggle this option. Currently this is a stub if enabled, since we don't have the assignment server stuff.
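
    The size-based checkpoint decision described above can be sketched as follows. This is Python for illustration only (the nanny is C); CheckpointTracker and its method names are hypothetical, and the existence and non-empty checks are borrowed from the fix described in the Sept. 2 update:

```python
import os


class CheckpointTracker:
    """Remember the size of the last checkpoint sent to the mother."""

    def __init__(self):
        self.last_sent_size = None

    def should_send(self, path):
        """True if the checkpoint exists, is non-empty, and has changed size."""
        try:
            size = os.path.getsize(path)
        except OSError:
            return False  # no checkpoint file yet
        if size == 0 or size == self.last_sent_size:
            return False
        return True

    def mark_sent(self, path):
        """Record the size of the file that was just sent."""
        self.last_sent_size = os.path.getsize(path)
```

    Comparing sizes is only a valid test because the checkpoint file always grows; a file that is rewritten in place at the same length would need a different metric.
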
    To Do:
    1. Clean up error checking WRT conf files, grompp/mdrun errors, etc. Useful error messages.
    2. Mother/Child/Nanny should clean up after themselves
    3. Have child send the result files to the mother when finished. Put in the infrastructure for when we know exactly the number and type of files to send.
    Posted by hursejo at 02:23 PM | Comments (52)

    September 24, 2004

    Meeting Minutes - Sept. 24, 2004

    Folding@Clusters
    • Can we put a logging level entry in mother.conf? We shouldn't use a command line option because they may not be very portable.
      Should we implement --version in mother, child, nanny or should we always display this no matter what? The latter means that they are more portable to systems without command lines (Windows?) [Josh H]
      Clean up the chattiness (word?) of the display. Possibly add an extra int to the Print function.
      --version in both places [command line + exit, always print].
    • Need to put in a copyright and developer info [Charlie?]
    • Josh M has F@C running.
    • Should we put a moleculeDirectory entry in mother.conf for use during testing or just leave it hardcoded? [Josh H]
      One way it could work: if moleculeDirectory is present and populated in mother.conf then the molecule comes from there; if not, we go to the assignment server.
      Keeping this as a compile-time option is the best way to keep users away from this flag. Use defines to switch between the two.
    • Tabs and 80 column tour through the source code. [Charlie]
    • It appears that molecules use different sets of input files, both in number and type.
      • How should we handle this?
        Assignment server should send us a list of filenames to create, then download them all.
      • Do the command lines we issue to grompp and mdrun need to adapt? Yes. We need to adapt to how they want to run. 2 classes of input, via input and output flags. Josh M can elaborate.
      • We need to get a list of what they want run, and what they need sent back to the server.
    • Mother/Child/Nanny should clean up after themselves [Josh H]
    • Testing
      • The README in testing needs to be updated and a grid of tests to be performed should be developed. [Charlie]
      • JohnS needs a tour and list of testing to do starting RSN. If JoshM can give him the tour, then CharlieP will get a list of tests to perform ready.
    • Is the molecule repository complete and correct now? Complete with pathless cpp, checkpointing, all original files, all molecules present, etc.? [Josh M]
      Yes, except for project-1012. We will incorporate the new molecules as they come in from Stanford.
    • Hopper immutable function is chflags.
    • Checkpointing and restarting with same number of nodes. [Josh M]
    • Instructions for developers for making a release. [Charlie]
    • Timezone problem with COSM. [Charlie]
    • CPU count detection problem with COSM. [Charlie]
    • Building and running correctly with AltiVec instructions on PPC. [Charlie]
    • Clean up error checking WRT conf files, grompp/mdrun errors, etc. Useful error message. [Josh H]
    • Nanny should avoid sending the same checkpoint file multiple times. [Josh H]
    • Have child send the result files to the mother when finished. Do we know exactly the files that we need? [Josh H]
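
    The "extra int on the Print function" idea from the top of this list can be sketched as a threshold check. This is Python for illustration only; the four level names and make_printer are hypothetical, since the notes only say that a level argument would be added:

```python
import sys

# Hypothetical levels; the notes only call for an extra int on Print.
LOG_QUIET, LOG_NORMAL, LOG_VERBOSE, LOG_DEBUG = range(4)


def make_printer(threshold, stream=sys.stdout):
    """Return a print function that drops messages above the threshold."""
    def log_print(level, message):
        if level <= threshold:
            stream.write(message + "\n")
            return True   # message was shown
        return False      # message was suppressed
    return log_print
```

    The threshold could come from a mother.conf entry or a compile-time define; either way each call site only has to pick a level once.
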
    Conferences
    1. LinuxFest
      • Review the presentation outline from Grab-a-Byte and get your feedback to CharlieP. We should talk about this next week. [All]
        John is not going to attend this with us.
    2. SC2004
      • The projector, table-top screen, etc. arrived on Thursday. Purdue University is going to give us space in their booth and possibly a presentation slot. Next week we should discuss the logistics of the trip.
        Why not Indiana's Booth?
    General
    • MT updates pre-meeting.
    • Plumbing: Bugzilla setup on admin. [Charlie]
    • Logistics: Left over money ideas: power supply/backup infrastructure -- In light of the situation the other day with the power outage.
      Remote console would be nice, but the first point is more important.
      Plane tickets to California for F2F meetings at Stanford.
      Food is always good
    Posted by hursejo at 07:35 AM | Comments (86)

    September 23, 2004

    Update - Josh H - Sept. 23, 2004

    Worked on: Folding@Clusters
    • Fixed One Node problem with network capability discovery.
      Should note that this problem may or may not arise depending upon network configuration. On cairo I was able to execute the test fine using just n0 [or the localhost]; however, this case could fail depending upon the platform. I put in a check so that if we are running on one node we don't run the network tests, and just zero out the results for those two items.
    • Tested on non-NFS cairo, and posted a few notes in a previous message.
    • The grompp shared library issue has been fixed with an environment variable. This is also in the previous post to MT about non-NFS systems.
    • Tested on Bazaar [NFS and Non-NFS], everything was fine. The Non-NFS takes a while to complete the network capability discovery stuff when working with the Hub in bazaar annex.
    To Do List
    1. Clean up error checking WRT conf files, grompp/mdrun errors, etc.
    2. Nanny should check to make sure that the checkpoint file has changed since the last time that it sent it to the mother. This is to avoid sending the same file multiple times.
    3. Have child send the result files to the mother when finished. Do we know exactly the files that we need?
    4. Implement --version [--help as well?] in mother [consider child and nanny as well]
    5. [With Charlie] Fix the altivec linking issue with the child on PPC/linux. Does this happen on PPC/Mac OS X?
    Posted by hursejo at 10:05 PM | Comments (51)

    September 17, 2004

    Meeting Minutes - Sept. 17, 2004

    Folding@Clusters
    • Do a checksum before using any plain text files, to ensure no bad things happened since we downloaded them.
    • Fix One Node problem with network capability discovery.
    • Restart command and logic in code. Top of Josh M's list.
    • Clean up the /cluster/molecule directory. Josh M reports that he is making progress.
      Make read-only and immutable
      Josh H gave him the Villin-Urea stuff and put it in ~joshh/dev/molecule
    • grompp shared issue (#3 for Josh M). Look for the following files at minimum
      release/work/mother/topol.top:11:23: ffG43a1.itp: No such file or directory
      release/work/mother/topol.top:2494:19: spc.itp: No such file or directory
    • CPP (C pre-processor) in mdp configuration. [#4 Josh M]
      How does F@H do this? They may just use option 1, can we confirm this.
      1. Why hard code the full path? make this just "cpp"
      2. If we have to hard code the full path, document to the user that they have to edit the mdp file before running.
      3. Look for cpp on their system, and edit the mdp file.
    • Josh H pushed the configure.ac file that charlie made to CVS.
    • Node map to users: File in generic/doc/node-usage.html
      Keep this up to date.
    • Josh M & Charlie need to build and Run the folding-at-clusters stuff on, at least, cairo. To get into the change-build-test mantra.
    • Charlie is working on documentation for install.
    • Packaging Binaries:
      1. Decide OK
      2. Tag
      3. checkout tag
      4. Build
      5. Make tarball (out of release directory -- cleaned, meaning no working directories or files)
      We can get rid of the release directory in CVS. The makefile can create this for the developers. [Charlie]
    • See mail message about expected runs
    • TimeZone & detection of CPUs in COSM
    • Press:
      • Web presence
      • 1 Page overview
      • T-Shirts?
      • Poster
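
    The checksum item at the top of this list could look like the sketch below: record a digest when a file is downloaded, then compare before each use. This is Python for illustration only; the notes do not name an algorithm, so SHA-256 is an assumption, and file_sha256 and verify_download are hypothetical helper names:

```python
import hashlib


def file_sha256(path, chunk_size=65536):
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path, expected_hex):
    """True if the file on disk still matches the digest recorded earlier."""
    return file_sha256(path) == expected_hex.lower()
```
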
    General
    • Fix consistency when make'ing on Bazaar. Check autoconf, automake, etc...
    • There was a bit of ownership changes as a result of Josh M's script using [from root]:
      chown -R :users *
      on your directory should fix any residual problems.
      CVSROOT has been fixed by Josh H. Someone else should check security here, just to make sure.
    • Post F@C Beta:
      • Bazaar and Cairo images all nodes. This involves getting the systemimager server running. Think consistency.
      • --version in mother
    • Need to finish up presentation for Ohio LinuxFest
    Posted by hursejo at 10:10 AM | Comments (32)

    September 16, 2004

    Update - Josh H - Sept. 16, 2004

    Folding@Cluster
    • Fixed the child binary on cairo by not linking in the altivec assembly code.
      This needs to be corrected, but for now we have a working setup for PPC.
    Pending
    • Test code on Bazaar NFS
      there may be a problem with the discovery code on x86.
    • Test code on Non-NFS Bazaar and Cairo
    • Nanny should check to make sure that the checkpoint file has changed since the last time that it sent it to the mother.
      Now it sends the file every time it pings the child, which leaves open the opportunity that we send the same file twice to the mother.
    • When Child is finished and ready to report it should:
      1. Notify the mother that the children are finished
      2. Mother then asks for the file(s)
      3. Child transfers Final files to Mother via MPI
      4. Children are released
    • Have the midwife (what will be the F@C Server) send grompp, child, nanny to the mother via HTTP.
    • Fix the altivec linking issue with the child on PPC/linux. Does this happen on PPC/Mac OS X?
    Posted by hursejo at 12:35 PM | Comments (49)

    September 10, 2004

    Running GROMACS

    It seems most of our runs (including the scheduler and some of the command lines in the gromacs overview doc) were falling for inconsistent command line flags between grompp and mdrun. The '-c' flag generally deals with .gro files, but it does different things depending on the executable: grompp uses -c for input while mdrun uses it for output. Simply put, we were overwriting our .gro input files with .gro output from the simulation.

    Things used to look like this:
    o grompp -f grompp.mdp -p topol.top -c conf.gro -o villin.tpr
    o mdrun -s villin.tpr -o villin.trr -c conf.gro -g villin.log

    When they should've looked like this:
    o grompp -f grompp.mdp -p topol.top -c conf.gro -o villin.tpr
    o mdrun -s villin.tpr -o villin.trr -c output.gro -g villin.log

    Realistically, we should cut the -c option from mdrun unless we determine .gro output is the way to go.
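
    A script that builds these command lines can refuse the bad combination outright. The sketch below is Python for illustration only; the flags are the ones shown above, while grompp_command, mdrun_command, and the gro_in guard parameter are hypothetical:

```python
import os


def grompp_command(mdp, top, gro_in, tpr):
    """grompp reads the .gro file given with -c."""
    return ["grompp", "-f", mdp, "-p", top, "-c", gro_in, "-o", tpr]


def mdrun_command(tpr, trr, gro_out, log, gro_in=None):
    """mdrun writes the .gro file given with -c, so refuse to clobber the input."""
    if gro_in is not None and os.path.abspath(gro_out) == os.path.abspath(gro_in):
        raise ValueError("mdrun -c would overwrite the grompp input: " + gro_in)
    return ["mdrun", "-s", tpr, "-o", trr, "-c", gro_out, "-g", log]
```

    Passing the grompp input name as gro_in makes the check explicit; with the old command lines (conf.gro in both places) the second call raises instead of silently replacing the input.
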

    Posted by mccoyjo at 11:54 AM | Comments (72)

    Meeting Minutes - Sept. 10, 2004

    Numerical Methods
    • With Josh M's solution to the grompp/mdrun file problem, John (as well as the rest of us) needs to revisit all previous tests that use the previous setup.
    • Josh M will do a find/exec on the cluster file system, searching for mdrun or grompp, marking these as potential failures.
    • Josh M will post the grompp/mdrun info from the mail message to MT. John will pick this up and change all the necessary things.
    • John needs to check out the gromacs source from CVS via
      cvs checkout gromacs-3.2.1
      Do not use the --enable-fac flag in the GROMACS configure script.
    • Place known good copies of the molecules into the molecule repository, read-only.
    Folding@Clusters
    • Discovery: Use COSM for everything that we can. Write our own code for network stuff.
    • Makefile fixup is Charlie's token; Josh H will stay off it until the go flag is set.
    • Charlie will add mdrun.o to the list of generated object files.
    • Josh M has notes about how to get rid of generated pedantic errors
    • In folding-at-clusters/source/Makefile add "-lm" to the F_AT_C_CFLAGS list.
    • Remove the "charliep" hardcoded stuff from the includes for GROMACS in mother.h, and child.h
      the child.h file is the only file that needs it since we no longer link grompp into the mother.
    • Josh H cleaned the /cluster/project/folding-at-clusters
    • Josh H is going to push his folding-at-clusters repository, and Charlie is going to push the gromacs-3.2.1
    • Running directory Structure.
      • log
      • work
        • mother
        • child
        • nanny
      • bin
      • conf
        • mother.conf
        • nannyHost.conf
        • childHost.conf
    • folding-at-clusters
      • source
      • documentation
      • release (empty directory -- populated by make release in source directory)
    • Josh M is waiting for a mail message from Josh H.
    • Religion:
      • All code and comments, where possible [not function calls], to 80 columns.
      • Use tabs, not spaces when editing.
    • COSM -NO-THREADS ? Take out due to HTTPD on mother which needs at least x2 threads for max load.
    • Josh M is going to work on the restarting in folding-at-clusters
    • Talked about packaging. Primary concern: developers build binaries; testers use binaries to simulate the user experience (testers use tools to ensure the tagged code is current). Two/three phase tagging (v1-29, v1-30, tester bless, v2-0, ...). Charlie will post more details.
    Posted by hursejo at 10:07 AM | Comments (144)

    September 04, 2004

    Building GROMACS

    Generally you should build GROMACS using ./configure, make, etc. rather than a Perl script. This gives you control and feedback often needed.

    There are three variables that we generally need to modify when building GROMACS: CFLAGS, CPPFLAGS, and LDFLAGS. CPPFLAGS and LDFLAGS can be modified through the shell's environment before calling ./configure; CFLAGS must be modified in acinclude.m4 and then the configure environment must be rebuilt.

    The following three-step procedure sometimes does more work than required, but it always produces the correct result (or an error) rather than a silent failure.

    1) CPPFLAGS and LDFLAGS are normally used to specify the location of the FFTW include files and libraries.


      export CPPFLAGS="-I/cluster/cairo/software/fftw-2.1.5-Baseline/include"

      export LDFLAGS="-L/cluster/cairo/software/fftw-2.1.5-Baseline/lib"

    2) CFLAGS is normally used to control profiling, debugging, etc.


      Edit acinclude.m4, near line 770 you will see where to modify xCFLAGS.

    3) Rebuild the configure environment, configure, and build.


      make distclean
      aclocal; autoheader; automake; autoconf
      ./configure [configure options]

      Check Makefile to be sure it looks like it should WRT CFLAGS, etc.

      make -j 2
      make install (optional)

    Posted by charliep at 10:54 AM | Comments (60)

    September 03, 2004

    Meeting Minutes - Sept. 3, 2004

    Numerical Methods
    • Default (with SSE) & without SSE (both bazaar and cairo) time is from running the code without profiling data.
      Rerun those 4 sets to fix this issue.
    • Add innerloops per protocol from earlier meeting minutes. (~= top 2, over 20% always stays in list)
    • Figure out gprof in the context of the output files in the chart.
    • "compiler flag" => "option"
    • Research how to use a perl script to automate the chart data.
    • Line items:
      • Use the compiler option to Not Unroll loops. (man gcc, look at CFLAGS in the scripts). With -pg this may be turned off by default (95% confirm).
        Want to add the -fno-unroll-loops option to the compiler CFLAGS
    CFLAGS, GROMACS, and configure Problem
    • NB See Building GROMACS entry for the fix to this CFLAGS problem.
    • Pitfall, compiler options: if CFLAGS is not set we get option set A; if CFLAGS is set we get option set B, where A != B. Research uniformity.
    • We want the -pg option to be appended to the default setup, not replacing the default setup.
    • Replicate fix in 'all the right places' WRT multiple platforms.
    • Numerical Methods: Think about this and report back with new ideas.
    • B-and-T-GROMACS: table for now, but revisit later.
    • GROMACS WRT F@C: Edit configure, and add everything we need directly into the script. (FFTW, optimization options specific to ppc -- other archs? --, )
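
    The "append, don't replace" requirement for -pg comes down to merging two option strings. A tiny sketch, in Python for illustration only; append_flags is a hypothetical helper and the flag values used with it are examples, not the actual GROMACS defaults:

```python
def append_flags(default_flags, extra_flags):
    """Append extra compiler options to the defaults, skipping ones already present."""
    flags = default_flags.split()
    for flag in extra_flags.split():
        if flag not in flags:
            flags.append(flag)
    return " ".join(flags)
```

    For example, append_flags("-O2 -Wall", "-pg") keeps both defaults and adds -pg at the end, which is the behavior we want from configure whether or not CFLAGS was set beforehand.
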
    Running GROMACS
    • Figure this out! Use one of the following 2 options:
      1. How to use command line options to preserve the .gro file, not overwriting the original. (Josh M)
      2. If that is not possible, rework every script to get the fresh molecule set every time we run grompp/mdrun (cvs update -C ? Must overwrite the file you are using with the good file in CVS).

      We need to re-run the NM tests to behave according to the above.
    F@C
    • See TODO Notes
    Posted by hursejo at 09:32 AM | Comments (118)

    September 02, 2004

    Update - Josh H - Sept. 2, 2004

    Folding@Cluster
    • Extracted grompp from the mother. It is now executed as a separate binary.
    • Checkpoints are now given to the nanny upon the request of the mother.
      Details: The nanny notifies the mother whenever it has a checkpoint. When the mother decides that it wants to pull said checkpoint, it tells the nanny to send it.
    • The mother will keep a backup of the previous checkpoint in case we need it. The checkpoints are placed in the mother's molecule directory.
    • Restarting works in the Basic setup: Restart from the beginning.
      I am waiting on a final decision about how to restart properly before implementing it.
    • Removed the Irecv and Isend's in the mother and nanny and replaced them with Iprobe calls.
      This makes more sense for the situations in which we use them. Also it is safer in our context.
    • Instead of decrementing the number of children used upon every restart, we now try to restart with the same number of nodes N times (where N = 2 for now) before decrementing the number of children.
    • Fixed bug: Nanny used to try and send a non-existent checkpoint file to the mother. Checks are now in place to ensure that the file must exist and be larger than 0 bytes.
    • Transfer gro and tpr files via MPI instead of via HTTP between child and mother.
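
    The restart rule above (retry at the same size N times, then drop a child) can be written as a small pure function. This is Python for illustration only; next_child_count is a hypothetical name, and the floor of one child is an assumption that the notes do not state:

```python
MAX_SAME_SIZE_RESTARTS = 2  # "N = 2 for now"


def next_child_count(children, restarts_at_this_size,
                     max_same_size=MAX_SAME_SIZE_RESTARTS):
    """Decide how many children to use for the next restart.

    Returns (children, restarts_at_this_size) for the upcoming attempt.
    """
    if restarts_at_this_size < max_same_size:
        # Retry with the same number of nodes.
        return children, restarts_at_this_size + 1
    # Retries exhausted: drop one child (never below one) and reset the counter.
    return max(children - 1, 1), 0
```

    Starting from (4, 0), three consecutive failures give (4, 1), (4, 2), and then (3, 0): two restarts at four children before the count is decremented.
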
    Questions
    • What files should the Child send to the Mother when it finishes computation? Just the tpr file?
    Pending
    • When Child is finished and ready to report it should:
      1. Notify the mother that the children are finished
      2. Mother then asks for the file(s)
      3. Child transfers Final files to Mother via MPI
      4. Children are released
    • Have the midwife (what will be the F@C Server) send grompp, child, nanny to the mother via HTTP.
      We can place them in the root path directory for now. We may want to put them in another directory [bin], but F@H just dumps them in the working directory and that is the least overhead.
    • Remove all hardcoded values. There are a few -- search for joshh
    • Test on NFS Cluster
    Posted by hursejo at 09:00 AM | Comments (75)

    August 31, 2004

    update

    Folding@Clusters

    • Found a new method for restarting that is one call to grompp. Checkout the restart post.
    • The latest cut of the code looks great. Once the canonized restart method is verified, it will be easy to integrate with F@C.
    • Learned DBI and IO::sockets to use in the midwife.
    • We have a molecule testing tool in the tools page that is connected to a database that is very useful for the midwife. It stores molecule names, directories, and gives a molecule id. Connecting a midwife table seems to be the way to go.
    • I've been having a hard time getting F@C to compile. More to come after I've explored this more thoroughly.

    Todo List

    • plumbing
      • ssh without nfs.
      • finish Images.
      • fix system imager on admin. The version was updated and it stopped working.
      • Status check on Athena.
      • Fix KVM wiring problems.
      • System imager on gentoo. Maybe newest version will work.
      • Checkout async NFS.
      • run DBI::proxy daemon on hopper.

    • Folding@Clusters
      • Reaper
      • Work on todo items without stepping on Joshh's toes.
      • Place mechanisms for wall time, number of restarts, and number of nodes in F@C for testing with the midwife.

    Posted by mccoyjo at 10:06 AM | Comments (20)

    August 29, 2004

    John - Sunday update

    I have been slowly filling in the table. I find it is easy/alright to keep runs going while doing other homework. I haven't done any planning ahead or looked at the big picture I am so busy assembling. Yep, that is about it.

    Posted by schaejo at 08:05 PM

    August 24, 2004

    Meeting Notes - Aug. 24

    General:
    • SC2004: Charlie found money to cover everything 'else' in addition to the Student Program. Josh H will register when more is known about his schedule.
    • SIAM: Need to get Num. Methods abstract in. 5 authors. F@C Abstract will go as is.
    • Ohio LinuxFest: Need to only register John. Josh^2 and Charlie are already registered.
    F@C:
    • Restarting: Josh M has the method. He believes that the program does all the checking to make sure we don't propagate errors.
      Check with Pande Labs about the accuracy of our method. Point them to the MT entry. (Josh M, Charlie P)
    • Testing:
      Need a piece of script (midwife??) to do the following; HTTP communications should be a separate process:
      1. Get next molecule that has not been run from the Database: TestID, Molecule Name, names of the conf/mdp/etc files
      2. Put those files on the Annex
      3. Start mother (not the process)
      4. When mother is finished running, extract running information and put it into the database. Number of restarts, size of universe (number of nodes), wall time from molecule start to molecule end, in the future transmit the tpr file.

      Random Reaper: Randomly kill an mdrun process -- Gremlin -- Coroner.
    • Fully implement error handling
    • Annex: Test Users (test1-3).
      Will add documentation about ssh key exchange on non-NFS system.
      LAM is installed on all nodes.
    • Scheduler (B-and-T-G) should be additional perl scripts.
    • Need to pursue struct to pass around arguments in mother/child/nanny
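
    Steps 1 and 4 of the testing script above are the database side of the loop. A minimal sketch, in Python with an in-memory SQLite database for illustration only: the real database, its schema, and the midwife's language are not given in these notes, so the tests table, its columns, and all three function names are hypothetical:

```python
import sqlite3


def create_schema(db):
    """Create a hypothetical table holding one row per molecule test."""
    db.execute(
        "CREATE TABLE tests ("
        " test_id INTEGER PRIMARY KEY,"
        " molecule TEXT NOT NULL,"
        " restarts INTEGER,"
        " nodes INTEGER,"
        " wall_time_s REAL,"
        " done INTEGER NOT NULL DEFAULT 0)"
    )


def next_untested(db):
    """Step 1: return (test_id, molecule) for the next molecule not yet run."""
    return db.execute(
        "SELECT test_id, molecule FROM tests WHERE done = 0 "
        "ORDER BY test_id LIMIT 1"
    ).fetchone()


def record_result(db, test_id, restarts, nodes, wall_time_s):
    """Step 4: store the run information gathered after the mother exits."""
    db.execute(
        "UPDATE tests SET restarts = ?, nodes = ?, wall_time_s = ?, done = 1 "
        "WHERE test_id = ?",
        (restarts, nodes, wall_time_s, test_id),
    )
    db.commit()
```

    Steps 2 and 3 (copying files to the annex and starting the mother) would sit between the two calls; next_untested returns None once every molecule has been run.
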
    General
    • Tobias: Charlie will chat with him WRT the group
    • NFS async option may be a performance boost.
    • Regular weekly meeting on hold until Josh H has a definite schedule.
    Posted by hursejo at 02:03 PM | Comments (33)

    update

    • Plumbing
      • Cairo and bazaar annexes are ready to go. Details at meeting.
      • Gave the complete list of changes to the new annex nodes.
    • Folding@Clusters
      • See my earlier mt entry for the latest details on restarting.
      • Completed more code review.
    • General
      • I plan to increase my presence in Dennis for both accessibility and focus issues.
      • John and I have talked about the student volunteer option and SC2004. Coordination on that front is happening.
    Posted by mccoyjo at 12:57 AM | Comments (68)

    August 22, 2004

    Schaefer - Sunday Update

    I'm back. When is our next meeting?

    Posted by schaejo at 11:56 AM

    August 17, 2004

    Meeting Notes - August 17, 2004

    General


    • Dawit is headed to Columbia for the 3-2 program in computer engineering. He would like to keep working with us remotely, so we will need to identify tasks which are easily partitionable and trackable.

    • Recruiting: start looking for a sophomore or junior, CS/math type.

    Folding@Clusters


    • Charlie has GROMACS with -O2 working under PPC now, trying for -O3 and possibly -O4 next. The default for GROMACS is -O2 which causes seg faults under PPC.

    • JoshH has seen the light, we'll be using the native GROMACS/MPI capabilities for distributing work to the children.

    • Checkpointing - GROMACS can save checkpoints at periodic intervals. This happens on the 0 rank node which isn't necessarily on the same node as the mother. Need to have a mechanism to move that file from rank 0 child to mother and test the viability of it. JoshM is still looking for the right method for restarting with a different number of nodes, he'll send a message to the developers list.

    • Only one restart procedure is required, same number of nodes is just one case. Make sure we test this with large molecules such as proteasome and the other new large one (Charlie has this).

    • All of GROMACS' printf's need to be managed, for the short term we can consider piping all of them to files which we process. Interaction with COSM is most of the issue.

    • Code changes:

      • Replace our stress CPU with COSM calls.

    • New printing mechanism is in place now.

    • COSM has a test script that can be used to verify the subset of the API which we are using.

    • Code to check the quality of master.conf file.

    • Signals and COSM. Is it possible to have more than 2? LAM and COSM conflict in their usage. Use diff in COSM directories to see where changes have been made.

    • Code review:

      • Consider use of structures to organize data elements.

      • Why divide by 4 WRT LAM hosts?


    • Documentation

      • Same path name must exist on mother and all child nodes. Does COSM offer a way around this? Why is this?

    Posters and Presentations


    • Student Ambassador program - JoshM is interested, will coordinate with John.

    • SIAM poster submissions due next Friday (the 27th). F@C ready to go, need N-M and CPs education presentation.

    • Ohio LinuxFest, CP will put something together once the Grab-A-Byte presentation materials are ready.

    Plumbing


    • Bazaar Annex without NFS - weird BIOS setting that requires keypress before boot? LAM is the only software that we should install. Keep notes during the install process (LAM, F@C user, etc.). JoshM will finish this up.

    • Cairo Annex without NFS - c12 through c15. No NFS, local password file, home directories local.

    • Bazaar slowness. Charlie.

    Posted by charliep at 01:56 PM | Comments (120)

    update

    • General
      • Spent a good amount of time reading and learning about routing, dns, dhcp, and the proper ways to admin networks.
      • I am signing up for the SC2004 student volunteer program. They have sections on funding and requests, so I have some minor questions.

    • Plumbing

      • Bazaar connectivity problems fixed.
      • Some annex nodes are failing to boot. The reasons are not consistent. b16 still needs to be fixed.
      • Still no progress on Bazaar slowness.
      • Cairo image was made successfully. The actual imaging is complaining about having no boot loader installed when there is definitely one on the golden client.

    • Folding@Clusters

      • Code review ready for our meeting.
      • Restarting is coming along.

    Posted by mccoyjo at 10:24 AM | Comments (82)

    August 16, 2004

    Update - Josh H - Aug 16, 2004

    Worked/Working on
    • General
      • Took photos of Chalk Board in Cluster Mtg room and posted them here.
      • Installed LAM 7.0.6 on Cairo.
        The errors were:
        ../../share/.libs/liblam.so: undefined reference to `_ioexit'
        ../../share/.libs/liblam.so: undefined reference to `_getbuf'
        ../../share/.libs/liblam.so: undefined reference to `_tiob'
        I fixed it by adding to the rest of our build script:
        LDFLAGS="-L/usr/bin -lutil"
        --enable-shared
      • Installed LAM 7.1 Beta 16 on Cairo, noting the above.
    • F@C
      • Started a Developer's Documentation
      • Nanny Stalls: Some of the first signals sent by the child are lost, so a timeout on a loop (denoted in the code by NOTE: JOSHH 1A) will ensure that it always finishes this loop instead of waiting forever.
      • Built and ran on Bazaar. Needs to be tested on non-NFS setup.
        You can use the configure.pl script to switch between x86 and ppc setups easily.
      • Nearly finished implementing a Print Command that will let us either print to stdout or print to a Log file (which is what we want to do in production). I have the mother and nanny finished, and the child will be finished soon.
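The nanny-stall fix above boils down to putting a deadline on a wait loop, so that a lost signal cannot block forever. A minimal sketch in Python (the F@C code itself is not shown here; wait_for_flag, its parameters, and the polling approach are illustrative assumptions):

```python
import time

def wait_for_flag(is_set, timeout, poll_interval=0.01):
    """Poll is_set() until it returns True or `timeout` seconds pass.

    Returns True if the flag was seen, False if the wait timed out.
    """
    deadline = time.monotonic() + timeout
    while not is_set():
        if time.monotonic() >= deadline:
            return False  # signal presumed lost; let the caller move on
        time.sleep(poll_interval)
    return True
```

The caller decides what a timeout means, for example logging it and continuing to the next stage.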
    To Do/Pending
    • F@C:
      • Keep working on to do items.
      • Track memory address with malloc/free issue on mother and child. Is GROMACS playing with the memory that I malloc'ed?
        I put in a work-around in the code that keeps the mother from segfaulting when freeing the arguments passed to mdrun. We should track this down in the near future though.
      • Ensure security of signal handling. --> Instead of signals maybe a HTTP handshake??
      • Test code on Bazaar Annex using non-NFS filesystem.
    Posted by hursejo at 06:13 PM | Comments (73)

    August 11, 2004

    Meeting Notes - Aug. 11

    Plumbing
    • Cairo Image is complete for c1-15, c0 is next in line.
    • Bazaar is having issues with systemimager (silent failure) which is holding the image. Internet routing (see next note) may help this.
    • Outside logins to Bazaar are not getting routed correctly. Is this residual from the cluster move? Check with Kevan...
    • Bazaar slowness: The fix Skylar sent did not work as reported by Josh M. Charlie takes this token.
    • Bazaar Annex becomes a non-NFS system to use as a testing ground for F@C.
      Make home files to point to /home instead of /cluster/home
    • DDT is still running out of ~mccoyjo, Josh M will move to /cluster/cgi-bin and update html page (Note both of these are repositories in CVS.)
    • DBI::Proxy Daemon on hopper.
    • Josh H is checking out installing LAM 7.0.6 on Cairo in his spare cycles.
    General
    • Do MT Entries...
    • SC2004 Student Volunteer Apps.
    • Need to finish up the presentation for Ohio LinuxFest.
    • Next formal meeting Tuesday 1p.
    F@C
    • Bugzilla: No progress. Charlie will work on this soon.
    • Checkpointing and Restarting: tpbconv -- may allow us to take last state of mdrun and produce input to mdrun without grompp. Generates a new tpr file.
      Could we use this to monitor that it is running correctly? No, because of various loads over the time of the run.
      How are we going to do this accurately?
      Only need to keep last Checkpoint file at any point in time for this method.
      Is there a way to check the validity of a given checkpoint file? Sanity Check... May be just checking for errors in tpbconv?
      2 restart situations:
      • Restart with same number of nodes
      • Restart with different number of nodes
    • Should compare notes for restarting with FAH GROMACS core restarting.
    • Memory tracking: no progress.
    • See if GDB may be of help in tracking stalled nanny problem.
    • Need to build on Bazaar.
    • Code Review. Next week we should meet Tuesday at 1p
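The "only keep the last checkpoint" idea, combined with a sanity check, might look like the sketch below. The validity check is a placeholder supplied by the caller (for example, a wrapper that runs tpbconv and looks for errors); no GROMACS command lines are assumed here, and the function and file names are hypothetical.

```python
import os
import shutil

def accept_checkpoint(new_path, keep_path, is_valid):
    """Replace keep_path with new_path only if the new checkpoint passes
    the caller-supplied is_valid(path) check.

    Returns True if the kept checkpoint was replaced, False otherwise.
    """
    if not is_valid(new_path):
        return False  # leave the previous good checkpoint untouched
    tmp_path = keep_path + ".tmp"
    shutil.copyfile(new_path, tmp_path)
    # os.replace is atomic when both paths are on the same filesystem,
    # so a crash never leaves a half-written "last" checkpoint behind.
    os.replace(tmp_path, keep_path)
    return True
```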
    Numerical Methods
    • Dawit is working on completing the Chart of Runs. #1 Priority...
    • Chart listing rules:
      1. Min of 2
      2. 20% or more
      3. Once something appears in list it stays in list
    • Dawit noticed an interesting pattern in the results. Will put this information as a footnote to the bottom of the chart.
    • Read all links that John posted to MT from Aug 5, 2004
    • Josh M may start working with Numerical Methods folk in conjunction with F@C
    • Josh H will take photo of Chalk Board in Cluster Mtg room and post to web.
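One way to read the chart listing rules above, as a sketch: for each configuration, list every function at or above 20% of run time, pad to at least two entries with the next most expensive functions, and then carry any function that appears in one configuration's list through all configurations. The function below is an assumption about how the rules combine, not the script used for the chart.

```python
def chart_functions(profiles, threshold=20.0, minimum=2):
    """profiles maps configuration name -> {function name: percent of time}.

    Returns the sorted list of functions to show for every configuration.
    """
    tracked = set()
    for percents in profiles.values():
        ranked = sorted(percents, key=percents.get, reverse=True)
        # Rule 2: everything at or above the threshold.
        picked = [f for f in ranked if percents[f] >= threshold]
        # Rule 1: at least `minimum` functions per configuration.
        if len(picked) < minimum:
            picked = ranked[:minimum]
        # Rule 3: once listed anywhere, a function stays listed.
        tracked.update(picked)
    return sorted(tracked)
```

Fed gprof percentages per configuration, this returns the set of rows that every column of the chart should include.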
    B and T - GROMACS Paper -- On the shelf
    Posted by hursejo at 02:50 PM | Comments (133)

    August 10, 2004

    update

    • Plumbing
      • Bazaar Image - Everything seems to be in order save the installed system imager. It fails silently when creating the master imaging script.
      • Cairo Image - A new and lightweight image has been installed. The installation is quite functional. DBI::proxy, systemimager, and c3 tools have been installed.
      • Bazaar Slowness - I would like a fresh pair of eyes to stare at this problem with me. The delay only happens when ssh'ing above b0. There is no delay on DNS queries or pings. I've exhausted my knowledge of how ssh works.
      • Bazaar Lack of Connectivity - Bazaar has lost the ability to see the world outside of hopper. DNS queries succeed. Pings out from bx fail. Pings from quark/acl give: Redirect Host (New nexthop: 159.28.230.232). The problem may be with our last config on quark. The problem is still being investigated by Dawit and me.
    • Folding@Clusters
      • Code Review - I have waded through the code excluding some capability discovery code. Things are looking good. I want to hit some XXX's after doing a quick review of the latest cut that includes COSM calls.
      • Restarting - Things are proceeding well on this front. In the last couple of days new ways of implementing restarting have come to my attention. Instead of doing the kludge of converting the output file to an input format and generating a new conf file using trjconv and grompp, I have found good utilities for restarting after a crash and recycling output to input. I'm prepared to discuss these methods in detail at tomorrow's meeting.
      • Checkpointing - When looking at the restarting problem, I stumbled across the mechanism in gromacs to periodically write .trr, .xtc, .log, and .edr files to disk on the head node. Combined with the tools used for restarting, I believe we have a simple solution to both problems. Again, more detail at tomorrow's meeting.

    Posted by mccoyjo at 07:03 PM | Comments (39)

    Update - Josh H - Aug 10, 2004

    Worked/Working on
    • F@C
      • Finished a set of examples for COSM:
        Capability Discovery
        File I/O
        HTTP client/server (Currently with 3 Threads)
      • Mother is now shipping binaries via MPI for child/nanny
      • Converted much of mother/nanny/child to COSM
      • Added HTTPD to mother, and HTTP to child/nanny for transfer of results files.
        To make this easier/faster/more secure we may want to think about harnessing the zip feature in COSM to Zip up our results and send them via HTTP...
        Currently the GetWork function requests the tpr and gro files from the mother, the Result function should push the results file to the mother. We should also make a Checkpoint function that pushes a checkpoint to the mother.
        There are still some bugs with the Nanny. Every once in a while one or more of the nannies will not move into the Checkpointing stage. This causes the mother to stall when trying to free the children. I need to look into this. I think it is just an MPI programming error somewhere.
      • If you want to play then checkout the cvs tree and build it. It is currently primed for Cairo. Read the README for how to set it up.
      • The master.conf file has changed a bit since we are using COSM's built in Config library.
      • The Head child creates a directory $WORKING_PATH/work into which it moves all of its files and runs mdrun. Mother works out of $WORKING_PATH/molecule
      • So by the looks of things we should be NFS agnostic at the moment.
        Do we have a testing environment to confirm this?
        Could we use Bazaar annex when Josh M and Dawit are finished?
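The GetWork/Result exchange described above, with the mother acting as a small HTTP server, can be illustrated with Python's standard library. This is not the COSM-based code: the handler class, the URL layout, the file names, and the in-memory file store are all assumptions made for the sketch.

```python
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

class MotherHandler(BaseHTTPRequestHandler):
    # Hypothetical in-memory store: URL path -> file contents.
    files = {"/work/topol.tpr": b"tpr-bytes", "/work/conf.gro": b"gro-bytes"}

    def do_GET(self):  # GetWork: a child pulls an input file
        body = self.files.get(self.path)
        if body is None:
            self.send_error(404)
            return
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def do_PUT(self):  # Result or Checkpoint: a child pushes a file
        length = int(self.headers.get("Content-Length", 0))
        self.files[self.path] = self.rfile.read(length)
        self.send_response(201)
        self.send_header("Content-Length", "0")
        self.end_headers()

    def log_message(self, *args):  # keep the example quiet
        pass

def start_mother(port=0):
    # Port 0 asks the OS for any free port.
    server = HTTPServer(("127.0.0.1", port), MotherHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server

# Bypass any proxy settings from the environment for local requests.
_opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))

def get_work(base, name):
    with _opener.open(base + "/work/" + name) as resp:
        return resp.read()

def put_result(base, name, data):
    req = urllib.request.Request(base + "/results/" + name, data=data, method="PUT")
    with _opener.open(req) as resp:
        return resp.status
```

A Checkpoint function would look the same as put_result with a different path prefix.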
    To Do/Pending
    • F@C:
      • Keep working on to do items.
      • Track memory address with malloc/free issue on mother and child. Is GROMACS playing with the memory that I malloc'ed?
      • Nanny Unexplained stall in pre-checkpoint stage
    Posted by hursejo at 03:15 PM | Comments (165)

    August 05, 2004

    Meeting Notes - Aug. 5

    General
    • Next Meeting: Wed., Aug. 11 @ 2pm
    • Cheers to Charlie for the Shirts.
    • Post your MT updates.
    Plumbing
    • All plumbing is tabled until next week due to Abstract work??
    • Cairo image: DBI::Proxy??
    • Move Bazaar Annex to Cluster Closet:
    F@C
    • Abstract: Submitted to SC2004. Will want to submit to SIAM as Poster by Aug 27th as well. May want to change format.
    • COSM Addition: Josh H Working on test case, will soon integrate into code root.
    • Memory Leaks: Josh H not looked at this yet.
    • Restarting: Josh M needs to report on this...
    • MPI shipping of binaries: Josh H is going to do this soon.
    • Bugzilla: Charlie no progress.
    Numerical Methods
    • Abstract: Due Aug 27.: In development...
      Don't know if they are printing proceedings.
      Focus on collecting data and resources
    • Coordination of Runs:
    • The Chart: Separated for Bazaar and Cairo. Wall time is the time represented in the chart.
      Track functions taking 20% or more time, at least 2. Track significant functions throughout all configurations.
      John saw a ghost. Will report future work items.
      Look into unrolling of inner loops, and cache efficiency.
    • Literature search: John will put them in MT, we will all look into them.
    B and T - GROMACS Paper
    • Tabled...
    Posted by hursejo at 12:35 PM | Comments (54)

    August 04, 2004

    Update

    I have been focusing on the abstract paper. I am currently looking into the different major numerical methods that are implemented in MD packages (Ewald Corrections, Monte Carlo, LJ, Fourier) and trying to understand their theoretical base so it'll make sense when I compare to implementation in gromacs. I am also looking more into the invsqrt routine and doing tests to provide solid answers to the abstract start up questions I have put together (look in cvs).

    Posted by bekelda at 09:37 PM | Comments (99)

    Update - Josh H - Aug 4, 2004

    Worked/Working on
    • F@C
      • Finishing up COSM HTTPD Client/Server Example.
        I am using this to learn how we want to use the COSM library. Hope to finish this before the meeting tomorrow, and start integrating it with the F@C core.
    • B-and-T-GROMACS
      • Finished up dumping of text into Extended Results Section. The paper has been shelved for the next couple of weeks.
    To Do/Pending
    • F@C:
      • Keep working on to do items.
      • Track memory address with malloc/free issue on mother and child. Is GROMACS playing with the memory that I malloc'ed?
      • Have mother ship the mother/nanny/child binaries via lam's mpirun.
      • Add HTTP support in F@C
    • B-and-T-GROMACS:
      • [Tabled] How to handle general citations (rather than specific ones).
    Posted by hursejo at 06:00 PM | Comments (57)

    Schaefer's Wednesday Update

    There is a new chart (more correctly, a pair of charts) in the cvs directory now, next to the old one. The apparent lack of PME runs is misleading, I just haven't gotten Dawit to put his runs in yet. The chart also has a couple of typos, etc, that will get worked out tomorrow.
    I think reading the PoCo book is good background stuff. Meaning I am learning new things which deepen my overall understanding of what is happening, but none of it jumps out and says "read me! I am topical to gromacs!". I think it is useful.
    The shirts are cool.
    I did a little bit of a literature search, and found some potentially interesting articles, where should I park them?

    Posted by schaejo at 04:08 PM

    August 02, 2004

    Meeting Notes - Aug. 2nd

    General

    The meeting for this wednesday is moved to this thursday at 1pm. Charlie will attend by phone.
    Wednesday the eleventh at 2pm is the next meeting, for those still around.(hahaha!)
    JoshH is to bring fancy shirts from the ranch, to share with the rest of the group.
    The reading list was updated a while ago, how is it going?
    Post your MT updates.

    Plumbing

    Most Important Item: the Cairo image. add DBI proxy to the list, and such.
    Charlie will visit Fryes for cable gender converters. (our problem is female to female)
    Move the annex downstairs. Set it up on the smaller cart.

    B and T - GROMACS Paper

    Topic tabled until early september.
    Charlie will deal with poster printing at Kinkos (while driving in a fire truck?)
    Decision on venue to print in is also tabled till september.
    JoshM's audit runs went fine, except for dppc.

    F@C

    This is priority One for JoshM and JoshH (and Charlie).
    Looks like porting to COSM will be a good idea. JoshH will scoop out the details.
    Memory Leaks were not caught by electric fence, Charlie suggests trying a small test case and setting a watchpoint with gdb.
    JoshM seems to be making progress with Restarting by hand, now he gets to make it automagic.
    Bugzilla - Charlie?

    Numerical Methods

    The abstract is priority One for John and Dawit (and Charlie). Write blind versions of it tonight (monday) so that we can meet about them tomorrow (tuesday) afternoon with Charlie.
    Coordination on the testing runs, so we can go faster. Finish the Chart by wednesday/thursday to finish the abstract by thursday/friday.
    Literature search, look for good information on comparing Newton-Raphson to other methods of division.
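For reference while reading, the Newton-Raphson iterations usually meant in this context are x_new = x * (2 - d * x) for the reciprocal 1/d and y_new = y * (1.5 - 0.5 * a * y * y) for the inverse square root 1/sqrt(a). Both need only multiplies and adds, and each step roughly doubles the number of correct digits given a reasonable starting guess. A small Python sketch (the starting guesses here are chosen by hand; this is an illustration of the iterations, not the GROMACS code):

```python
def nr_reciprocal(d, guess, steps=5):
    """Approximate 1/d with Newton-Raphson: x <- x * (2 - d * x)."""
    x = guess
    for _ in range(steps):
        x = x * (2.0 - d * x)
    return x

def nr_invsqrt(a, guess, steps=5):
    """Approximate 1/sqrt(a): y <- y * (1.5 - 0.5 * a * y * y)."""
    y = guess
    for _ in range(steps):
        y = y * (1.5 - 0.5 * a * y * y)
    return y
```

For example, nr_reciprocal(3.0, 0.3) and nr_invsqrt(4.0, 0.4) converge toward 1/3 and 1/2 respectively. The quality of the starting guess determines how many steps are needed, which is where a comparison against other division methods becomes interesting.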

    Posted by schaejo at 04:34 PM

    Update - Josh H - Aug 1, 2004

    Worked/Working on
    • F@C/Numerical Methods paper
      • Review/add to Abstracts
    • Numerical Methods
      • Cleaned up formatting of Chart.
    • B-and-T-GROMACS
      • Sent mail to ACM Crossroads regarding publishing our article.
    • General
      • WeatherDuck is now running in an infinite loop polling data. This has seemed to help with the sound metric.
      • Added dawit to software group so he can fully use cvs.
      • I have installed DBI::Proxy on many of the cairo machines. Some of them are failing nonuniformly. I am leery about changing the scheduler to use Proxy instead of Pg until the cluster is uniform. Could we push an image on cairo that has it installed and running? Is the Cairo Image ready (less this addition)?
    To Do/Pending
    • General
      • DBI::Proxy needs to be installed on Cairo for the scheduler. Bazaar may already have this.
    • Folding@Cluster:
      • Keep working on to do items.
      • Track memory address with malloc/free issue on mother and child. Is GROMACS playing with the memory that I malloc'ed?
      • Have mother ship the mother/nanny/child binaries via lam's mpirun.
      • HTTP client/server: Look into COSM.
    • B-and-T-GROMACS:
      • Talk about printing Poster...
      • Add 'Extended Results' Text.
      • How to handle general citations (rather than specific ones).
    Posted by hursejo at 09:55 AM | Comments (99)

    Update

    I read the NASA article.
    I've re-run the bazaar tests for sse after F@c was killed on the node (difference only on wall time).
    I did a test on enable-software-sqrt and did not get any noticeable change in performance. I will investigate this further as this seems like a crucial function specifically implemented in software to improve performance.
    Dawit

    Posted by bekelda at 09:35 AM | Comments (118)

    July 28, 2004

    Meeting Notes - July 28, 2004

    Plumbing
    • Bazaar: Josh M and Dawit are still making progress. Able to ssh directly to b0. Cannot access b0 from hopper. Maybe chat with Rowan about this.
    • Bazaar slowness: Unable to work on this until above issue covered. Josh M
    • Authoritative list of 0th nodes tested with athena and in CVS: ibid. waiting for athena to come up before pushing forward to this. Josh M and Dawit
    • Bugzilla: No movement yet. Charliep
    • Check WeatherDuck and post comment. Try leaving it on. Josh H
    • DBI::Proxy needs to be installed on Cairo for the scheduler. Bazaar may already have this. Josh H
    Folding@Clusters
    • Check out 'Electric Fence' to track memory leaks between F@C and GROMACS. Josh H
    • Josh M and Charlie have started code review. Will make MT entries.
    • Charlie is tracking Seg Fault problem with Capability Discovery in F@C. He is checking into Optimization levels via gcc to fix this.
    • Restarting: trjconv may be the program that we need for taking output from mdrun to grompp. Is this the only file we need? How do we use it?
    • HTTP client/server: Josh H is researching how to do this. Look into COSM.
    Numerical Methods
    • Wall Time & Gprof Chart: Add borders and fix formatting issues. Josh H
    • Results are nearly complete. John and Dawit
    • Dawit is having some mpicc compiling errors. Josh H will take a look.
    • Josh H found some text regarding Long range interactions in the mdp file for GROMACS. Passed info to John.
    Papers and Presentations
    • Some additions to the Reading list. Check it out!
    General
    • Dawit and John need key CAB13. Fill out forms and leave them on Charlie's Desk.
    • Need combo for keypad access to Basement access to Cluster Closet. Charlie
    • Need keycore changed for Cluster Closet from CAD3 to something reasonable. Charlie
    • Where is the space for the Cluster Computing group during the Academic year? North end of Recompute in Basement.
    B-and-T-Gromacs
    • Send mail to CrossRoads regarding publishing our article in the next Quarter or two. Mention status of authors.
    • Josh M is having some problems running DPPC on cairo with the scheduler. Going to see if a reboot on selected nodes will help (This may be a memory leak problem).
    • Audit runs - Search for 'Audit' in the label in the DB. Josh M will reconcile the Audits with the previous runs.
    • Josh H will add 'Extended Results' Text.
    • Not going to pursue PDF problems on Hopper.
    Posted by hursejo at 01:13 PM | Comments (99)

    John's Wednesday Update

    I have been working on the gathering of the data for the different compilation flags. The data generation step is almost over, and I will be switching to analyzing it more closely today.
    I have made no real progress on the writing of the abstract, because, while I now know fairly clearly what the work of the rest of the project will be, I do not know what we are going to find. And since I think that abstracts are a summary of the findings, I don't feel confident enough in my guesses of what we are going to find to write an abstract.

    Posted by schaejo at 10:27 AM

    update

    • Spent a lot of time working on bazaar. This has been a lengthy affair between the dhcp and the dns problems.
    • Bazaar slowness is on hold until bazaar is usable.
    • The audit runs have been chugging away. There are still some failures in the scheduler for some of the runs.
    • Started the F@C code review.
    • Read the abstracts and the b-and-t-g paper again.
    Posted by mccoyjo at 09:44 AM | Comments (39)

    July 27, 2004

    Update - Josh H - July 28, 2004

    Worked/Working on
    • F@C/Numerical Methods paper
      • Review/add to Abstracts
    • B-and-T-GROMACS
      • Other Considerations merged into Results and renamed to Extended Results
      • Made Readme for detailed-scheduler.pl
      • PDF rendered very unusually on PowerBook, PS ok.:
        It seems that the problem lies in the way latex is parsing the geometry.sty file on Linux machines and hopper. I tried building the tex file on my RedHat server and it produced the same result. However when I built the tex file via the teTex tools on OSX it displayed fine on an OS X PowerBook and RedHat Server. I am not sure how to proceed.
      • Poster Links
        SIAM
        ACM Crossroads
      • Crossroads of the ACM:
        • Writers Guide provides some useful details.
        • This is a quarterly magazine
        • Submit in HTML format
        • Between 1500 and 6000 words
        • Only call for articles at the moment is for SPAM, Here is the Call for Articles site to monitor.
        • Link to previous magazines.
        • I could not find anything regarding who should/[should not] submit articles. So it is probably fine for Charlie to be an author.
        • The site is undergoing some work, and many of the links are broken or outdated. It is a bit hard to navigate and find useful information. Maybe we should just e-mail them with questions?
        It could be a while before we get a chance to submit anything here. Should we consider other places?
    • General
      • perl/postgres/DBI/DBD install inconsistent on Cairo. It seems that c0,3,4,5 have postgres installed along with DBD::Pg. DBD::Pg is a dependency of detailed-scheduler.pl. 95% of the time we run it off of c0 and use MPI; if we run with only 1 process then we use ssh instead of MPI to run the job to save the overhead (if any) of the MPI calls. Due to this the scheduler will die if you run on any other node than those 4. To install DBD::Pg you need a local postgres install. Also on these 4 nodes perl 5.8.3 is installed where the rest of the cluster has 5.8.0 installed.
        Is there a way to NOT have a local install since we always use the one off of hopper?
        This should be something for our next image once we figure out all the details.
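The launch rule described above (ssh for a single process, MPI otherwise) amounts to a small branch. A sketch in Python rather than the scheduler's Perl; the function name, the host argument, and the exact command lines are illustrative assumptions and are not taken from detailed-scheduler.pl:

```python
def launch_command(nprocs, host, program_args):
    """Build the argv list used to start a job.

    One process: run directly over ssh and skip any MPI overhead.
    More than one: hand the job to mpirun.
    """
    if nprocs < 1:
        raise ValueError("nprocs must be at least 1")
    if nprocs == 1:
        return ["ssh", host] + list(program_args)
    return ["mpirun", "-np", str(nprocs)] + list(program_args)
```

Note that the ssh branch is the one that can land on a node without DBD::Pg, which is the failure mode described in the entry.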
    To Do/Pending
    • Folding@Cluster:
      • Keep working on to do items.
      • Track memory address with malloc/free issue on mother and child. Is GROMACS playing with the memory that I malloc'ed?
      • Have mother ship the mother/nanny/child binaries via lam's mpirun.
      • Build in HTTP Client/Server
    • B-and-T-GROMACS:
      • Talk about printing Poster...
      • How to handle general citations (rather than specific ones).
    Posted by hursejo at 09:29 PM | Comments (60)

    July 26, 2004

    Meeting Notes - July 26, 2004

    Plumbing
    • Bringing Bazaar and Athena back on-line: Can see b0 from hopper. Stumbling with dhcrelay, currently Quark is collecting all of the packets. Suggestion: Make Quark a temp dhcp host for bazaar and athena. Josh M and Dawit will pursue this solution.
    • Bazaar slowness: Unable to work on this until above issue covered.
    • Authoritative list of 0th nodes tested with athena and in CVS: ibid. waiting for athena to come up before pushing forward to this.
    • Bugzilla: No word yet from charlie. No rush at the moment.
    • Fixed ppc_altivec.h in the 'mygromacs' directory that people have been using. John and Dawit confirmed the fix.
    Folding@Clusters
    • Restarting: Been playing with the files a bit, but nothing definite yet.
    • HTTP client/server: Josh H is researching how to do this.
    Numerical Methods
    • How many nodes do Dawit and John need for the tests we discussed on Sunday? Charlie wants the rest.
      • John: c15, c14
      • Dawit: c12, c13
      • Charlie: c0-c11
      • Josh M needs a few nodes for running the audits.
    • Wall time, hotspot chart to HTML: Needs a bit more work [Wall times, links to gprof stuff]
    • list of long range interaction routines (PME, CUT-OFF, ??,??): Nothing yet, going to look through the GROMACS manual next.
    • Working on expanding the chart for different configure flags. Doing installs at the moment while walking through the GROMACS source.
    • Altivec flags now working and seeing a big performance gain.
    Papers and Presentations
    • Review, improve, etc. the F@C abstract. Don't delete text rather move it to a deprecated section.
    • Review, improve, etc. the N-M abstract. Don't delete text rather move it to a deprecated section.
    • Josh H will find link to SIAM poster instruction site.
    General
    • Agenda for July 28. 1p = General, 2p = 2 Abstracts, 3p = B-and-T-GROMACS -- Charlie to meet via phone.
    • Need to calibrate the weatherduck, but need more datapoints, and another source of information.
    • Need keycore changed for Cluster Closet from CAD3 to something reasonable.
    • Need combo for keypad access to Basement access to Cluster Closet.
    • Dawit and John need key CAB13.
    • Where is the space for the Cluster Computing group during the Academic year?
    B-and-T-Gromacs
    • Molecule runs: On hold until he has access to some cairo nodes. Having some problems with the scheduler; Josh H will help debug.
    • Josh H will put instructions for using the detailed-scheduler.pl script in a README file in CVS' b-and-t-gromacs repository.
    • Josh H is tracking down problem with PDF rendering.
    • Josh H is going to move Other Considerations section into Results section and rename it to Extension??
    Posted by charliep at 01:30 PM | Comments (23)

    sunday update

    A week full of meetings and the cluster move prevented me from doing much "work". However, what to do next is quite a bit clearer.
    To Do List: Compile versions of GROMACS with various configure flags, run them with gprof to determine behavioral differences and without to determine run times, isolate the inner loop "kernel", read new items, and understand the algorithm of the inner loops.

    Posted by schaejo at 10:56 AM

    Update - Josh H - July 26, 2004

    Worked/Working on
    • B-and-T-GROMACS
      • Posted information about how to use the current scheduler. This information should probably make it to a README in the near future.
      • How to handle definitions? Journal, Zobel?
        Zobel says: When the new term is first introduced, define it in the paper. No footnotes or separate sections.
    • Numerical Methods: Fixed my Gromacs tar file to have the fixed version of the configure script and the ppc_altivec.h.
    To Do/Pending
    • Folding@Cluster:
      • Keep working on to do items.
      • Track memory address with malloc/free issue on mother and child. Is GROMACS playing with the memory that I malloc'ed?
      • Have mother ship the mother/nanny/child binaries via lam's mpirun.
      • Build in HTTP Client/Server
    • B-and-T-GROMACS:
      • Talk about printing Poster...
      • Crossroads of the ACM? Guidelines, deadlines, etc. Charlie ok as an author?. Submission timetable and guidelines
      • Make Readme for detailed-scheduler.pl
      • PDF rendered very unusually on PowerBook, PS ok.
      • Other Considerations merged into Results with appropriate changes.
      • How to handle general citations (rather than specific ones). I don't even know if there is a commonly accepted way of doing this in scientific literature. It may be that re-reading the articles, after our prose is relatively stable, looking for specific citations is the way to handle this.
    • F@C/Numerical Methods paper
      • Get link from SIAM regarding Poster Setup, post it in MT until we find a stable place to put it.
      • Review/add to Abstracts
    Posted by hursejo at 07:10 AM | Comments (128)

    July 23, 2004

    Meeting Notes - July 23, 2004

    People Present: Josh H, Josh M, Dawit, John, Charlie

    Generic


    • Cluster Move: We are not going to move Bazaar into the room until the Duck is calibrated. Need to watch the room for a while still.
    • Reading: Lindal ClusterWorld Article
    • Agenda for July 28. 1p = General, 2p = 2 Abstracts, 3p = B-and-T-GROMACS

    Plumbing


    • Get Bazaar and Athena wired up and working - Josh M and Dawit
    • Bazaar Slowness: No Progress - Josh M
    • Make Athena a cluster once again. Dawit.
    • Zero'ith list still in development.
    • Charlie has BugZilla token.
    • Josh H will fix gromacs tar ball in mygromacs directory so that ppc_altivec.h adds the altivec.h include.

    Paper


    • F@C to SC2004, Both F@C and Numerical Methods to SIAM.
    • Get link from SIAM regarding Poster Setup, post it in MT until we find a stable place to put it.
    • Numerical Methods folks will meet Sat July 24 at 3 pm

    Numerical Methods


    • Are the Wall time values in the chart based on the non-profiled AND non-debug binaries? They are not using '-gdb', so no debugging. No to profiling.
    • The table from the White Board is coming along WRT completion and posting of the HTML version.
    • Reading: GCC 3.3 Manual has some good info about AltiVec and SSE.
    • cflow might be handy. Talk to Josh M.
    • Need list of long range interaction routines (PME, CUT-OFF, ??,??)

    Folding@Cluster


    • Josh M reporting on Restarting: Has some good leads, but no solution as of yet.
    • Code review by Charlie and Josh M.
    • Getting files to children and from nannies. HTTP (which will be in the mother) or code up some TFTP. TFTP is out because of encryption, so use SSL via HTTP on a designated port -- separate listening process for mother. Should present configuration html, push/pull files in house, and communicate with Pande Labs. Apache model is reasonable.
    • Josh H will send Josh M a sample conf file and talk about create-configure.pl script.

    Posted by hursejo at 04:01 PM | Comments (116)

    July 22, 2004

    Update - dawit

    Ran lzm/pme on cairo with altivec; compiled fine.
    Took an image off of b20 for bazaar image.

    Posted by bekelda at 09:34 AM | Comments (13)

    July 21, 2004

    Update - Josh H - July 21, 2004

    Worked/Working on
    • B-and-T-GROMACS
      • Talked about paper with Josh and Charlie.
      • Will work on this more Thursday Morning.
    • General:
      • Passed Bugzilla token to Charlie.
      • Cleaned up the install scripts on cairo and bazaar in $CLUSTER/bin/
    • Numerical Methods: Fixed my Gromacs tar file to have the fixed version of the configure script.
    To Do/Pending
    • Folding@Cluster:
      • Keep working on to do items.
      • Track memory address with malloc/free issue on mother and child. Is GROMACS playing with the memory that I malloc'ed?
      • Have mother ship the mother/nanny/child binaries via lam's mpirun.
    • B-and-T-GROMACS:
      • Talk about printing Poster...
      • Check CrossRoads Submission timetable and guidelines
    Posted by hursejo at 07:36 PM | Comments (30)

    July 20, 2004

    Meeting Notes - July 19, 2004

    Plumbing


    • Always check optimization levels and profiling options in _all_ our build scripts. You only want to use debugging or profiling when you specifically need it; they will skew the runtime statistics if enabled for timed runs. Later on we need to figure out what optimization levels PPC and x86 will safely support. PPC has trouble with -O3 when doing proteasome over 4 nodes. JoshH will check/fix /cluster//bin to see if they are right.
    • Bazaar slowness - If ssh is the problem, why is Cairo so fast with ssh? For JoshM's plumbing item he will look at this soon. Did the DNS change propagate?
    • State of Athena image? 12 nodes hopefully, Dawit will check them. Getting closer...
    • Node 0 list? List from Hassan + list from Charlie's blog entry = new document in SNA CVS project. Dawit and JoshM.
    • New scheduler. Flags to F@C? Sounds possible. J^2 and CP meeting sometime later this week.
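
    The first plumbing item above (checking every build script for stray debugging or profiling options) could be done mechanically. What follows is only a minimal sketch, assuming the build scripts pass compiler flags as plain text such as a CFLAGS line. The function name audit_flags and the sample script text are made up for illustration and are not taken from our scripts.

```python
import re

# Flags that should normally be absent from builds used for timed runs.
PROFILING_FLAGS = {"-pg"}                   # gprof instrumentation
DEBUG_FLAG = re.compile(r"^-g(gdb)?\d*$")   # -g, -g3, -ggdb, ...
OPT_FLAG = re.compile(r"^-O([0-3s])$")      # -O0 .. -O3, -Os


def audit_flags(script_text):
    """Return (profiling, debug, opt_levels) found in a build script's text."""
    # Split on whitespace, quotes and '=' so CFLAGS="-O3 -pg" yields -O3 and -pg.
    tokens = [t for t in re.split(r"[\s\"'=]+", script_text) if t.startswith("-")]
    profiling = sorted({t for t in tokens if t in PROFILING_FLAGS})
    debug = sorted({t for t in tokens if DEBUG_FLAG.match(t)})
    opt_levels = set()
    for t in tokens:
        match = OPT_FLAG.match(t)
        if match:
            opt_levels.add(match.group(1))
    return profiling, debug, sorted(opt_levels)


# Hypothetical build script contents, for illustration only.
sample = 'CFLAGS="-O3 -pg -ggdb"\n./configure --enable-mpi --enable-float\n'
profiling, debug, opt_levels = audit_flags(sample)
print("profiling:", profiling, "debug:", debug, "optimization:", opt_levels)
```

    Anything reported under profiling or debug would be a candidate to remove before a timed run, and the optimization list shows at a glance which -O level a script uses.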


    Folding@Clusters

    • JoshM and Charlie should look at the TODO in CVS.
    • NFS dependency needs to be broken, problems with stress CPU code and optimization, configure script (good HowTo available), freeing memory causes a seg fault. Charlie will add these to the TODO.
    • Code review this week, JoshM and Charlie.
    • Starting processes on particular nodes with LAM-MPI, we need to figure this out so that we can use it for load balancing. Charlie.
    • Checkpointing - Skip LAM-MPI for now, JoshM will look at GROMACS to see what the relationship is between what it currently writes-out and what grompp takes as input. Look at all those tools that come with GROMACS, does one of them do this? JoshM.
    • Bugzilla setup on admin. JoshH.


    Numerical Methods

    • Problems with AltiVec on Cairo, fixed with altivec.h in configure. Was this submitted as a bug?
    • Dawit and John will finish the table and the call diagram. Wall time, config.log, gprof pointers, etc. See last meeting's notes. Legend on call diagram. Check J^2's diagrams from last year (white board pictures, cflow, etc.) Where are the calls to FFTW?
    • Find full list of long-range interactions (PME, cutoff, others) and add those to the chart. John and Dawit.
    • Reading list - Charlie will work on this during the week and update it.

    B-and-T GROMACS


    • JoshH's trim and substitute suggestion. Maybe just report preliminary results? Depends on the publishing venue, talk to Jim and see what he thinks. Looks like we should carefully state what we learned based on tests we actually did and leave the rest for future work. We are heading towards Crossroads (ACM) as the venue.
    • JoshH added a new sub-sub-section on choosing a benchmark, we should all review this.
    • J^2 and CP will meet Wednesday at 12p - decide on venue, review current draft.

    Papers and Presentations


    • Calendar tour - July 26th deadline for SC2004, August 11 deadline for SIAM. Both will require an abstract. Sounds like Numerical Methods and F@C for each.
      Numerical Methods for Molecular Dynamics on Commodity Vector Arch.
      Folding@Clusters: Using the Parallel Grid Resources for Large Molecule Molecular Dynamics.
      All of us should be thinking about those this week, they will be the focus of our meeting on Thursday.
    • Changes to make for LinuxFest. Some of Charlie's Grab-a-Byte sounds like it may be appropriate for this, he will put a copy of it with the original submission when it's ready (in a week or so). 1 hour with questions.

    General


    • Move the clusters this Thursday at 11a. Network connection? Computer and WeatherDuck in the new space.
    • Forward scheduling - We will meet on Mon and Wed at 1p during the last week of July and first week of August.

    For next meeting items see the unpublished entry for that date.

    Posted by charliep at 07:33 AM | Comments (28)

    July 19, 2004

    Update

    This weekend has been spent primarily searching for information about the parallel I/O in the MPI2 standard.

    First of all, file I/O as specified by MPI2 gives processes using MPI both basic file I/O and parallel file I/O regardless of the underlying system. There are some other MPI implementations that would be interesting to check out. Here are some free, source available, commercial implementations and some non-commercial implementations.


    The implementations of MPI I checked for MPI2 file I/O support are: MPICH, LAM-MPI and MPI-LITE. After a bit of digging, I found that MPI-LITE has little to no support for the MPI2 I/O standard and both LAM-MPI and MPICH rely on the same software implementation of the standard: ROMIO. Yeah, remember ROMIO? It is a software package designed to fit into any MPI implementation to provide the I/O. When dealing with file I/O, ROMIO is dependent on another software package, ADIO, to provide an abstract interface to many different underlying filesystems.

    For a general overview and some performance analysis, check out this paper: A Case for Using MPI's Derived Datatypes to Improve I/O Performance

    That's great, but how does this apply to checkpointing? The collective parallel I/O is shared in a single, logical file (aka: it looks like the same file to all the processes sharing the I/O file). The big question is where is this information stored on disk in a way that we can use it for checkpointing? I have yet to find a source that gives implementation details. In order to find the answer to this question, more research and/or looking at ROMIO/ADIO's code is needed.
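
    One way to picture the "single, logical file" idea for checkpointing is to have every process write a fixed-size record at an offset derived from its rank. The sketch below only simulates the ranks with a loop and ordinary file I/O. It does not use MPI or ROMIO and says nothing about how ROMIO/ADIO actually lay the data out on disk; the record layout and the function names are invented for illustration.

```python
import os
import struct
import tempfile

# One record per rank: (step number, one double of state). 12 bytes, no padding.
RECORD = struct.Struct("<id")


def write_checkpoint(path, rank, step, value):
    """Write this rank's record at its own offset in the shared file."""
    mode = "r+b" if os.path.exists(path) else "w+b"
    with open(path, mode) as f:
        f.seek(rank * RECORD.size)
        f.write(RECORD.pack(step, value))


def read_checkpoint(path, rank):
    """Read back the record that belongs to one rank."""
    with open(path, "rb") as f:
        f.seek(rank * RECORD.size)
        return RECORD.unpack(f.read(RECORD.size))


# Simulate four ranks writing in arbitrary order to the same file.
path = os.path.join(tempfile.mkdtemp(), "checkpoint.bin")
for rank in (2, 0, 1, 3):
    write_checkpoint(path, rank, step=100, value=rank * 1.5)

print(read_checkpoint(path, 2))
```

    Because each rank's offset is computable from its rank alone, a restarted process can find its own state without scanning the file, which is the property we would want from a real parallel checkpoint.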

    links:
    MPI-LITE
    LAM-MPI
    MPICH
    MPI2 Standard
    ROMIO
    ADIO
    LAM-MPI User's Guide

    Posted by mccoyjo at 09:47 AM | Comments (52)

    Sunday Update

    Well, I didn't get as much done over the weekend as I had hoped, which means the table has yet to be made. I am still getting results for cairo that the ppc-altivec flag has no effect, which means that I need to pay yet more attention to the configure file and compile output (I guess I had a false success on the last build). Dawit and I will create the table this morning.

    Posted by schaejo at 08:36 AM

    Update - dawit

    Did new runs both on bazaar and cairo with time command.
    Spent time with athena plumbing, not much progress but I've switched a11 and a0 and I'm configuring gentoo for a0.

    Posted by bekelda at 07:54 AM | Comments (62)

    July 18, 2004

    Update - Josh H - July 18, 2004

    Worked on/Working on


    • Folding@Cluster:

      • Working on debugging the code.

    • B-and-T-GROMACS

      • Worked on cleaning up the paper. I have been through most of it, and am keeping an updated ps and pdf in CVS for all to look through. I suggest that before committing the text document to CVS we always make sure that the postscript and pdf versions are up to date. I have a script to do this if any are interested: src/perl/LatexMake.pl
      • I am considering dropping the GROMACS Ports from the 'Other Considerations' section, and replacing it with a short discussion on the possibility for this type of exploration. This way we can finish the paper, and put the porting stuff in Future work. Also this allows me to focus more on the F@C stuff. Thoughts?

    • General: Figured out why Charlie was having the CVS problem where he would not receive any new directories when doing a 'cvs update'. By default 'cvs update' does not check for new directories and only checks those files in the directories that you already have. A 'cvs update -d' will get any new directories.

    To Do/Pending


    • B-and-T-GROMACS:

      • MPICH Port: Find out why it stalls on MPI_Finalize with SMP runs
      • MP-Lite Port: Run gdb on it to find where it segfaults
      • Possibly use another MPI package?

    Posted by hursejo at 03:10 PM | Comments (29)

    July 15, 2004

    Meeting Notes - July 15, 2004

    Numerical Methods


    • Make sure that the right options are being used for each GROMACS build, reconcile the results. We should have documented command lines for GROMACS' ./configure for each platform and test configuration that we are running.
      Josh H pulled the following information from the b-and-t-g database with the following SQL command:

      SELECT * from option_profile where layer = 'Gromacs' and code_root ~* 'bazaar' and options ~* 'Optimal';

      Bazaar
      --enable-mpi --enable-mpi-environment --enable-float --disable-software-recip --enable-software-sqrt --enable-x86-asm --disable-ppc-altivec --disable-cpu-optimization
      Cairo
      --enable-mpi --enable-mpi-environment --enable-float --disable-software-recip --enable-software-sqrt --disable-x86-asm --enable-ppc-altivec --disable-cpu-optimization

    • Develop a simple chart (in html) like the one I drew on the whiteboard last week. If there is any "fine-print" add that as text to the page. For each cell in the matrix provide the wall time and a pointer to the gprof output files (one flat, one hierarchical) that correspond to that run. My memory is that between Dawit and John we should have two molecule/methods on two clusters both with and without architecture specific optimizations. Post an entry to MT with the URL and send email notification of the post.
      Will also put this chart in CVS. Will post both Total wall time and average wall time over the 10 runs. Will start using the UNIX time command to measure time.
    • Need more to read if there is more available.
    • Continue refining John's blackboard diagram. Leave it up until at least Monday.
      It is in a stable state at the moment, and will likely not change before Monday. May consider putting this in XFig, dia, or other.
    • Dawit noted that invsqrt was called many times, but still ranks low on the CPU Utilization time. Why might this be?
    • Generally we are trying to confirm our impression about where time is spent (both in "generic" mode and architecture specific mode) so that we can identify candidate code for the benchmarking kernel.
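
    For the total and average wall time over the 10 runs mentioned above, a small wrapper can do the bookkeeping. This is a sketch only: it measures elapsed time from Python rather than with the UNIX time command, the names time_runs and summarize are hypothetical, and the command shown is a placeholder for the real mdrun command line.

```python
import subprocess
import sys
import time


def time_runs(command, runs=10):
    """Run `command` `runs` times; return the wall time of each run in seconds."""
    times = []
    for _ in range(runs):
        start = time.perf_counter()
        subprocess.run(command, check=True, stdout=subprocess.DEVNULL)
        times.append(time.perf_counter() - start)
    return times


def summarize(times):
    """Total and average wall time for a batch of runs."""
    total = sum(times)
    return {"runs": len(times), "total": total, "average": total / len(times)}


# Placeholder workload: replace with the real command for each chart cell.
times = time_runs([sys.executable, "-c", "pass"], runs=3)
print(summarize(times))
```

    Keeping the per-run times (not just the average) also makes it easy to spot a single slow run that would otherwise distort a cell in the chart.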

    Folding@Clusters


    • BugZilla will be needed before letting anyone outside of our group test.
    • Code is looking good. Charlie will do a review of it and post the results later this week.
    • What's up with the LAM-MPI checkpointing add-in? Unless it is very simple and powerful it's likely to be easier to manipulate GROMACS' checkpoint capability to meet our needs (IMHO).
      Josh H seems to remember that this only applies if lamd dies on the machine, moving the process [from the last checkpoint] to another machine.
    • Parallel File I/O is said to be supported by our flavor of MPI. Josh M is still looking in to this.
    • Need to think about CVS tags when a stable version arises.

    B-and-T GROMACS


    • Exactly which version(s) of MPICH are we using or have we used?
      mpich-1.2.5.2 is the one we are using; mpich2-0.96p2 is the one we dropped.
    • Josh H still needs to review that paper.

    Plumbing


    • Why is it that when JoshH created the src directory in the folding-at-clusters module and I ran "cvs update" on my client that I don't get the new directory and its contents? (Unless I do a "cvs release" and "cvs checkout".)
      Josh H will look into this.
    • What's up with bazaar WRT network lag time?
      Josh M is still working on this. He is fairly sure that it is still something with SSH. He is reading up on this at the moment.
    • What's up with the canonical zero node list?
      Dawit and Josh M are going to image some bazaar annex nodes, and start working on building a small cluster with a head node. dhcrelay, systemimager, and something else. They have a starter of the list, but are working on confirming this. Will post the list(s) to sna CVS.
    • Josh M and Dawit have been updating the sna plumbing list.
    • What else needs to be done to athena to complete the imaging project and have a useful cluster for F@C testing?
      Systemimager is the only stumbling block at the moment. Dawit thinks that he and Josh M are close to figuring this out for a11. They have been able to pull an image from bazaar.

    General


    • WeatherDuck has yet to arrive. Josh H will send a chaser e-mail inquiring about the replacement for Cluster's WeatherDuck.
    • LinuxFest has accepted us. We should get more information soon.
    • Need IPMI research for serial/IP BIOS access.
    • Did we get the poster? Do we need to get to Dayton to do this?



    Below are some additional items that were talked about in and out of the meeting WRT Monday's Meeting.

    Numerical Methods


    • Josh M gave John and Dawit a small scheduler that they have been using to run tests.

    General


    • Still need to do some forward Scheduling on Monday

    Plumbing


    • User Accounts on Bazaar annex seem to work fine with the new image, so the trouble that was previously recorded is being tabled until it comes up again.
    • Josh M has not had a chance to clean the image on Cairo yet.
    • No word on Firewall status, other than it has been noted by the imaging folks that no firewall should be on the compute nodes, only on hopper's external interface.

    Posted by hursejo at 12:32 PM | Comments (92)

    July 14, 2004

    Update - Dawit

    I have collected gprof results from bazaar and cairo, both with the defaults and with vector and x86-asm disabled on bazaar and altivec disabled on cairo.
    Have fixed the systemimager problem on athena and bazaar, and I'm taking an image from athena.

    Posted by bekelda at 11:35 PM | Comments (43)

    Update - Josh H - July 14, 2004

    Worked on/Working on


    • Folding@Cluster:

      • Put code in CVS: folding-at-clusters/src/
      • Added the Capability Discovery code to the Framework.
        Note that we are getting the segfault on the cpu tests. We have seen this before, and zero'ed out a field to fix it. Now it is back and we may have to do the same thing. We need to run the debugger on the code and find where it dies.
      • Mother is calling our version of grompp
      • Child is calling our version of mdrun
      • Seems to run in limited circles. I have tested it with Villin. There are some serious issues that I am working through at the moment. I will send mail concerning these shortly.
      • There are a few ToDo items listed in the TODO Document I am sure there are more. This should get us moving towards our goal.

    To Do/Pending


    • B-and-T-GROMACS:

      • MPICH Port: Find out why it stalls on MPI_Finalize with SMP runs
      • MP-Lite Port: Run gdb on it to find where it segfaults
      • Possibly use another MPI package?
      • Work on Paper!!

    Posted by hursejo at 07:41 PM | Comments (45)

    Sunday Update July 14th

    I have completed gprof runs of the following: bazaar with default flags, bazaar with --disable-vector, bazaar with --disable-x86-asm, cairo with default, and cairo with --disable-ppc-altivec. I suspect something fishy is going on, because both cairo batches look far too similar, and the bazaar default and the --disable-vector batches have a similar aspect. This worries me, and I will have to spend some time looking at the configure script to see what is going on.
    I have made a picture on a blackboard using the gprof call graph as data. It doesn't quite capture the flow of the program, but it at least shows who is calling whom, most of the time. I will leave it up for a while (till we reach consensus on its usefulness).
    Otherwise, I have been monkeying around with compiling different copies of gromacs in different directories. And learning very basic perl script (although the learning there took 15min.)
    I am wondering what I should do next.

    Posted by schaejo at 04:32 PM

    July 12, 2004

    Meeting Notes - July 12, 2004

    Present: Josh H, Josh M, John, Dawit

    Numerical Methods


    • gprof runs coming along well. John is working on profiling the gmon files; all of his runs for LZM/CUT are finished. Dawit has finished his runs for LZM/CUT and is working through some gprof errors. John and Dawit will work together to overcome these errors.
    • There may be a problem disabling SSE instructions on Bazaar in the configure script. There isn't a flag explicitly labeled for this. Try the x86-asm, --enable-vectorized-recip, --enable-vectorized-sqrt, and --enable-vector flags. The latter may not benefit us, but it is doubtful that it will hurt us.
    • Scheduler may be useful when running these tests in bulk. Josh M has a predecessor to the scheduler in CVS that will be useful for this.

    F@C


    • Josh M reported about load balancing and checkpointing. Load Balancing is all left to the programmer, i.e. not in standard. Josh M has some pointers to papers on this.
    • Checkpointing is not mentioned in any of his reading on MPI. LAM-MPI has an addon that he will look into.
    • Parallel File I/O may be useful for Checkpointing files. Josh M will do some investigations.
    • LAM-MPI has a fairly complete list of implementations other than theirs, both commercial and open. This may be a b-and-t-gromacs item as well. May be able to support other implementations in F@C core.
    • I put the F@C framework in CVS. It does not have the GROMACS code or all of the capability discovery yet. Should be ready by the end of the week.

    General


    • Charlie should post Agenda items in MT for next meeting.
    • Need to do some Forward Scheduling early next week.

    Plumbing


    • Image: Athena DNS, NIS, ssh, and all other resources should be working. a11 is working for client image testing, not head node. SystemImager is only distributed [as of late] as deb and rpm files. Dawit is going to either download another version from the sourceforge site and use it, or unpack the rpm version on a redhat machine and copy the source over. If we get this to work we should think about contributing it to the Gentoo site. Dawit is also looking for a dhcrelay port to Gentoo.
    • Dawit will show Josh M how to use systemimager to image bazaar annex. Need to make a golden client, etc.
    • It seems that user accounts are not consistent across the bazaar annex. b16 and b20 are correct, but the others are not.
    • Josh M and Dawit will make a list of additional packages for client and head node from the base install.
    • Clean up the Cairo image, use c15.
    • Make sure firewall is off on all nodes. Should only be on hopper's external interface.
    • 'The Slowness on bazaar'. Josh M does not think it is DNS. telnet has no delay, but ssh does. This may be an ssh problem. Josh M will check this out.

    B-and-T-GROMACS


    • Need to look at the paper. Josh H take the next look and do some updates.

    Posted by hursejo at 02:02 PM | Comments (109)

    Update

    I'm back from vacation and ready to get back to work.

    The majority of what I accomplished while gone was reading. All the MPI reading was completed (the two MPI books and the MPI articles). This reading taught me two important things about MPI. First, MPI leaves all load balancing to the programmer; there is no real internal load balancing. Second, there is no explicit checkpointing mechanism. That being said, the parallel file I/O specified in MPI2 could be an excellent method through which to implement checkpointing.

    Posted by mccoyjo at 10:15 AM | Comments (21)

    Update - Dawit

    I did 10 runs on b18 and c13 for lzm/pme. I now have enough gprof output to do comparison.
    Made the mistake of using the USE utility to add nis support on athena image and it took 3 days compiling since it rebuilds the whole tree. Need to look into their cross compile option.
    a11 is ready with all network capability so we can ssh and test image. Systemimager download on a11 needs a source tarball and I could not find any current ones without rpm.

    Posted by bekelda at 12:05 AM | Comments (63)

    July 11, 2004

    Sunday Update

    I have finished the first half of the gromacs gprof runs. I have ten for cairo and eleven for bazaar (I can't count). I have re-compiled with disabling flags, and am beginning to gather data on those. I think I really ought to learn to use cron, or write perl scripts to automate this stuff. It was a good weekend.

    Posted by schaejo at 10:52 PM

    July 08, 2004

    Meeting Notes - July 8, 2004

    Present: Charlie, John, Dawit

    Numerical Methods


    • Readings all done, at least at a high level. When we have a better sense of what code we are looking at we can do the in-line, un-rolling analysis.

    • Building GROMACS on bazaar and cairo seems to be working for both Dawit and John. Need to confirm that -pg is actually there.

    • Wait to see if gprof gives us enough for call graphs before crafting our own.

    • Use gprof to generate flat and hierarchical data based on 10 runs of one molecule/method on bazaar (with and without SSE) and cairo (with and without AltiVEC). John - LZM/cutoff, Dawit - LZM/pme.

    • On Monday talk about the gprof results and consider benchmarking kernel.

    • Sort-out the performance numbers at the end of the GROMACS log. Speak with Josh^2 about our earlier notes about this. Why is PS/Node hour the same for cairo and bazaar with the same molecule/method?

    • Stop the literature search for now.

    • John will document what he has learned about what the differential equations are doing in GROMACS.

    F@C


    • JoshH and Charlie met and discussed the code and the overall approach. Josh will put the code in CVS at some point (soon!).

    General


    • JoshH will "organize" and take notes for the two meetings next week.

    Plumbing


    • Athena - network is ok now, image is coming along but not done yet.

    • List for 0th nodes, see earlier meeting notes.

    Posted by charliep at 12:19 PM | Comments (64)

    Wednesday Update

    I think I finally found a good way (easy & accurate) to compare the run times between bazaar and cairo. I was wrong about there not being any timing in the .log files, but what I have seen so far I don't quite understand or trust (the log files claim that bazaar is sometimes faster than cairo, which does not match my experience).
    I read all three articles. And the first half of the FFTW doc.
    The literature search is going poorly, I have never been good at finding stuff, but I will continue trying.
    I don't think I understand how to diagram the gromacs code. Is this a graphic representation of gprof's call graph? Or is it something else?
    If it is a visually structured call graph, it would be good to know where the -pg flag goes. Dawit and I were wrestling with it yesterday.

    Posted by schaejo at 08:44 AM

    July 07, 2004

    Update - Dawit

    Read the assigned articles.
    Built personal gromacs on cairo. Still having problems building it on bazaar.
    Worked on athena image.

    Posted by bekelda at 11:03 PM | Comments (44)

    July 04, 2004

    Update - Josh H - July 4, 2004

    Worked On/Working On
    • Weather Duck: They are sending a replacement. Once it arrives then we will swap out ours.
    • GROMACS Port PVM: All levels finished.
    • GROMACS Port MPICH: After testing on x86 and seeing the same error -- stalling and timing out on MPI_Finalize -- I removed the call from the GROMACS source and it is now running. It leaves the mdrun slaves running, so I have to manually kill them before each test. I am re-running the NxNxN tests since the environment/code has changed.
    • GROMACS Port MPICH2: Dropped for the time being. It has build problems under ppc.
    • GROMACS Port MP_Lite: Need to run GDB to see what is causing the core dump
    • F@C development: Starting to merge GROMACS mdrun with framework. Keeping notes on any changes and what needed to be extracted.
    • B-and-T-GROMACS Paper: Installed Latex, dvipdf, dvips, aspell, ispell on hopper to aid in LaTeX development.
    To do
    1. B-and-T-GROMACS Paper
    Posted by hursejo at 04:22 PM | Comments (197)

    Meeting Notes - July 5, 2004

    Numerical Methods


    • Three articles from Dr Dobbs to read, first two more generally applicable, third one more specifically.

    • Build and run gromacs in home dir? Not yet, will work on this soon.

    • Are MFLOPS useful as counters? No, particular values are manipulated in more than one way and at multiple levels. Consider other approaches for measuring load: gprof? hand scaffolding? After some discussion we decided to go the gprof route. Test both native short vector instructions and generic GROMACS on both x86 and PPC. John and Dawit.

    • Elapsed time comparison between cairo and bazaar, not yet. John.

    • Learn about nohup and &. Dawit and John.

    • Diagram of call structure of GROMACS source module/function dependencies on white-board. Will make electronic later. Dawit and John

    • Literature search for vector algorithms, nothing yet. John

    Plumbing


    • Develop and practice 0 node install on a0. We need a canonical published list of changes made to 0th nodes after imaging:

      • routing
      • dhcrelay
      • ssh keys (which need to be preserved in /cluster//etc?)
      • others?

    • Dawit returned b15 to its former glory as a cluster node.

    • iptables and ipchains should both be completely removed from all images. Hopper's external NIC is the only place we should have any firewall.

    • Merge SNA and plumbing list. Charlie

    • We need to do a backup audit soon.

    Other


    • Make MT log entries on Sunday evening and Wednesday evening!

    Posted by charliep at 03:26 PM | Comments (47)

    July 01, 2004

    Meeting Notes - July 1, 2004

    Present: Charlie, JoshH, John, Dawit

    General


    • Dawit and John work together more. Move one more workstation into seminar room. (I was unable to reproduce the little hand motion from lunch in this format.)
    • We'll meet from 11a-12p EST on Monday and Thursday of next week.
    • Fried UPS and cairo disk drive RMAs. Charlie

    B & T GROMACS


    • Latex on hopper is set. JoshH
    • MPICH - PPC problem, try it under x86. For stalling consider removing MPI_Finalize() call. JoshH
    • MPICH-2 - not installing, needs debugging. Dropped due to hassle. JoshH
    • Intra collective communicators for next LAM-MPI version (7.1.x), sometime this Fall maybe.

    Folding@Cluster


    • JoshH working on child merge with GROMACS.
    • With 10 processes, 4 nodes are highly utilized, one marginally and 5 minimally used. Consider manual load balancing ala OpenMosix. Does MPI offer anything here (dynamic process launching)? OpenMosix itself is too complex to consider. Charlie
    • Scaling model coming along, still lots of configurations that fail under PPC and x86. Charlie
    • MPI - learn about application schemas, launching a binary from one node, load balancing options. JoshH, JoshM, Charlie
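
    For the manual load balancing idea above, one simple policy is to hand each process to whichever node currently has the least work. The sketch below shows that greedy policy in isolation. It is not a LAM-MPI feature, and the process names, cost estimates and node names are invented for illustration.

```python
import heapq


def assign(tasks, nodes):
    """Greedy: give each task (largest first) to the currently least-loaded node."""
    heap = [(0.0, node) for node in nodes]   # (current load, node name)
    heapq.heapify(heap)
    placement = {node: [] for node in nodes}
    for name, cost in sorted(tasks.items(), key=lambda kv: kv[1], reverse=True):
        load, node = heapq.heappop(heap)     # least-loaded node so far
        placement[node].append(name)
        heapq.heappush(heap, (load + cost, node))
    return placement


# Hypothetical processes with estimated costs, and two hypothetical nodes.
tasks = {"p0": 4.0, "p1": 3.0, "p2": 2.0, "p3": 1.0}
print(assign(tasks, ["n0", "n1"]))
```

    A policy like this needs a cost estimate per process before launch, which is the part we would have to supply ourselves.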

    Numerical Methods


    • John discovered the megaflop accounting is bogus. We can consider a couple of different approaches to fixing it: re-calculate constants and keep going, make accurate counters, use gprof, others?
    • Determine wall (elapsed) time accounting on bazaar and cairo. John
    • Draw picture of structure of GROMACS source module/function dependencies on white-board. Will make electronic later. Dawit and John
    • How to build GROMACS in home directory, JoshH to show Dawit and John
    • Literature search for vector algorithms. John

    Plumbing


    • b15 returned to its former glory as a cluster node. Dawit
    • b16 under image test for a week. Dawit
    • Merge SNA and plumbing list. Charlie
    • We need a canonical published list of changes made to 0th nodes after imaging:

      • routing
      • dhcrelay
      • ssh keys (which need to be preserved in /cluster//etc?)
      • others?

    • We need to do a backup audit soon.

    Cluster Closet Move


    • Watch temperature at North and South ends of room, are there hot spots? Does the intake vent need to be relocated?
    • Speak to BillB about remaining outlet and jack relocations, lighted switches. Charlie
    • New WeatherDuck on its way. JoshH

    Conferences and Presentations


    • No word from LinuxFest yet.

    Posted by bekelda at 01:19 PM | Comments (62)

    Update

    I finished reading HPC and gromacs manual chapter 1; I skimmed through chapter 3.
    Doing some output staring, will be running more molecules to check for mflop accounting.
    Trying to figure out how to yank a computing process like LJ + Coul(WW) from main code and run on own for kernel benchmark.

    Posted by bekelda at 08:59 AM | Comments (37)

    Wednesday Update

    I put some of my grep'ing notes into the numerical-methods folder.
    Charlie gave me info to start a literature search for papers involving modest-sized-vector algorithms, but I haven't found any real goods yet. I read more of K&R, which is slower going as I get to things that look less like C++.
    I am going to finish reading the GROMACS manual appendix B on 1/sqrt(x) and then do more grep'ing, with a focus on the dependencies of the inner loops I found.
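
    As a reminder of the general technique behind fast 1/sqrt(x) routines, here is the Newton-Raphson refinement step y <- y * (3 - x*y*y) / 2 on its own. This is a generic sketch and not the GROMACS code: the function name and the hand-picked starting guesses are assumptions for illustration, and real implementations commonly start from a lookup table or a hardware estimate instead.

```python
import math


def invsqrt(x, seed, iterations=4):
    """Refine `seed` toward 1/sqrt(x) with Newton-Raphson steps."""
    y = seed
    for _ in range(iterations):
        # Each step roughly squares the relative error of the estimate.
        y = 0.5 * y * (3.0 - x * y * y)
    return y


# Crude hand-picked seed; the error shrinks quickly with each iteration.
for n in (1, 2, 4, 6):
    approx = invsqrt(2.0, 0.5, iterations=n)
    print(n, approx, abs(approx - 1.0 / math.sqrt(2.0)))
```

    The iteration only converges when the seed is positive and below sqrt(3/x), which is one reason a good starting estimate matters.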

    Posted by schaejo at 08:08 AM

    June 30, 2004

    Update - Josh H - June 30, 2004

    Worked On/Working On


    • Weather Duck

      • Sent follow up email to support folks about sound.
      • Created PHP page to allow user to adjust the graph to the last N hours.

    • GROMACS Port PVM: All levels finished.
    • GROMACS Port MPICH: Ran singular NxNxN tests, but working on how to get the SMP installation set up properly. It is currently stalling when the program finishes.
    • GROMACS Port MPICH2: Working on installation. This may be a flop.
    • GROMACS Port MP_Lite: Notes are now in blog
      Progressing with installation. Need to run GDB to see what is causing the core dump

    To do


    1. B-and-T-GROMACS Paper
    2. F@C development

      1. Start merging GROMACS mdrun with framework. Keep notes on any changes and what needed to be extracted.

    Posted by hursejo at 05:36 PM | Comments (64)

    June 28, 2004

    Meeting Notes

    persons present: dawit, john, both joshs, charlie

    Plumbing


    • Switch to 1/3 plumbing and 2/3 other.
    • LAM on cairo - do a bproc search to find problem. Check lam list for other installation errors.
    • Weatherduck - dropdown for time periods.
    • No word from weatherduck people in regards to sound.
    • MP graphing tool works.
    • DVC and PQC are now tools instead of consoles.
    • Dawit to test bazaar images on bazaar annex.
    • PBS scare. Someone on plumbing remove.
    • Charlie got a recommendation for Debian on the clusters. Consider for future.
    • Install yellowdog 3.0.1 on cairo.
    • Get athena working (F@C testground?)
    • Fishy lag on cluster, hopper, admin. Smells like a DNS problem. Check it out.
    • Kill mt image wishlist entry.
    • Joshm and Dawit will look at newly updated plumbing list.

    Numerical Methods

    • Charlie will update reading list.
    • John found many cool things while grep'ing through gromacs.
    • Goal is to pull out the inner loops in gromacs for the benchmarking kernel.
    • New goodies in the gromacs-overview cvs project.
    • Look at pros and cons of loop unrolling vs cache efficiency.

    F@C

    • Meeting Tuesday at 1p.

    b-and-t

    • Poster not yet printed. Thoughts of submitting poster to Kinko's web interface. Joshh will see if we can get a proof electronically. If not, charlie will pick it up.

    Cluster Move

    • Move weather duck to the new cluster room.
    • Possibly move sometime next week.

    Misc.

    • 11a-1p meeting Thursday (lunch included). Normal meeting time next Monday.
    • on cluster.earlham.edu: split resources page into tools, resources, and monitoring. Maybe a combination of all (problem with a label for this). Put links to the reading, plumbing, conferences, summerplan, docs.

    Posted by mccoyjo at 02:19 PM | Comments (20)

    Sunday update

    After more grep'ing around gromacs I have found some more useful things. I found the print statement for the tail end of the .log files, and I am working on unraveling that thread. I have not found any of the other numerical routines that are employed yet (I found the most basic one a while ago). So I guess I will just keep looking.

    Posted by schaejo at 09:29 AM

    Update - Josh H - June 27, 2004

    Worked On/Working On


    • Molecule Testing Tool: Moved to CVS cgi-bin/mtt
    • Weather Duck

      • Sent email to support folks about sound. Waiting for response.
      • Added source code to CVS in generic/src
      • Should I display only the last X hours of reported data in the WeatherDuck Graph? I can see it getting crowded, and adjusting the script to display only the last 48/96/... hours is fairly trivial.

    • GROMACS Port PVM: All levels finished. Waiting on Graph of data.
    • GROMACS Port MP_Lite: Working on configuring GROMACS. Keep getting core dumps, and I am trying to figure out exactly why. Also producing my notes on MP_Lite in the blog.
    • GROMACS Port MPICH: Ready to run on cairo once Charlie is finished with it.
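
    The "last X hours" idea for the WeatherDuck graph above can be sketched as a simple filter over timestamped samples. This Python version only illustrates the windowing logic, not the actual graphing script; the (timestamp, value) sample format and the function name are assumptions.

```python
from datetime import datetime, timedelta


def last_n_hours(samples, hours, now=None):
    """Keep only (timestamp, value) samples from the last `hours` hours."""
    if now is None:
        now = datetime.now()
    cutoff = now - timedelta(hours=hours)
    return [(ts, value) for ts, value in samples if ts >= cutoff]


if __name__ == "__main__":
    # Hypothetical data: one temperature reading every 10 minutes for 5 days.
    end = datetime(2004, 6, 27, 12, 0)
    readings = [(end - timedelta(minutes=10 * i), 21.0) for i in range(6 * 24 * 5)]
    for window in (48, 96):
        kept = last_n_hours(readings, window, now=end)
        print(f"last {window} h: {len(kept)} samples")
```

    Passing the window length in as a parameter means the same script can serve the 48 hour, 96 hour, or full-history views.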

    To do


    1. B-and-T-GROMACS Paper
    2. F@C development
    3. Port GROMACS to:

      1. MPICH2

    Posted by hursejo at 07:21 AM | Comments (34)

    June 27, 2004

    update Josh McCoy

    Cluster Admin

    • LAM-MPI 7.0.6 - The install went smoothly on bazaar. There is a compilation error on cairo. I am in the process of an in-depth check.

    • I am still messing with system imager on athena.

    • Bazaar annex as a testbed for bazaar image: I would like to do this if it does not step on others' toes.

    • The last option for a 2.6 kernel on cairo has failed. The mess has been cleaned up on c15. A clean install of yellowdog will happen in short order.

    • Added a cluster wishlist entry in the Cluster Admin category. Add anything you would like to have done before the change over to the new images. This includes changes to admin, hopper, and any software we use. It would be nice to record any changes made.

    Data Visualization

    • Spent some time cleaning DVC source.

    • Made first cut of a graphing script for MP implementation on ps vs parallel architecture. It still needs some cleaning and a coat of wax. Check it out at the Data Visualization Console.

    • Switched the DVC to full output for both debugging and checking out the latest information added.

    b-and-t gromacs

    • Read through the CVS documents and joshh's notes. More to come at 10am tomorrow morning.

    numerical methods

    • Added gromacs figs and cflows to the gromacs-overview project.
    Posted by mccoyjo at 11:23 PM | Comments (60)

    June 24, 2004

    Meeting Notes - June 24, 2004

    Numerical Methods


    • John reported that K&R is going well, and he is feeling more comfortable reading C code. He has not looked too deeply into the profiling numbers reported by GROMACS. He has started to look at the inner loops and lookup tables in the GROMACS source, and is keeping notes on which files and functions to use for the extraction phase of the benchmarking kernel.
    • Need resources on Assembly programming on x86 and Altivec.
    • Need to think about how to actually develop the benchmarking kernel from the information we have, especially for folks who have minimal experience with C code.

    F@C


    • Moving forward. Josh H reported on the conversation Charlie, Prof. Pande, and Josh H had on Tuesday night. Our goal is to have a working version by the first week in August.
    • Molecule testing tool is current with last Monday's notes. Will move to CVS repository /cluster/cgi-bin.

    B and T GROMACS Paper


    • Meeting Monday at 10 am to talk about the paper. Review all documents produced thus far: Josh H's doc, Charlie's notes in /cluster/project/b-and-t-gromacs/reposts (specifically the ones with -buffer or -notes extensions, in the directory that one cannot see via the web), and the MT entries. Where to publish?
    • GROMACS PVM Port: Waiting for Visualization tool
    • GROMACS MP-Lite Port: In development may have a lead by harnessing PVM development.
    • GROMACS MPICH Port: Compiled and working on the scheduler script.
    • /cluster/project/gromacs-overview to review for GROMACS notes.
    • Should we develop a document that has a short description of each C file in GROMACS along with descriptions of which variables/functions mean what? Use our limited knowledge to start, then open the 'Help Document' project up to the GROMACS developer community.

    Plumbing


    • Bazaar is ready. Need wish list for Zero'ith node, and need environment for testing zero'ith node. Suggested using bazaar annex as a separate cluster to test the bazaar image.
    • Athena: Josh M is learning SystemImager. Athena needs a bit of network work (NIS, etc.) before it is ready to image.
    • Cairo: YD 2.6 kernel is not building at all. There is a core group of people developing the 2.6 kernel for PPC, but nothing is working yet.
      We have the latest and greatest, so leave it as is and wait for the stable 2.6 kernel to emerge.
      The latest version of YD is a minor fix for G5 security hole, and RPM upgrades. Josh M is going to go through the Change log once more to see if there is anything that we care about. This would be an upgrade from 3.0 to 3.0.1.
      If there is nothing then Josh M will do a fresh install on c15 to clean out the cruft in the current image. He will do the same for the c0 image.
    • No progress on data visualization, but now that the load of the images is lightening Josh M will move to developing the PVM visual for Josh H.
    • PostGreSQL is being backed up on admin via the backup script. Josh H fixed an ssh key exchange between admin and hopper for root.
    • LAM-MPI is not upgraded due to FTP troubles on host site. If it is not up in a few days Josh M will send mail to Users list.

    General


    • Charlie is using cairo instead of bazaar for the workshop.
    • DNS lookup problem may extend beyond bazaar to hopper and admin as noted by Josh M. Should look into this while developing the image. Charlie can take a look at this next week since it requires a bit of advanced knowledge of named and DNS.
    • Josh M will park the cflow and xfig files for GROMACS that he has in /cluster/project/gromacs-overview
    • Noted that we should all strive to publish our update notes the night before the meeting, and send notifications when we post our bi-weekly entries.

    Cluster Move


    • WeatherDuck e-mail to support was sent. No word yet. If no word in a couple days, Josh H will ping again.
    • Josh H will put the WeatherDuck source code into CVS into repository /cluster/generic/src/
    • Do we have a date when we will do the move? Mid-july, if not sooner?

    Conferences and Presentations


    • LinuxFest is corresponding with us. We may need to alter our approach a bit, but it should work. Josh H has been forwarding all correspondence to the listserv as it comes in.

    Posted by hursejo at 02:30 PM | Comments (31)

    update Josh McCoy

    The plumbing is going well. I hope to have things wrapped up by the end of the work day. I still need to play with system imager on athena and do the a0-a11 flop.

    I am quite ready to move on to the data visualization tool and to doing some science.

    Posted by mccoyjo at 09:40 AM | Comments (10)

    Wednesday Update

    I chose to spend the majority of yesterday reading K&R rather than GROMACS code. Today I am planning on finishing the chapter on structures and then searching the code.
    Are bitwise logical operators important? I didn't quite understand that part, but wasn't sure it was worth the time to digest fully.
    On the meta-level, I am getting kind of frustrated. I am three and a half weeks in, and I don't feel like I have done much. This doesn't need to be a meeting item, but I would appreciate any topical words of wisdom, or mental judo tricks to deal with it.

    Posted by schaejo at 08:09 AM

    Update - Josh H - June 23, 2004

    Worked On/Working On


    • Gave John a tour of CVS and grep.
    • Added pg_dump of PostGreSQL DB to the admin backup script. I also fixed some ssh keys on admin for root.
    • I have been communicating with the Ohio LinuxFest organizers WRT the b-and-t-gromacs presentation.
    • Molecule Testing Tool: Per our conversation on Monday, I made a series of changes allowing for editing/deleting/updating of tests and molecules.
    • Weather Duck: Sent email to support folks. Waiting for response.
    • GROMACS Port PVM: All levels finished. Waiting on Graph of data to make judgement on performance. Once the best has been determined then the rest of the parallel structures will be run on cairo.
    • GROMACS Port MP_Lite: Working on configuring GROMACS. Also producing my notes on MP_Lite in the blog.
    • GROMACS Port MPICH: Working on an installation.
    • F@C Development: Talked with Prof. Pande and Charlie, and have some notes that we will review soon. Goal date: 4-6 weeks from now [First week in August]

    To do


    1. B-and-T-GROMACS Paper
    2. F@C development
    3. Port GROMACS to:

      1. MPICH2

    Posted by hursejo at 07:07 AM | Comments (56)

    June 21, 2004

    Meeting Notes - June 21, 2004

    Numerical Methods


    • JH gave JS a grep tour. The cairo runs are done, results are in the table in JS's home directory. JM has the directory ready, he needs to check with JH for instructions for how to do the CVS chicken swing, that needs to be documented in MT under Cluster Admin. The distribution of time was roughly the same under x86 as PPC. We still haven't identified a proper subset of MDP parameters to use to identify particular methods used with a given molecule.

    F@C


    • JH's molecule testing tool looks good. Location should be clear that it's a file system reference, URL manufactured from that. Need a delete mechanism, both molecule and tests and just tests. Edit the tests row.

    • On Tuesday evening JH and CP will talk with Vijay Pande. Before that JH and CP will design an MD agnostic architecture.

    B and T GROMACS Paper


    • MP_Lite might be a flop. There is a dependency between GROMACS with MPI and FFTW with MPI. FFTW uses more complex MPI instructions, ones that MP_Lite doesn't implement. JH thinks he can work through this.

    • Now that c14 is available JH will start on MPICH.

    Plumbing


    • Imaging - JM says that bazaar good to go, athena not quite complete (NIS, mounting, c3 tools, etc.), ppc in progress with Yellow Dog (CD download is taking forever). Latest version of Yellow Dog isn't that different from what we are running now (3.0 vs 3.0.1), JM will try 2.6 kernel under 3.0 and see if it works. JM will try SystemImager out on athena.

    • NIS is running on admin now, /cluster seems to be mounted permanently on admin.

    • JM will install the latest LAM-MPI on bazaar and cairo preserving the old version.

    • Database backup - JH will add a pgdump to backup script following the same even/odd directory pattern.

    • JM will try to get some of the graphing work done while waiting on imaging, etc.

    • Charlie still needs to update the wish list for the new image.
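
    The even/odd directory pattern mentioned for the database backup can be sketched as follows. The notes do not say what the even/odd choice is keyed on; this Python sketch assumes it alternates by calendar day, and the base path, database name, and function names are hypothetical. It only builds the pg_dump command line and does not run it.

```python
import os
from datetime import date, timedelta


def backup_dir(base, day):
    """Pick the 'even' or 'odd' subdirectory so consecutive days alternate."""
    slot = "even" if day.toordinal() % 2 == 0 else "odd"
    return os.path.join(base, slot)


def pg_dump_command(dbname, base, day):
    """Build (but do not execute) a pg_dump command writing into today's slot."""
    outfile = os.path.join(backup_dir(base, day), dbname + ".sql")
    return ["pg_dump", "-f", outfile, dbname]


if __name__ == "__main__":
    # Hypothetical base directory and database name.
    today = date(2004, 6, 21)
    for offset in range(3):
        day = today + timedelta(days=offset)
        print(day, " ".join(pg_dump_command("clusterdb", "/backup/postgres", day)))
```

    With two alternating slots, a dump that fails or is corrupted on one night never overwrites the previous night's good copy.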

    General


    • We should all be getting in the habit of making MT entries when we have a substantive item, e.g. exactly what code is responsible for the LJ + Coul(WW) entry in the results from GROMACS, or how the PVM implementation of GROMACS compares with LAM-MPI. Remember to use appropriate blessed keywords with each entry. These nuggets will form the basis for our posters and papers.

    • For the workshop CP will use bazaar. 14 users (user0-user13), ssh setup, GROMACS, FFTW, subdirectory called villin with the base files set to 500 steps.

    Cluster Move


    • Charlie will ask (again) about lighted switches, key core, and vibration on the panel. BillB is going to move the next ethernet port from the East wall to the closet and remove the 110v.

    • WeatherDuck - Working fine other than sound, JH's graphs look good. JH will check with them about swapping it for another unit. JH will put it in cvs, the link to it is already in the references section of cluster.earlham.edu.

    Conferences and Presentations


    • Consider SIAM Computational Science and Engineering 05, submission deadline is August 11. F@C (poster), B and T GROMACS (paper), numerical methods for MD (poster). CSE05.

    • LinuxFest has started to review presentations, no word yet.

    Posted by charliep at 08:06 PM | Comments (41)

    Update Monday morning

    Sorry I should have done this last night.
    I have moved the contents of the table to an html format, but I don't know how to look at it. It is probably still pretty ugly, and I have no idea what the spacings will look like, but it is in close to the right format.
    I would like to do that grep and find tour today. I tried to figure some stuff out on my own, but I did not turn over any useful results.
    I am starting the cairo runs this morning, because I know how to do that part.

    Posted by schaejo at 09:41 AM

    Meeting Notes - June 17, 2004

    Numerical Methods


    • Problems with molecule organization, we need a better structure. Identify the subset of mdp parameters that we care about to distinguish molecules and methods. JM will create /cluster/project/numerical-methods and the associated CVS stuff. src and doc subdirectories for now. JS will use one of the tables from the b and t gromacs poster to move his chart to that form and park it in CVS in that directory. JS will start with LJ + Coul(WW) and figure out exactly what code that represents. JH will work with JS to give him a tour of the relevant tools.
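
    One way to approach the mdp-parameter subset is to parse each molecule's .mdp file into a dictionary and compare only the keys we care about. The sketch below is Python and relies only on the "key = value" layout with ";" comments used by .mdp files; the particular keys listed and the function names are assumptions, not a decided subset.

```python
def parse_mdp(text):
    """Parse .mdp-style text ('key = value', ';' starts a comment) into a dict."""
    params = {}
    for line in text.splitlines():
        line = line.split(";", 1)[0].strip()  # drop comments and whitespace
        if not line or "=" not in line:
            continue
        key, value = line.split("=", 1)
        params[key.strip()] = value.strip()
    return params


# Hypothetical subset of parameters that might distinguish the methods used.
KEYS_OF_INTEREST = ("integrator", "coulombtype", "vdwtype", "rcoulomb", "rvdw")


def method_signature(params, keys=KEYS_OF_INTEREST):
    """Reduce a parameter dict to a tuple that can be compared across molecules."""
    return tuple((key, params.get(key)) for key in keys)


if __name__ == "__main__":
    sample = """
    ; hypothetical run parameters
    integrator  = md
    nsteps      = 500      ; short test run
    coulombtype = PME
    rcoulomb    = 0.9
    """
    print(method_signature(parse_mdp(sample)))
```

    Molecules whose signatures match could then be grouped together in the results table, which would make it easier to see whether a difference in the dominant subroutine comes from the molecule or from the method.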

    F@C


    • Josh is working on a tool for tracking molecules and results. On Tuesday evening JH and CP will talk with Vijay Pande.

    B and T GROMACS Paper


    • First cut of mp_lite seems to be working. PVM port is running. JH has notes for mp_lite setup that he'll publish in MT with appropriate blessed keywords. JH, JM, and CP will meet on Monday the 28th.

    Cluster Move


    • The cluster closet looks like it's ready as of about 12:30p today. Charlie will ask about lighted switches, key core, and vibration on the panel. BillB is going to move the next ethernet port from the East wall to the closet and remove the 110v.

    • WeatherDuck - Working fine other than sound, JH's graphs look good. JH will check with them about swapping it for another unit. JH will put it in cvs, the link to it is already in the references section of cluster.earlham.edu.

    Plumbing


    • Gentoo on PPC - CDs did not work in the lab. New cut of cd failed as well, are those Macs disabled from booting from CD? I don't want us to spend our whole summer doing plumbing! JM will install latest and greatest Yellow Dog with a 2.6 kernel on c15 and see how it goes.

    • Gentoo on x86 - bazaar is ready to go other than the additions necessary for b0 and a problem with NIS. Try the immutable file attribute to see where yp.conf is being deleted. Charlie still needs to update the wish list for the new image.

    • JM will try to get the images for both bazaar and cairo done by the middle of next week.

    • The mount of /cluster to admin should be made permanent. NIS setup on admin. JM.

    • Long-term item - Scripts to monitor WeatherDuck and notify and then shutdown as appropriate. Organize and web publish all our code for this (HIP?).
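
    The notify-then-shutdown logic for the WeatherDuck monitoring item could start from something like this Python sketch. The threshold values, the action names, and the function names are all hypothetical; reading the device and actually sending mail or shutting nodes down are left out.

```python
# Hypothetical thresholds in degrees Celsius; real values need to be agreed on.
WARN_C = 27.0
SHUTDOWN_C = 32.0


def decide_action(temp_c, warn_c=WARN_C, shutdown_c=SHUTDOWN_C):
    """Map one temperature reading to 'ok', 'notify', or 'shutdown'."""
    if temp_c >= shutdown_c:
        return "shutdown"
    if temp_c >= warn_c:
        return "notify"
    return "ok"


def review(readings, warn_c=WARN_C, shutdown_c=SHUTDOWN_C):
    """Return (reading, action) pairs, stopping after the first shutdown."""
    decisions = []
    for temp_c in readings:
        action = decide_action(temp_c, warn_c, shutdown_c)
        decisions.append((temp_c, action))
        if action == "shutdown":
            break
    return decisions


if __name__ == "__main__":
    for temp_c, action in review([22.5, 26.9, 28.0, 33.1, 24.0]):
        print(f"{temp_c:5.1f} C -> {action}")
```

    Keeping the decision separate from the polling and the shutdown commands makes it possible to test the thresholds without touching the clusters.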

    General


    • Support for workshop next week - Charlie will need 10 users, user1-user10, setup with rc scripts that setup the appropriate paths for GROMACS, LAM-MPI, etc. on cairo.

    • JH will move cp's extra phone patch to the seminar room.

    • Meetings next week - Monday plumbing and paper, Thursday F@C and numerical methods.

    • JM: latter half of next week, data visualization tool and preset query tool development.

    • We should all be getting in the habit of making MT entries when we have a substantive item, e.g. exactly what code is responsible for the LJ + Coul(WW) entry in the results from GROMACS, or how the PVM implementation of GROMACS compares with LAM-MPI. Remember to use appropriate blessed keywords with each entry. These nuggets will form the basis for our posters and papers.

    Posted by charliep at 04:27 AM | Comments (28)

    June 20, 2004

    update Josh McCoy

    There isn't much to update as I have just returned from a weekend trip. I will continue to work on the images and other plumbing until the middle of the week.

    Posted by mccoyjo at 11:41 PM | Comments (34)

    Update - Josh H - June 20, 2004

    Worked On/Working On


    • Weather Duck: I updated the code to make it a bit more robust, and checked the sound. It seems to work exactly as it should, but I cannot verify the dB since I don't have anything that produces a given dB that I can measure with it. Some informal testing showed that it was working fine, but not nearly as sensitive as it once was.
    • GROMACS Port PVM: Version 0, 1, and 2 are finished. Version 3 should be finished Monday morning. Next to run is the MP-Lite port.
    • GROMACS Port MP-Lite: I am compiling the first version using the MPI bindings and default TCP window sizes. Once the PVM runs are finished then I can fully test my configure scripts, and make sure that I have everything wired up correctly.
      Versions:

      1. Version 0: (mp_interface = MP-Lite) Use MPI bindings and little to no changes in the GROMACS code.
      2. Version 1: (mp_interface = MP-Litev1) Use make tcp_sync to see how much of a performance gain we can get from the synchronous TCP version.
      3. Version 2: (mp_interface = MP-Litev2) Possibly convert the GROMACS source from MPI to MP-Lite syntax to see if there is a performance benefit from using the native calls instead of the wrappers. This is unlikely, but possible.
      4. Version 3: (mp_interface = MP-Litev3) Use the suggested rmem_max and wmem_max sizes in the MP-Lite/README.

    • I am working on a tool to catalog information about molecules, specifically fail modes and contributed information. I am currently working on the web interface to this tool. I have created 2 new tables in the database, and their schemas have been appended to the end of the db-objects.sql file.

    To do


    1. B-and-T-GROMACS Paper
    2. F@C development
    3. Port GROMACS to:

      1. MPICH
      2. MPICH2

    Posted by hursejo at 10:20 PM | Comments (39)

    June 17, 2004

    Update - Josh H - June 16, 2004

    Sorry I forgot to post last night.

    Nothing really to report. I am running the PVM tests, and waiting for them to finish. While waiting I am investigating MP-Lite.

    Posted by hursejo at 10:21 AM | Comments (31)

    wednesday update

    I made very little progress Wednesday. I expanded/tidied up the table, and JoshM and I found a copy of K&R in the Library. I am almost finished with Chapter one of the Numerical Methods book.
    Question: How am I supposed to find these sub-routines in GROMACS? Charlie's comment about working backwards from the print statement made some sense, but JoshM showed me what a mess the GROMACS source is and I feel I could use a little more guidance on this project. I don't want it done for me, but I do want to know some intelligent ways of doing it.

    Posted by schaejo at 09:22 AM

    June 15, 2004

    Meeting Notes - June 15, 2004

    • Postmaster was not running on hopper. Josh M will check to see if hopper has a shutdown script for postgres.
    • John is making progress in the Numerical Methods book. Charlie suggested some places to focus
    • Image. athena and bazaar are mostly/completely fine. Gentoo boot cd is flaky on PPC. Going to test in the Mac Lab to see if that helps. Boots, but does not play well with /dev. If all else fails then try to port 2.6 kernel to latest YellowDog release.
    • John has a chart of methods/molecules/dominant subroutine/flops.
    • John and Dawit should be starting to learn C so they can move through the GROMACS source they care about. Specifically how these numbers are calculated. This will be the first step in creating a subset of GROMACS to focus on, and tune. This way we do not. How stats are done? What they represent? Develop Benchmark Kernel.
    • Molecule repository cleanup. What is project1012?
    • PVM is running. Hopefully by the end of the room all of the PVM runs will be finished. Once Josh M gets the graph I need, I can visualize this performance improvement/loss.
    • Is the cluster room ready? Power Outlets/HVAC installed/hole filled in/Cookies?
    • Kinkos stuff should be moving with Josh and Charlie back to Richmond.
    Posted by hursejo at 01:33 PM | Comments (29)

    update Josh McCoy

    Due to very poor internet connections and loss of electricity (both at home and at EC), I have basically been catching up on the reading list. I did have a brief chance to mess with ypbind on the bazaar gentoo image. I have also subscribed to several gentoo mailing lists (ppc-user, ppc-dev, cluster, osx) and have taken a good look at all of them. Unfortunately, traffic is very light (so light that the digests are weekly or even monthly in some cases) and many posts go without responses.

    The next few days will entail finishing the cluster images and using system imager to distribute them. Hopefully all goes well.

    Posted by mccoyjo at 09:39 AM | Comments (25)

    June 14, 2004

    Sunday Update

    I have finished the table of molecules. I am pretty sure that I only needed to run each once, because while the results may have varied from run to run, the number of calls to each sub-routine did not. I am not 100% sure of this, but I have gotten the same results every time I have run the shorter molecules.
    I will entertain myself today with more reading, and hope that Charlie outlined the chapters for the Numerical Methods book.

    Posted by schaejo at 09:02 AM

    June 13, 2004

    Update - Josh H - June 13, 2004

    Worked/Working on


    • Found this interesting command from the lam list:
      laminfo -param rpi all | grep priority
      It should display the default RPI module ranking. It must be a new feature (integrated in versions later than 7.0.2) since it does not quite work with our current setup.
    • GROMACS PVM Port: Finished cleaning, and it has been running tests. I have version 1 running now. Once it finishes then I will run version 2 on the same dataset to see if there are any performance gains. There may be a version 3 to test the performance loss/gain of the following command:
      pvm_setopt(PvmRoute, PvmRouteDirect);
      Which was in the original code, and preserved in versions 1 and 2.

      • Version 0: (mp_interface = PVM) Original version from the GROMACS site.
      • Version 1: (mp_interface = PVMv1) Modified to allow multiple slaves per node in configuration.
      • Version 2: (mp_interface = PVMv2) Made all sends PvmDataRaw since we do not use an endian-heterogeneous cluster. This will likely give us a performance boost, at the cost of portability to heterogeneous environments.
      • Version 3: Possibility (mp_interface = PVMv3) Test the performance result due to the removal of the explicit PvmRouteDirect declaration.

    • Fixed Arthur Vining Davis logo in Poster, and placed files on JumpDrive
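
    The portability cost of PvmDataRaw noted for Version 2 above comes from skipping the data encoding step: raw sends ship the sender's native byte layout, while the default PVM encoding converts to a machine-independent format (XDR). The Python sketch below does not use PVM at all; it only imitates a raw 32-bit integer send between hosts of the same and of different byte order, and the helper names are made up for the illustration.

```python
import struct

LITTLE, BIG = "<", ">"  # struct byte-order prefixes


def send_raw(value, sender_order):
    """Pack a 32-bit int in the sender's byte order, with no conversion."""
    return struct.pack(sender_order + "i", value)


def receive_raw(payload, receiver_order):
    """Unpack the bytes assuming the receiver's own byte order."""
    return struct.unpack(receiver_order + "i", payload)[0]


if __name__ == "__main__":
    payload = send_raw(1, LITTLE)
    print("same byte order:     ", receive_raw(payload, LITTLE))
    print("different byte order:", receive_raw(payload, BIG))
```

    On a cluster where every node has the same byte order the raw path is safe and avoids the encode/decode work on each message; mixing x86 and PPC nodes in one run would need the default encoding again.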

    To Do:


    • B-and-T-GROMACS Paper
    • Port GROMACS to:

      • MPICH
      • MPICH2
      • MP-Lite

    Posted by hursejo at 12:48 PM | Comments (86)

    June 10, 2004

    Meeting Notes - June 10, 2004

    • John is running simulations, using files in /cluster/project/molecules noting the soft-links present in that directory.
    • Bring Numerical Methods Book from Ranch.
    • Move notes:
      • Athena Image is ready.
      • Bazaar is coming along. NIS and c3-tools need to be installed.
      • Cairo will not install Gentoo. can't go to YellowDog easily since it does not support the 2.6 Kernel. JoshM will
      • Send mail to Bill Birum about north wall of Recompute space and moving the boxes. Contact security about key core.
      • Send donut requests to Charlie for Sunday morning.
      • Josh H will shutdown clusters around 8ish Friday.
    • O'Reilly Essential System Admin Book to add to reading list.
    • Charlie is noticing a difference between running a molecule from a fresh boot and from a longer uptime system on Bazaar with regards to failure. He will keep monitoring this situation.
    • PVM port of GROMACS has reached stage 2 of 3. After cleaning things up and testing a bit, Josh H will post a tarball of the new source to the GROMACS site.
    Posted by hursejo at 01:44 PM | Comments (56)

    update Josh McCoy

    The last few days have been long sessions of plumbing mostly consisting of installing gentoo on various nodes. Here is a progress report.

    Athena - a11 - Dawit seems to have control of the image here. I'm not sure of the details, but progress is being made.

    Bazaar - b20 - Things are going well on bazaar. The image is almost ready save the following items: nis, c3tools, a dns issue, and getting dhcpd to run on boot.

    Cairo - c15 - There is an issue about booting from the gentoo cd that we are unable to solve (dawit, skylar, and I). The odd thing is that the process begins, you get to pick your temporary kernel, and then the errors start to roll in. It seems the /dev filesystem is not initializing properly. I have tried everything I can think of (including a substantial amount of time trying to track down information) to no avail. I'd be happy to give a tour of the problem tomorrow during our meeting.

    I would like to note that gentoo has very little automation in regards to installation. It boots to an environment on the cd and says have fun. This has made the install both more time consuming and more worthwhile. Considering my previous sys admin experience, I've jumped into the deep end of the pool and seem to be swimming.

    Posted by mccoyjo at 09:36 AM | Comments (23)

    Wednesday Update

    I have been doing runs on node b16 and recording the most significant sub-routines to a file. I didn't get all of them (or truncated versions of all of them), but I did 8 of them. Two came out looking very similar, and I don't really know why.
    I finished reading chapters one and three from the GROMACS manual. I don't really know how much of that was expected to stick, because not much of it did. If it is worth spending time on, I could go back through and try to unpack into a more readable form.
    Since I lack the courage to wade into the GROMACS source code that JoshM showed me the location of, I will spend tomorrow morning revisiting the HPC book and making notes to myself. If I finish that, I will start on a numerical analysis book that Tim McLarnen gave me.
    What's next?

    Posted by schaejo at 12:46 AM

    Update - dawit

    I got the short intro on how to run gromacs from Josh.
    I have now a working gentoo image on athena, node 11 (159.28.231.32). The network is configured statically since there is still the dhcp thing to work on. But you guys can test the system, ssh is enabled.

    Posted by bekelda at 12:08 AM | Comments (68)

    June 09, 2004

    Update - Josh H - June 9, 2004

    Worked On/Working On


    • WeatherDuck Monitoring: I have a working setup running on admin. You can access the graph, which is updated every 10 min. I did not add the ability to automatically shut down the cluster if the temp is over X or any functionality like that. At the moment I am just polling the device, gathering data, then pushing that data through gnuplot.
      The Sound metric is dB.
    • GROMACS PVM Port: It is coming along. I have broken the sequential nature of the previous port, and am now cleaning things up. It should now work with any collection of runs that we wish. I have not tested to see if the dependency upon the number of nodes in pvmd is still there. In theory it should not be, since it was only a dependency because of the sequential numbering scheme in pvmd when you order the nodes that you add correctly and have it match the size of your ring (which was very shaky ground from the beginning).

    To do


    1. Thursday Morning: Fix Arthur Vining Davis logo in Poster
    2. B-and-T-GROMACS Paper
    3. Port GROMACS to:

      1. PVM
      2. MPICH
      3. MPICH2
      4. MP-Lite

    Posted by hursejo at 04:45 PM | Comments (73)

    June 07, 2004

    Update - Josh h - June 6, 2004

    Sorry for the late update, I was without net connection for the past 48 hours.
    I have been working on the PVM GROMACS code, making it a bit more robust. Hopefully I will be finished with it by the end of the day Monday.

    Posted by hursejo at 09:23 AM | Comments (78)

    Update - Dawit

    I have been working on plumbing all weekend. A CD burn of a gentoo install failed for the G4-SMP kernel (hence why c15 has been taking a break for the weekend). A gentoo install failed halfway on athena when trying to chroot; that happens to be an error on my part in thinking that the subarchitecture is an i686. Hopefully this same image will work for the bazaar node. I am taking node 20 from the annex.
    Me and Skylar are also waiting on hopper to finish downloading packages with jigdo for Debian. It appears it has 2.6 kernel support.

    Posted by bekelda at 02:58 AM | Comments (53)

    update Josh McCoy 6 June 2004

    • Updated the GROMACS software overview with information on running single/multiple process(es). Added a few sample commands to show how to run villin.
    • Fixed the titles of the DVC and QVC and cleaned up/documented source.
    • I should have a working graphing script for joshh tomorrow.
    • Read Hassan's SystemImager docs. They describe a special case for the 0th nodes. I'm guessing there aren't images for the head nodes.
    • I'm hoping to have the gentoo installations in basic working order (NIS, NFS, networked properly) Monday. Dawit and I will have to spend a majority of our time tomorrow on this task (side note: I plan on coming to campus Tuesday and Wednesday to help ensure the new images go over smoothly).
    • Took a look at OSCAR. The project seems interesting. It may be worth attempting to install on the annex.
    • I had a hard time finding Warewulf (even after searching the Beowulf mailing list archives). Can someone send me a link?
    • After spending some time looking at gentoo, I found a list of gentoo supported kernels. I didn't see any specific references to NAPI or low-latency kernels.
    • As far as I can tell, Linux kernel 2.6 has NAPI support in that many of the drivers included with the kernel support NAPI. The official change logs/readmes can be found here: http://www.kernel.org/pub/linux/kernel/v2.6/
    • Read more of the first altivec reading.

    Posted by mccoyjo at 01:19 AM | Comments (42)

    June 06, 2004

    John's Sunday update

    I have moved my eyes across all the text in the HPC book. I will revisit it in the near future to make notes for myself. I intend to find and read the "so you want to run GROMACS" document tonight or tomorrow. I will no doubt have questions involving that process, but those questions would not be an efficient use of meeting time. If one of the two Joshes could help me out, that would be great. I know my understanding of remote computing, via ssh or whatever, is lacking. But I do not know if that is my biggest problem.

    Posted by schaejo at 10:17 PM

    June 03, 2004

    Meeting Notes - June 3, 2004

    CCG Meeting - cp, jh, jm, db, js

    Check out clusterworld.com's urban legend article. If you aren't a slashdot.org reader, check that out too.

    The b-and-t gromacs prose that Charlie was going to find is already in CVS in the -notes files.

    Plumbing


    • Distro research: so far Josh thinks either Gentoo or SuSE (out of SuSE, Fedora, Gentoo, Debian, RedHat). He'll check out OSCAR and Warewulf. Gentoo and SuSE have recent kernels and frequent updates. Gentoo supports lots of kernels; do any of them purport to be low latency? Gentoo supports x86, PPC, and SPARC. D and J will build three images, a, b, and c, with NIS and NFS, and will let us know when we can test them.
    • When you have a minute, check out IPMI.
    • Charlie will check with John Walker about 24H HVAC
    • 0th nodes are custom - cexec (should be on all nodes), routing, dhcrelay. Check local SystemImager for others and update as necessary.
    • WordPress after the images are ready.

    Data Visualization Console - JoshH's bug couldn't be recreated. Use "DVC" and "PQC" for the titles. This was cut short, more to follow.

    Running GROMACS - JoshM will update the document and make sure that a single processor config is supported. John and Dawit will learn how to run it on bazaar. Study the output from mdrun, particularly mega-flop accounting, and start grokking it.

    Sub-meetings - Mon for plumbing and b-and-t gromacs paper. Thu for F@C and numerical methods. 1p most weeks, 7p or thereabouts when Charlie is out of town. Try to stick to 1/2 hour per topic. All of us attend each meeting (except during vacations, etc.)

    Readings are going ok, answered questions. Starvation will occur sometime next week, Charlie will update the list.

    Brief review of the summer-2004 plan. During each of the upcoming sub-meetings review the pertinent section.

    Communication - We talked at lunch about various approaches to this, Charlie will write them up and circulate.

    GROMACS and message passing libraries - JoshH still plugging along, limited results and a direction to pursue WRT node numbering.

    Next meeting items


    • All - review pertinent section of the plan
    • Plumbing - plan for workshop support

    Future


    • Data visualization console
    • Preset query console
    • Grant tour
    • Conference tour


    • Scheduler presentation and conversation, read JoshH's entry in preparation
    • FFTW/GROMACS/MPI diagram update, use GROMACS manual as source?
    • Modular F@C - make a plan

    Posted by charliep at 04:02 PM | Comments (34)

    Update - Josh H - June 2, 2004

    Working on/Worked on


    • Fixed dhcrelay on cairo. All nodes are now using DHCP to configure eth1, eth0 is disabled on startup by default.
    • Porting GROMACS to:

      • PVM: I have been able to run a molecule. I have a script that is running our tests on cairo. It has completed the NxNxN or Eq. A, and will be working on Eq. B-D for the next couple of days.
        I need a new graph in the DVC to display PVM and LAM-MPI runs on the same graph, so I can show the performance difference between PVM, LAM-MPI, MPICH, MPICH2, and MP-Lite on the same graph for a single molecule. Can we adapt the 2-D level graph or should we start anew? I am leaning towards the latter.
        Currently the scheduler uses the DB to store results, but does not use the option_profile table due to sheer laziness of the developer.

    To Do list


    1. Fix Arthur Vining Davis logo in Poster
    2. B-and-T-GROMACS paper
    3. Port GROMACS to:

      • PVM
      • MPICH
      • MPICH2
      • MP-Lite

    4. Modify DB schema and scheduler to contain [success|fail], Where did it fail, how did it fail.
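
    As a rough illustration of item 4, the schema change could look something like the sketch below. The table and column names are hypothetical, and sqlite3 is used here only so the example is self-contained; it is not a description of the actual b-and-t-g database.

```python
import sqlite3

# Hypothetical schema: table and column names are assumptions for illustration,
# not the group's actual database layout.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE result (
        id           INTEGER PRIMARY KEY,
        molecule     TEXT NOT NULL,
        status       TEXT NOT NULL CHECK (status IN ('success', 'fail')),
        fail_stage   TEXT,  -- where it failed, e.g. 'mdrun'
        fail_reason  TEXT   -- how it failed, e.g. a short error summary
    )
    """
)

def record_run(conn, molecule, status, fail_stage=None, fail_reason=None):
    """Store one run; failure details stay NULL for successful runs."""
    conn.execute(
        "INSERT INTO result (molecule, status, fail_stage, fail_reason) "
        "VALUES (?, ?, ?, ?)",
        (molecule, status, fail_stage, fail_reason),
    )

# Made-up example rows.
record_run(conn, "villin", "success")
record_run(conn, "proteasome", "fail", "mdrun", "process exited early")

failed = conn.execute(
    "SELECT molecule, fail_stage FROM result WHERE status = 'fail'"
).fetchall()
```

    Keeping "where" and "how" as separate nullable columns lets the scheduler and the graphing tools filter failed runs without parsing log text.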

    Posted by hursejo at 07:27 AM | Comments (15)

    update Josh McCoy 2 June 2004

    • Installed Debian 3.0 ("Woody") on the athena golden client.
    • Did research on x86 linux distros. Expect an mt entry tomorrow morning after I get to work.
    • Took a preliminary look at rdist.
    • Took a good look at the docs for the latest version of gnuplot to assess how much the graphing scripts need to change to accommodate a version upgrade.
    • Display tool updates:
      • Changed table schema to use a sequence as the key.
      • Fixed the update vs add issue (PQC)
      • Removed preset query deletion box (PQC).
      • Removed associated graphs column from tables(PQC).
    • To do for display tool:
      • Take a closer look at the bug Joshh brought to light.
      • Merge delete, add/update scripts with PQC. They are quickly converging.
      • Enable column selection box.
      • Update names/descriptions of the graph buttons.
      • Make 2d Graph work.

    Posted by mccoyjo at 01:46 AM | Comments (68)

    June 02, 2004

    Update-dawit

    Josh and I have installed Debian on one of the athena nodes, and we will be testing it to make sure it is ready to be a golden image.
    I am still playing with the SystemImager problem that I had with the old RH7.3 athena image. The problem seems to be that the DHCP packets that hopper sends do not get delivered to the nodes when SystemImager does a DHCP broadcast. I restarted dhcrelay on a0 on both interfaces, and IP forwarding has been enabled. Does anyone have any similar past experiences with dhcrelay?
    Another issue that might affect it is the ddns-update-style line in dhcpd.conf (I tried both interim and ad-hoc, but it complains that it does not recognize it).

    Posted by bekelda at 11:04 PM | Comments (63)

    Wednesday Update

    Um... Here goes.
    I am on page 67 of the HPC, and understood most of it. I will probably need to re-read some parts (i.e., chapter 3). I have read the first AltiVec document, but I will read it again to gather specific questions for tomorrow.

    Posted by schaejo at 06:47 PM

    June 01, 2004

    Meeting Notes - June 1, 2004

    CCG Meeting - cp, jh, jm, db, js

    Schedule - 40 hrs/week, Mon and Thu working on the second floor of Dennis between 9a-5p ish. Don't forget the Sunday and Wednesday evening updates.

    Move - Check to make sure that HVAC is on 24x7x365. We have a WeatherDuck that we'll use for monitoring the temperature and shutting down the clusters if necessary. Cairo moves as a unit (minus admin and bazaar); for bazaar we'll take 1/2 of the machines out. Shutdown Friday night (JoshH), move Sunday morning at 7a (Charlie will bring the donuts). Clean ENI lab and shuttle remainder of ancillary gear later in June.

    New images - see plumbing document for the details. JoshH will be running tests until just before the move. JoshM and Dawit can use bazaar annex and c15 for testing, plan to upgrade to new images on Thursday June 17th. What to do on x86? Gentoo, RedHat 9, ROCKS, SuSE, Debian, others? Use Debian for athena, not RedHat.

    Workshops - during the weeks of June 20-26 and August 8-14 Charlie will be using the clusters for students in the parallel and distributed workshops. Partition each, part for workshops and part for CCG.

    Grants - probably to a couple of private/corporate foundations, maybe NSF, maybe Keck. Consider further in a couple of weeks.

    Reading - Dawit and John need to both read HPC, timeshare efficiently.

    Sub-meeting plan - Before too long it will be more efficient to have regular b-and-t gromacs, f@c, numerical methods, plumbing.

    Data Visualization Console (DVC) - Key value for table, makes editing possible. Add orderby. If using a preset query only appropriate graph types should be displayed. Graph = 3D parallel architecture and levels with PS/day; 2D Graph = 2D molecules and layers with PS/day; 2D graph (one molecule) = molecule and layers with PS/day (currently broken); Text Dump = Text Dump; Parallel Architecture Graph = 2D parallel architecture with PS/day (either one molecule or many). For some graphs it is reasonable to choose one molecule or more than one. For others only one or more than one is sensible, label appropriately. Finish column select functionality. Link to this in the Resource section of cluster.earlham.edu.

    Preset Query Console (PQC) - Drop entry box for Delete Preset Query, add orderby. Link to this in the Resource section of cluster.earlham.edu.

    Communication - We talked at lunch about various approaches to this, Charlie will write them up and circulate.

    GROMACS and message passing libraries - JoshH still plugging along, some problems with initial PVM on cairo (e.g., it crashed the cluster). Use native communications for PVM.

    WordPress - J&D will install it on admin for us to testdrive, easy to migrate MT to WordPress but coming back may be hard.

    Thursday meeting


    • Which x86 distro?
    • Review summer plan
    • Conference and presentation tour
    • Sub-meeting plan

      • B and T GROMACS - review JoshH's draft, Charlie to find old prose

    • Data visualization console
    • Preset query console

    Future Meeting


    • Scheduler presentation and conversation, read JoshH's entry in preparation
    • FFTW/GROMACS/MPI diagram update, use GROMACS manual as source?
    • Modular F@C - make a plan

    Posted by charliep at 08:00 PM | Comments (73)

    update Josh McCoy 31 May 2004

    Posted by mccoyjo at 01:27 AM | Comments (56)

    May 31, 2004

    Update

    Reading HPC.
    Worked on one of the quarkprime machines installing FreeBSD.
    Athena image is ready, and the problem with SystemImager is still a mystery. When trying to make a bootable diskette I get a choice of flavors between ATHENA and standard, so I am thinking it has to be in the boot loader files, since that was what came up last time. Will try to work with Skylar to resolve the issue, and then the athena cluster should be ready.

    Posted by bekelda at 11:23 PM | Comments (67)

    May 30, 2004

    Update - Josh H - May 30, 2004

    Working on/Worked on


    • Readings: Altivec Documents
    • Fixed /cluster mounting issue on admin by commenting out the mount in rc.local.
    • Installed a CD-R drive in admin and created an i386 boot CD. Also installed cdrecord. See sna log for more details.
    • Figured out that SystemImager does not currently support creating a boot disk for PPC. So we must use the yellowdog install, SystemImager client install, update method until someone figures out how to make a bootable CD for the PPC. I put the instructions in the sna Systemimager HowTo document.
    • Found BootCD, which is able to make a bootable CD for Mac OS X. Maybe we can whittle it down to something that we can use with SystemImager.
    • Fixed the permissions for CVS directories by making them all group writable.
    • Porting GROMACS to:

      • PVM: Going well. I am still trying to figure out how to run the version from the website on cairo and bazaar. This is what crashed cairo this week. I am making progress.
      • MPICH: Installed MPICH on cairo and will install the beta release of MPICH2 on cairo which implements the MPI-2 standard, or hopes to.
      • MP-Lite: should be installed on cairo, but I am focusing on the PVM and MPICH implementations before I get to this.

      All implementations will be posted to the Earlham Cluster Computing site when they are ready, and then contributed to the GROMACS site. I want to develop them for Cairo since I will have to run many tests and it is the fastest cluster we have. This way when we are ready to test, this will be a relatively quick process.

    To Do list


    1. Fix Arthur Vining Davis logo in Poster
    2. B-and-T-GROMACS paper
    3. Port GROMACS to:

      • MPICH
      • PVM
      • MP-Lite

    4. Modify DB schema and scheduler to contain [success|fail], Where did it fail, how did it fail.

    Posted by hursejo at 11:31 PM | Comments (65)

    May 29, 2004

    Grant Opportunities

    Ava Willis and I have been researching various grant opportunities, we'll need to prepare one or more of these this summer in light of the HHMI rejection. I have a folder with the Foundation Directory entries for each of these if anyone is interested (J^2 mostly I suspect).

    o Sun Microsystems

    There are two categories that I'm thinking about:

    Higher Education - Scientific and Engineering Computing
    Primary and Secondary - University Outreach to K-12 Schools

    For the first one we would ask for a hardware donation of Sun SPARC workstations to use for our work in molecular dynamics. For the second one we would ask for money to support a project which Ray Ontko (EC '84) and I have been noodling around for a while, which involves giving presentations to area high schools with a couple of EC CS students and faculty.

    o Research Corporation

    I'm not sure about this one but I'm leaning towards no. They seem to be most interested in Physics, Astronomy, and Chemistry. Even though the basis of our work is Chemistry and Physics I suspect to these folks it will look like CS. I suspect the lack of a PhD would be an issue as well.

    o Keck Foundation

    I don't remember the details of our last effort (Fall, 2000) with Keck but it may be time to re-visit them. PaulO and I have talked about a particular project involving chemistry and CS which I think would be very attractive to them. I'm not sure who you should speak with to get the background here if you don't already know it. If you think we're ready to approach them again then the three of us (Ava, Charlie, and Paul) should set-up a time to meet sometime in the next couple of weeks.

    o Intel

    I think this is worth considering. It looks fairly simple and there is no deadline. A proposal used here could be recycled (at least in part) for some of the others that we are considering. I'm not sure which project would be best suited yet.

    o Arnold and Mabel Beckman Foundation

    I think this is worth considering. They require a pre-proposal letter which I can begin to craft. The deadline is October 1. A pre-proposal letter used here could be recycled (at least in part) for some of the others that we are considering. I'm not sure which project would be best suited yet.

    o American Honda Foundation

    I think this is worth considering. They require either an initial phone call or letter. The next deadline is August 1. A proposal used here could be recycled (at least in part) for some of the others that we are considering. I'm not sure which project would be best suited yet.

    o Eastman Kodak Company Contributions Program

    I don't think this is worth our time. While there is some fit between our work and their giving areas it's not that great. The deadline was April 30.

    o Apple

    I don't think this is worth considering. The hardware they are offering we don't really need (we have a bunch of it already) and they are looking for people that are further up the chain than we are.

    Posted by charliep at 12:35 PM | Comments (14)

    May 27, 2004

    update

    Worked on a new athena image with JoshM, ran into problems with NIS, will work on it more together.
    Made floppies and did a network install of FreeBSD on one of the quarkprime machines.
    Learning useful command tools from man pages. Reading HPC.

    Posted by bekelda at 12:48 AM | Comments (0)

    May 26, 2004

    Update - Josh H - May 26, 2004

    Worked on


    • Spent a load of time working on bringing up c1. Still have not figured out how to make a CD that is bootable on cairo for SystemImager. I want to try installing the CD-R/CD-RW in admin and burning the disk there, since I read a post or two saying that any disk burnt under OS X will not work as a bootable disk for YellowDog. I believe this to be superstition, but I am about at that point in the process.
      All said and done cairo is back up and running properly.
    • Fixed the double mount of /cluster on cairo which made using cairo nearly useless. This involved commenting out a line in /etc/rc.local. It is working now after a reboot.
    • I rebuilt b19, and after a DHCP shuffle on hopper it is running again with bazaar annex.
    • Sent a chaser e-mail to LinuxFest to confirm our submission. No word back yet.
    • Nosed around in Charlie's environment on bazaar, and switched lamhalt to wipe in his run.pl script [renamed the new version to joshh-run.pl], and was able to run the tests just fine on bazaar. Hopefully this will fix the problem.
    • Cairo reinstall Note: if you want the ethernet to work after a fresh yellowdog install on cairo, hard code the IP/GATEWAY/Primary DNS and disable eth0 on startup. All within the graphical setup.
    • Sent a message to the GROMACS developers list regarding thread support with and without MPI.
    • Looked over Summer Plan

    To Do list


    1. Fix Arthur Vining Davis logo in Poster
    2. Readings
    3. B-and-T-GROMACS paper
    4. Port GROMACS to:

      • MPICH
      • PVM
      • MP-Lite

    5. Modify DB schema and scheduler to contain [success|fail], Where did it fail, how did it fail.

    Posted by hursejo at 08:00 PM | Comments (0)

    May 25, 2004

    Meeting Notes - May 24, 2004

    CCG Meeting - cp, jh, jm, db

    Bi-weekly updates (so far) are working well. Remember to make these entries each Sunday and Wednesday evening. Use the General category with a title of "Update - Name - Date". All of us should read these updates before our group meetings on Mon and Thu.

    WordPress - All reviews look good. Added install/configure to the plumbing list.

    The new system/network/database administration list for this summer is now available at /cluster/project/sna/plumbing-summer-2004.html. All of us should use log.html (in that same directory) to note things we fix/install/change on the clusters.

    The first cut of the summer reading list is available at /cluster/project/generic/doc/readings-summer-2004.html. Check it out and read!

    The first cut of the summer plan is available at /cluster/project/generic/doc/plan-summer-2004.html. Check it out and get your feedback to Charlie.

    Everyone except Charlie updated the software overview chunks. Scheduler as it is currently will be nuked and new documentation created when the new scheduler is built.

    Plumbing - After a bunch of futzing the new drive is in hopper. JoshM and Dawit are still working on the athena cluster. JoshH is trying to build a SystemImager disk for cairo.

    Support for threads (-nt) in GROMACS - No news from JoshH yet. If anything it may work without MPI enabled.

    Communication document - Charlie is working on a document describing guidelines for using MT, email, WiKi, etc.

    Displayer - JoshM has some good ideas for how to have preset queries, graphing, etc. We drew a picture of a layout and discussed data model changes. Make a new form to display, edit, insert, etc. preset queries. Open enhancements and bugs: order of data points, consistent color/molecule mapping, grid lines on the graph option. Currently it uses the same predicate (with a different select list and order by) for the tabular data and the graph. Long description shows up as a legend on the graph. Short description is used as the title, long as the legend.
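
    The idea of reusing one predicate with a different select list and order by for the table and the graph can be sketched as below. The helper name and the ps_per_day column are hypothetical; this is an illustration of the approach, not the Displayer's actual code.

```python
# Hypothetical sketch: the helper and the ps_per_day column are assumptions,
# not the Displayer's actual code.

def build_query(select_list, predicate, order_by):
    """Compose a SELECT over the result table from a shared WHERE predicate."""
    return (
        f"SELECT {', '.join(select_list)} FROM result "
        f"WHERE {predicate} ORDER BY {', '.join(order_by)}"
    )

# The predicate would come from a stored preset query, not from raw user input.
predicate = "cluster_name = 'cairo' AND molecule = 'villin'"

table_sql = build_query(
    ["molecule", "label", "processes", "cpus", "nodes", "finish_time"],
    predicate,
    ["molecule", "nodes", "cpus", "processes"],
)
graph_sql = build_query(["processes", "ps_per_day"], predicate, ["processes"])
```

    Because both statements share the same predicate, the table and the graph always describe the same set of runs.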

    Conference and paper tour - No word from LinuxFest yet.

    Future Meeting
    Conference and paper tour
    Scheduler presentation and conversation, read JoshH's entry in preparation
    FFTW/GROMACS/MPI diagram update, use GROMACS manual as source?
    Modular F@C - make a plan

    Posted by charliep at 02:30 AM | Comments (0)

    May 24, 2004

    update Josh McCoy 23 May 2004

    • The GROMACS and FFTW software overview pages have been updated. FFTW info changes consisted of updating directories. The GROMACS info seemed to be fine save the info about the included scheduler documentation. Maybe the scheduler should have its own software overview page?

    • WordPress seems very nice. It seems to have a large set of useful features. It is PHP based (which is good) and MySQL based (which I'm not so crazy about). From looking at the forum, FAQ, and screenshots, WordPress seems to have a good method of having multiple categories per post and support for subcategories.

    • Work on the graphing utility mockups is going nicely. I've found methods of accomplishing most of what each method would entail. A point of contention that exists is to either augment our existing tables with the preset query data or to add a new table to hold the information.

    • A quick tour through the HTML 4.01 specifications yielded the code for using a combo box to choose the fields to be used in the select statement of our sql queries.

    • Reading Leach is still as exciting as ever. I'll read a bit more of it tomorrow morning and hand it off if someone else wants to peruse its contents.

    Posted by mccoyjo at 12:31 AM | Comments (0)

    May 23, 2004

    Update - Josh H - May 23, 2004

    Worked On:


    • After having it bug me for a while I put "finger to keyboard" and wrote v0.1 of the Benchmarking and Tuning GROMACS paper. You all should read over it and we should talk about it. We are close, but there are a few bits that need cleaning up, and some graphs that we will need to make. Here is the link (I will put it in CVS if you want, but currently it is just a txt file) B-and-T-Gromacs-Draft.txt
    • Where to send paper for publication:

    • WordPress Evaluation:

      • The subcategories feature may be useful. MT currently cannot do this.
      • No rebuilding necessary. PHP based, not Perl, so instant posting with no need to hit the rebuild button. Actually a minor point.
      • Able to import entries from MT to WordPress! Very nice!
      • Text formatting seems to be a bit more advanced.
      • Has RSS Feed capability!
      • Looks like they are thinking about supporting a spell checker. MT has a plug-in, but it does not look like it is that good.
      • Looks like only MySQL is supported, and the database is necessary. In MT it is optional. I'm sure we could port it, but I think MySQL is installed on hopper already.
      • Conclusion: I say we download a copy and try it out. Not for production, but to see if it works as well as, if not better than, MT. The site is fairly light on the features and how it works, so this might be the best way to learn.

    • Fixed a permission problem with cvsroot. It was not group writable, so many features were not working.
    • Software Overview

    To Do List:


    1. Threads investigation. Does it work? Mail the GROMACS List.
    2. Send mail to MCURCSM 2004 if I want to try to go. I think I might pass, we can talk about it on Monday.
    3. Port GROMACS to:

      • MPICH
      • MP_LITE
      • PVM

    4. Modify DB schema and scheduler to contain [success|fail], Where did it fail, how did it fail.

    Posted by hursejo at 05:20 PM | Comments (0)

    May 21, 2004

    Meeting Notes - May 20, 2004

    CCG Meeting - cp, jh, jm, db

    Software overview - This needs to be updated so that it can serve as a resource for the new folks. JoshH - LAM-MPI, CVS. JoshM - FFTW, GROMACS. Charlie - C3, PVM.

    Plumbing - The new drive, cable, adapter, etc. for /cluster are in. Charlie will work on installing it Friday morning. Dawit is working on building a new image for athena, ultimately athena will be our sand-box. JoshM will be working with him on this in preparation for the new images that need to be built for bazaar and cairo in the next couple of weeks.

    Support for threads (-nt) in GROMACS - No news from JoshH yet. If anything it may work without MPI enabled.

    WordPress rather than MT? - WordPress seems to get good reviews as an open source alternative to MT. JoshM and JoshH will check it out. If we are going to change we should do it soon while the body of stuff that needs to be migrated from MT is still modest in volume.

    Bi-Weekly MT updates - Everyone should start (this week!) making bi-weekly entries in MT (by Sun at midnight and Wed at midnight). These would be a summary of what you are working on, progress made, and challenges to be overcome. Use the General category with a title of "Update - Name - Date". All of us should read these updates before our group meetings on Mon and Thu.

    Communication document - Charlie is working on a document describing guidelines for using MT, email, WiKi, etc.

    Conference and paper tour - JoshH submitted our entry for LinuxFest 2004, no reply yet. We need to make a decision about where to submit the B and T GROMACS paper. Other deadlines are coming-up soon, more conversation next week.

    Displayer - We discussed the functionality and organization of the tool JoshM has built. He has fixed most all of the outstanding functionality problems and it's looking pretty good. JoshM will prototype a couple of different ways to address stored queries/descriptions and dynamic column display for Monday's meeting. Open enhancements and bugs: order of data points, using the same SQL to generate tables and graphs, consistent color/molecule mapping, and grid lines on graph option.

    Logistics - Work on-campus on the West end of the 2nd floor during the day on Mondays and Thursdays. Charlie will move back into his office sometime during early June.

    Reading list and the plan - Charlie is still working on these. Dawit has the first two AltiVec readings and is waiting for more. Charlie hasn't gotten back to John yet. JoshM is going to re-read Leach before handing it off.

    Next meeting
    FFTW/GROMACS/MPI diagram update, use GROMACS manual as source?
    Scheduler presentation and conversation, read JoshH's entry in preparation
    Modular F@C - make a plan

    Posted by charliep at 11:23 AM | Comments (0)

    May 17, 2004

    Meeting Notes - May 17, 2004

    CCG Meeting - cp, jh, jm, db

    Meeting times - We'll be gathering for regular meetings at 1p on Mondays and Thursdays. During most of the weeks that Charlie is out of town we'll need to move those to 7p.

    Working on campus - All of us are going to make it a point to work in D129 or nearby all day on Mondays and Thursdays. The rest of the week people are free to work wherever they feel most comfortable.

    Modular F@C - Charlie will contact Vijay WRT which molecular dynamics packages we should be considering. Once we have a sense of the breadth we can examine each in turn to see if our overall model will work and if not what we can do to accommodate particular packages.

    Graphing tool - JoshM is making some progress, between now and Thursday he is going to finish the remaining details. On Thursday we'll step back and take a high level view of this tool and see how best to package the functionality. This includes a review of the data model (succ || fail, how fail, etc.)

    B and T GROMACS paper - We need to decide soon which journal(s) we are going to aim for and layout the plan.

    GROMACS and MPI research - On-hold until we have a better sense of where F@C is going WRT treating MD software packages as opaque blocks.

    Threads and GROMACS - JoshH reports that this may be wild-west ware. He documented known problems between threads and LAM-MPI so this may be a large hurdle to get over. He is going to post a message to gmx-developers to see what's up. If integrating threads and MPI is not possible then we may want to a) evaluate threads without MPI on a single dual CPU node and b) start our consideration of other (thread aware/compatible) message passing libraries.

    /cluster space - The new drive is here, on Thursday we'll look at pausing the current batch of tests and installing it. We should check it in b16 first to make sure it made the trip ok.

    Reading - Charlie gave two AltiVec articles to Dawit. JoshM has Leach. Charlie is going to keep working on a first cut of the reading list, stay tuned to /cluster/project/generic/doc.

    Plumbing - JoshM is going to start working on SystemImager with Dawit. We need to build new cairo, bazaar, c0, and b0 images before the cluster move in June.

    Items for our next meeting:
      Conference calendar
      Review graphing/test result tool - packaging and data model
      Modular F@C
      Weekly email updates
    
    Posted by charliep at 04:54 PM | Comments (1)

    May 13, 2004

    Meeting Notes - May 13, 2004

    CCG Meeting - cp, jh, jm

    (Some of this is from Tuesday's meeting.)

    Apple Grant - We're going to apply to this www.apple.com/science/clusteraward program. Apple is going to donate some 5 node G5 Bioinformatics clusters. Erik Lindahl is on the panel. Probably propose a modular F@C. Charlie will work on this during late May (the deadline is June 13). All of us need to read Apple's fine print and make sure we will be comfortable with their terms.

    Molecule Errors - Proteasome is failing about 2/3 of the time between 1 and 34 processes on Cairo. Charlie is tracking the various failure modes and putting together a message to Erik and Vijay. Bazaar annex runs for this molecule will start sometime after the power outage on Saturday.

    Molecule Organization - JoshH moved copies of all the source files for the molecules we are currently using to /cluster/project/molecules. Charlie will find descriptive text, results of 1-34 process runs, and any notes and organize them with the molecule source files.

    For the next batch of simulations (on Cairo) we'll run them for 2 picoseconds rather than 10 picoseconds as previously. While this is less than Vijay and Michael thought was needed (5 picoseconds) for accurate prediction of overall simulation rate, to-date 2 picoseconds has yielded reasonably accurate predictions for us. Shortening the duration of the runs greatly increases the breadth and depth of testing we can do. JoshH will be starting this next batch after the power outage.

    Charlie is keeping a list of items that need to be addressed when we rebuild the Cairo and bazaar images this summer.

    Graphing Tool - It's looking better but there are still a couple of key items needed. Graphing data sets with an unequal number of elements and a couple of other items from JoshH's message. Set queries: n-n-n, 2n-n-n, 4n-2n-n, 6n-2n-n, 8n-2n-n; cairo, bazaar, checkboxes for each molecule. cgi-bin should be in CVS. JoshM will work on this.

    mdrun Option -nt (nubile Tanzanians) - Does this work with MPI? Is it a more or less efficient way to map execution contexts to processors than MPI's processes? JoshH will look into this.

    LinuxFest Submission - JoshH sent us a first cut; JoshM and CharlieP need to review it and write bios. This must be done by Friday so that JoshH can submit it Saturday.

    Charlie is ordering a new drive to use as /cluster on hopper.

    Next meeting: Monday May 17, 2004 @ 12:00EST


    • Code review
    • Modular MD approach
    • Test Result Tool
    • Where to publish B and T GROMACS paper?
    • Conference calendar checkin
    • Report from Charlie from Pande Lab visits

    Posted by charliep at 07:19 PM | Comments (0)

    May 05, 2004

    GROMACS 3.1.x and 3.2.x AltiVec Support

    In order for GROMACS (3.1.x and 3.2.x) to build and use AltiVec instructions on PowerPC chips running Yellow Dog Linux/gcc 3.3.2 there are two files in the distribution which need a header file added to them.

    In configure "#include <altivec.h>" should be added before main() in the generated C code in the AltiVec support test section. You can find this by searching for "supports altivec".

    In include/ppc_altivec.h "#include <altivec.h>" should be added before the first function definition.
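
    The edit can be scripted rather than done by hand. The following is only a sketch: it inserts the include line before the first line matching a marker pattern, and both the helper name add_include_before and the marker regex are hypothetical. For configure you would still need to locate the "supports altivec" section yourself and pick a marker that is unique to it.

```python
import re

INCLUDE_LINE = "#include <altivec.h>"


def add_include_before(source, marker_pattern, include_line=INCLUDE_LINE):
    """Return source with include_line inserted before the first line
    that matches marker_pattern. Leaves source unchanged if the include
    line is already present on a line of its own."""
    lines = source.splitlines(keepends=True)
    if any(line.strip() == include_line for line in lines):
        return source
    for index, line in enumerate(lines):
        if re.search(marker_pattern, line):
            lines.insert(index, include_line + "\n")
            return "".join(lines)
    raise ValueError("marker %r not found" % marker_pattern)


if __name__ == "__main__":
    # Hypothetical stand-in for the generated test program in configure.
    sample = "int main()\n{\n    return 0;\n}\n"
    print(add_include_before(sample, r"\bmain\s*\("))
```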

    Posted by charliep at 06:28 PM | Comments (0)

    May 04, 2004

    Reconcile Database with Directory

    So here is a short tutorial on how I currently reconcile the b-and-t-g database with the disk. First, be familiar with two items:


    • b-and-t-g database
    • /cluster/project/b-and-t-g/results/

    Open the Database and do a select in the form:
    SELECT molecule, label, processes, cpus, nodes, finish_time FROM result WHERE cluster_name = 'cairo' AND molecule = 'villin' ORDER BY molecule, nodes, cpus, processes

    for each cluster [cairo | bazaar] and molecule [villin | cut | pme | dppc].
    In another shell you want to cd to /cluster/project/b-and-t-g/results/cairo/raw-data/.
    Each folder is titled as:
    MOLECULE-LABEL-PROCESSESonCPUS-NODES

    For example:
    villin-Gromacs-Optimal-3.2.0-37on20-10

    is molecule villin, label Gromacs-Optimal-3.2.0, 37 processes, 20 CPUs, 10 nodes.
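
    For checking many folders at once, the naming scheme can be split mechanically. This is a sketch, not part of the b-and-t-g tools: the names RunDir and parse_run_dir are made up here, and it assumes molecule names contain no hyphens (true for villin, cut, pme, and dppc) while labels may.

```python
import re
from typing import NamedTuple


class RunDir(NamedTuple):
    molecule: str
    label: str
    processes: int
    cpus: int
    nodes: int


# MOLECULE-LABEL-PROCESSESonCPUS-NODES; the label itself may contain hyphens.
_RUN_DIR = re.compile(
    r"^(?P<molecule>[^-]+)-(?P<label>.+)-"
    r"(?P<processes>\d+)on(?P<cpus>\d+)-(?P<nodes>\d+)$"
)


def parse_run_dir(name):
    """Split a raw-data folder name into its five fields."""
    match = _RUN_DIR.match(name)
    if match is None:
        raise ValueError("not a run directory name: %r" % name)
    return RunDir(
        match.group("molecule"),
        match.group("label"),
        int(match.group("processes")),
        int(match.group("cpus")),
        int(match.group("nodes")),
    )


if __name__ == "__main__":
    print(parse_run_dir("villin-Gromacs-Optimal-3.2.0-37on20-10"))
```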

    You are looking in the finish_time field for the rows that you can clean: rows that have a null finish_time or a finish_time of 1900-01-01 00:00:00. The latter indicates that the scheduler found an error and reported it; the former means that the run is still in progress or never finished. As you look through the database you need to know which runs are currently running, because they will have nothing in their finish_time field, and if you delete one of those rows the scheduler will fail. If the run never finished then it can be removed from the database.
    When you delete a row from the database, delete the associated directory in /cluster/project/b-and-t-g/results/CLUSTER/raw-data/ as well.
    Once you have weeded out all of the runs with a finish_time of null or 1900-01-01 00:00:00 (other than those still running), you are finished.
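
    The decision rule can be written down as a small sketch. It works on rows already fetched from the database as (folder name, finish_time) pairs, with None standing in for a null finish_time. The function names and the idea of passing in a set of currently running folder names are assumptions for illustration; nothing here talks to the real database or deletes anything.

```python
from datetime import datetime

# finish_time value the scheduler writes when it found and reported an error.
ERROR_SENTINEL = datetime(1900, 1, 1, 0, 0, 0)


def classify(finish_time):
    """Return 'unfinished', 'error', or 'finished' for one finish_time."""
    if finish_time is None:
        return "unfinished"  # still in progress, or never finished
    if finish_time == ERROR_SENTINEL:
        return "error"
    return "finished"


def rows_to_clean(rows, running):
    """Return the folder names whose row (and directory) can be removed.

    rows    -- iterable of (folder_name, finish_time) pairs
    running -- set of folder names for runs that are executing right now;
               these must be kept or the scheduler will fail
    """
    cleanable = []
    for name, finish_time in rows:
        status = classify(finish_time)
        if status == "finished":
            continue
        if status == "unfinished" and name in running:
            continue
        cleanable.append(name)
    return cleanable
```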

    Posted by hursejo at 11:19 PM | Comments (0)

    May 03, 2004

    Meeting Notes - May 3, 2004

    May 3, 2004 CCG Meeting - cp, jh, jm

    Charlie still working on reading lists for 1/x, sqrt(x), and F@C.

    Graphing tool. JoshM is making some progress, but the same script fails when run under /cluster/cgi-bin and works fine in ~mccoyjo/cgi-bin. JoshH will take a look with him. If there is no solution by Tuesday, send email to Charlie so he can look at it too. Presentation on Thursday to Vijay's group.

    Specifications for the new graphing tool. X axis parallel structure, Y axis ps real/day, one color per molecule, data point at each parallel architecture. One graph per parallel structure: one process per cpu, two processes per cpu, one process per node, etc. JoshH will flesh this out.

    Molecule runs are moving along. Bazaar still chugging, cairo idle now. JoshH will bring the chart up-to-date and split it into bazaar and cairo. JoshH will talk to Seth about recovering the cairo nodes.

    mdrun/MPI, JoshM is making some progress. He will write-up a description and post it to MT.

    This is a good time for Charlie to get questions answered by Vijay, et al. Send them along posthaste.

    Villin/Urea molecule failure. Some progress, no real news yet. Charlie.

    Implement the logic and code in v0.5 of F-at-C to catch a child failure, recover, and restart with one fewer process. Making progress; figured out how to keep the group from dying (MPI flag). Typedef'd functions and error handler; Charlie will look at this. JoshH.
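
    The control flow being described is roughly the following. This is a language-neutral sketch of the retry idea only, not the F-at-C code: the real version works with MPI, whereas here a plain callable and RuntimeError stand in for launching the run and for the child failure, and the name run_with_fallback is hypothetical.

```python
def run_with_fallback(run, processes, min_processes=1):
    """Call run(n), retrying with one fewer process after each failure.

    run           -- callable taking the process count; raises RuntimeError
                     when a child fails (a stand-in for the real failure)
    processes     -- process count to try first
    min_processes -- give up once the count drops below this

    Returns (process_count_used, result_of_run).
    """
    n = processes
    last_error = None
    while n >= min_processes:
        try:
            return n, run(n)
        except RuntimeError as err:
            # Recover from the failed attempt and restart with one fewer.
            last_error = err
            n -= 1
    raise RuntimeError("no process count succeeded") from last_error
```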

    Grant opportunity, Sun for hardware and money to port GROMACS to SPARC. Charlie.

    Posted by charliep at 11:10 PM | Comments (1)

    April 30, 2004

    Meeting Notes - Apr 29, 2004

    CCG Meeting - cp, jh, jm, js
    New power outage schedule, May 15-16 and June 12.

    Bazaar annex is fully functional. Joshes both running code on it.
    See MT for JoshH's MPI config file.

    No graphing tool yet, maybe this weekend. JoshM.

    Some progress on mdrun/MPI, starting parallel runs. JoshM.

    All of us (particularly JM and CP) need to be more regular about using
    MT.

    The new testing users are set up on the cluster, f-at-c and b-and-t-g.
    Passwords are the same as the switches.

    Fix rid-lookup.php and install in cgi-bin, link to it and the new tool
    at c.e.e/html/resources (all of those files are in CVS) and in the
    Links section in MT. JoshM.

    Villin/Urea molecule failure. Charlie.

    Implement the logic and code in v0.5 of F-at-C to catch a child failure,
    recover, and restart with one fewer process. JoshH.

    We'll need to set up Bugzilla before too long.

    Reading list for 1/x, sqrt(x), and vector processing. Charlie.

    Posted by charliep at 06:18 AM | Comments (0)