February 28, 2005

Gaussian Notes

Our installation of gaussian resides in /cluster/bazaar/software/g03.
A typical gaussian command line: nohup g03l md.com &
This runs gaussian with linda (g03 is the uniprocessor binary) using the input file md.com. Running g03l with nohup allows gaussian to continue running even if you log off the machine. The stdout and stderr will go to nohup.out. The g03l output will be placed in md.log. If you execute the top command and watch gaussian run, you will see it call specific binaries to perform the molecular dynamics. These binaries are named lxxx.exe where xxx is a number. A list of the calculations each binary performs can be found here.

Gaussian relies on environment variables for most configuration options. Gaussian is packaged with environment setting scripts that should be sourced before running g03l (g03/bsd/g03.profile; it is handy to put this in your .profile). Environment variables are set with the export command in bash (export VARIABLE_NAME="value") and setenv in csh (setenv VARIABLE_NAME 'value'). Here are several environment variables that are handy:

  • GAUSS_SCRDIR="/tmp"
    • directory used for scratch files. Using /tmp is a good thing.
  • g03root="dir"
    • The base gaussian directory. Our base directory is /cluster/bazaar/software.
  • LD_LIBRARY_PATH="dir"
    • The path to linda shared libs (just points to $g03root in our setup). This is set in the g03.profile script provided by gaussian.
  • GAUSS_LFLAGS='-option1 "value1" -option2 "value2"'
    • '-nodelist "b1 b2 b3 b4"' - Set the node list.
    • '-opt "Tsnet.Node.lindarsharg: ssh"' - Set linda to use ssh for authentication (the default is rsh).
    • '-mp 2' - The number of workers per node. [Note: this applies to all nodes in the nodelist. Since this method does not allow much flexibility, we may want to look for a better way to configure this.]
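
The variables above can be collected into one bash setup, sketched below. The node names and the -mp value are the examples from the list; the location of g03.profile under the base directory is an assumption based on our install path, so the script only sources it if the file exists.

```shell
# Sketch of a gaussian/linda environment for bash; adjust the values for your cluster.
export g03root="/cluster/bazaar/software"   # base directory that contains g03/
export GAUSS_SCRDIR="/tmp"                  # directory for scratch files

# Source gaussian's own profile if it is present (assumed location).
# It sets LD_LIBRARY_PATH, among other things.
if [ -f "$g03root/g03/bsd/g03.profile" ]; then
    . "$g03root/g03/bsd/g03.profile"
fi

# Linda options: node list, ssh instead of rsh, two workers per node.
export GAUSS_LFLAGS='-nodelist "b1 b2 b3 b4" -opt "Tsnet.Node.lindarsharg: ssh" -mp 2'

echo "GAUSS_LFLAGS=$GAUSS_LFLAGS"
```

Keeping all of the linda options in a single GAUSS_LFLAGS string means the whole configuration can be changed in one place.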

A suite of tests is included with gaussian (g03/tests/com). Running a set of these tests is a good way to test your gaussian environment. Keep in mind that the log file generated by gaussian is going to be placed in the same directory as the run configuration file (e.g. if you run "g03l $g03root/g03/tests/com/test354.com" from your home directory, the test354.log file will be in $g03root/g03/tests/com/).
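
Since the log lands next to the input file rather than in the current directory, it can help to work out the log path before launching. The sketch below does that with plain parameter expansion. The helper name gaussian_log_path and the input path are made up for illustration, and g03l is only started if it is on the PATH and the input file exists.

```shell
# Hypothetical helper: print the .log path gaussian would write for a .com input.
gaussian_log_path() {
    # Strip a trailing ".com" and append ".log"; the directory part is unchanged.
    printf '%s\n' "${1%.com}.log"
}

input="/home/user/jobs/md.com"   # illustrative path, not a real job
echo "log will be written to: $(gaussian_log_path "$input")"

# Launch only if the linda binary is available and the input file exists.
if command -v g03l >/dev/null 2>&1 && [ -f "$input" ]; then
    nohup g03l "$input" &
fi
```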

Posted by mccoyjo at 01:05 PM | Comments (0)

February 25, 2005

Meeting Minutes - February 25th, 2005

Plumbing
Load scheduling with Lori
Documentation on Gaussian installation and use
Lori's sample job
hopper upgrade
Upgrade firmware on switches
Warewulf rather than ganglia (wulfstat or wulflogger) http://www.phy.duke.edu/~rgb/Beowulf/wulfware.php

Folding@Clusters
Non-nfs testing
a2 list

Posted by charliep at 08:23 PM | Comments (0)

February 18, 2005

Meeting Minutes - February 18th, 2005

non-nfs testing
  • temp user (home dir /tmp)
  • no release cut for first round of tests (take binaries from ~charliep/cvs-hopper/{cairo,bazaar}-a2rc/folding-at-clusters/release/bin/ )
  • /cluster/generic/bin/runfatc.pl (will be in cvs)
  • driver-fatc scripts
    • path
    • number of runs
    • min nodes
    • max nodes
    • molecule name
    • don't record in DB: yes = no record, no = record
  • non-nfs nodes; 3 tests 1 node, 3 tests 2 nodes, 3 tests 3 nodes,... walk up non-nfs config to more nodes
  • test set labels : start with a2 then move to rc, rc1, rc2, ...
  • put a2/rc test rows in results table
  • continually run tests on bazaar and cairo. Kill tests when we get a new rc. start tests with new rc when available.
  • detect stalled runs by comparing time stamps to wall clock time
  • change to runfatc before we start tests: charlie will make a change (we're not harvesting the mass points; put attribute to store mass points in db table).
  • change dvc to use mass points in select as a select statement. could be a potential x-axis. we want to see how things scale by mass points instead of by nodes (ps nodes/day vs mass points).
  • put calc-speedup.pl in /generic/bin/
  • use c1 and b1 as mother nodes when testing
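
One way the stalled-run check from the list above could look is sketched here: compare a file's modification time with the wall clock and flag the run if the gap exceeds a threshold. The helper name is_stalled, the checkpoint path, and the 30 minute threshold are all assumptions for illustration; runfatc.pl may end up doing this differently.

```shell
# Hypothetical helper: exit 0 if FILE was last modified more than MAX_AGE
# seconds ago (the run writing it looks stalled), 1 if it is fresh,
# 2 if the modification time cannot be read.
is_stalled() {
    file=$1
    max_age=$2
    now=$(date +%s)
    # GNU stat uses "-c %Y", BSD stat uses "-f %m"; try one, then the other.
    mtime=$(stat -c %Y "$file" 2>/dev/null || stat -f %m "$file" 2>/dev/null) || return 2
    [ $((now - mtime)) -gt "$max_age" ]
}

# Example: flag a checkpoint that has not been updated for 30 minutes.
checkpoint="/tmp/fatc-run/checkpoint.dat"   # assumed path, for illustration only
if [ -e "$checkpoint" ] && is_stalled "$checkpoint" 1800; then
    echo "run looks stalled: $checkpoint"
fi
```

The threshold would need to be tuned per molecule, since the time between checkpoints depends on the size of the system.
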
folding@clusters
  • a2 and rc; check binaries to tell what version (rc, a1, ... etc) we're using
Plumbing
  • make c15 smp again
  • /etc/rc.conf - use -h option when restarting portmap (look for portmap flags; bind to internal interface only)
Posted by charliep at 02:11 PM | Comments (0)

February 11, 2005

Meeting Minutes - February 11th, 2005

We're at SIAM this week finishing up and presenting two posters, Folding@Clusters and Calculating 1/sqrt(x) for molecular dynamics packages on commodity vector architectures. We'll post pictures, etc. when we're done.

Posted by charliep at 05:31 PM | Comments (0)

February 04, 2005

Meeting Minutes - February 4th, 2005

  • General
    • Poster
    • Travel: 11a Thursday the plane leaves; leave Richmond 9ish. Return 5:30p on Tuesday.
    • Food and work Wednesday night at the ranch.
    • Stuff to take
      • projector
      • printer? (blow off if the resort has color printing)
      • gear bag (not whole thing; access point)
      • screen for projector
      • Poster tube (link in clustcomp archives)
      • Long jumper
  • Numerical Methods
    • converting GROMACS files to use with NAMD. Will document in cvs when done. see clustcomp message. John and Josh
    • Look at dihedrals (what is it and how does it manifest in our configuration files). do we have any? John
    • John's literature search is going well. More goodness to come.
    • Literature search - JohnS
      • inverse square root, molecular dynamics, vector MT Link
        Create a Literature Notes file in CVS (numerical-methods/doc/literature-search-notes.txt)
    • get BibTeX from john's IEEE articles. John & Charlie
    • Charlie has info on 1/x^2 via Peter. He'll share soon.
  • Folding@Clusters
    • testing: good db of runs (platform, molecule, and node combinations). next is non-NFS testing in /tmp/
    • always use .gro files for input so we don't have to deal with converting the pdb files
    • 3 flavors of failing. Let's start to think about how to catch these errors.
      • die due to LINCS error
      • fail a bunch of times and max out restarts
      • activity stops, we use cpu, no new checkpoints being received, stalls in simulation
      • stalls during capability discovery
    • bug in capability discovery (netpipe); look at joshh's kludge
    • to catch failures, mother forks a process outside of the mpi world for the netpipe bandwidth test, mother has timeout feature on nannies and when triggered resets world.
    • list of molecules that consistently fail. charlie
    • wrap mpi calls for value or kill to avoid race condition created by listen for message with certain flags.
    • DVC <-> results conversation. charlie & joshm
    • get ps/day information from mdrun output. get from log file on child 0
    • preserve log in a mother.conf or molecule.conf option
    • how many mass points in a system. can we tell how many mass points we have given our config files? wc -l on top file is a possible solution. John
    • package up a2 after siam for pande
    • 1-8 nodes on bazaar old image, 1-4 on new images testing done.
    • two types of testing a2 needs before release: Failure testing and Non-NFS testing.
    • F@C vs F@H - use b19 for x86 single cpu run
    • Failure and Recovery (with fault tolerance checking) - not done.
      • Look at folding-at-clusters/documentation/index.html
      • Killing lam on non n0 nodes.
    • SMP testing - Charlie will contact Henry Neeman and get the details for access to OSCER's SMP box to JoshH who will do the testing.
    • Review/Update protocol.txt - Revisit after SIAM. Make a phase diagram so we can figure out how to test checkpointing, failure, and recovery completely.
    • Cleanup of files on non-NFS systems - Revisit after SIAM.
  • Plumbing
    • reimage all cluster nodes save 0th nodes.
    • cluster names in cexec
    • Gaussian working in the near future (next week)
    • hopper kernel parameters
    • move hdd out of bazaar golden client (b19) to another machine.
    • usb <> ps2 for cairo part of kvm
    • How to decide if /cluster/... vs locally on each node for a particular package? Let's discuss this in more detail after SIAM.
  • Other
    • Instrument NAMD and GROMACS to print out the x to find its range. Hard part is finding where to put the instrumentation in the code. Maybe GNUplot it.
  • Post SIAM
    • midwife - see jan 28th meeting minutes
    • Ganglia
    • JoshM will post a software list for us to review and update.
    • codeviz
    • cruise last 3-4 weeks of meetings for items lost in the pre-SIAM madness.
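
On the mass-point question under Folding@Clusters: since we always use .gro files for input, an alternative to wc -l on the top file is to read the atom count straight from the .gro file, where line 1 is a title and line 2 holds the number of atoms. The sketch below assumes that one mass point corresponds to one atom entry; the helper name gro_atom_count and the sample file are made up for illustration.

```shell
# Hypothetical helper: print the atom count stored on line 2 of a .gro file.
gro_atom_count() {
    awk 'NR == 2 { print $1; exit }' "$1"
}

# Build a tiny sample .gro file (one water molecule) to try the helper on.
sample=$(mktemp)
cat > "$sample" <<'EOF'
Water molecule
    3
    1SOL     OW    1   0.126   0.164   0.168
    1SOL    HW1    2   0.190   0.116   0.224
    1SOL    HW2    3   0.177   0.240   0.133
   1.86206   1.86206   1.86206
EOF

gro_atom_count "$sample"   # prints 3
rm -f "$sample"
```

Reading the count from the header avoids having to skip comments and section markers, which a raw line count of the top file would include.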
Posted by charliep at 09:50 AM | Comments (0)