Cluster Admin Project

Description:
General Cluster Administration

January 01, 2005

New Image Plan

A couple of days ago I replied to JoshM's message about testing the new cairo image with questions about where to find what was installed, etc. I think I was headed down the wrong road; we should instead focus on making it easy to add or change things in the image, re-image machines, and reboot machines. This is more in line with the "release early, release often" mantra.

  • 1) Get a basic image built. Done for cairo, to be done for bazaar.

  • 2) Check/update the list of 0th node items. Create the file /cluster/project/sna/0th-node.html, and check the documents in that directory and in MT for items. Include how to rebuild with a new image and then apply the changes.

  • 3) Check/update /cluster/project/sna/cluster-imaging.html, particularly the parts about modifying an existing image, forcing an update on one or a couple of nodes, and then forcing an update on the whole cluster.

  • 4) Use F@C on a couple of nodes as a first-order test before calling an image ready to deploy on all the nodes in a cluster.

  • 5) Install the new images on all the nodes (including the 0th nodes); then we'll start using them and fix what's broken. We should keep all the documents referenced above updated as we go.
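Steps 3 through 5 amount to a staged rollout: force an update on a couple of canary nodes, check them with F@C by hand, then update the rest. The sketch below is one way to script that, under assumptions that are not in the plan: nodes are reachable as root over ssh, the /root/force-update.sh script that already lives on every node pulls the image and reboots, and the helper names (run_on, update_nodes, remaining_nodes) are hypothetical. It only prints the commands unless DRY_RUN=0.

```shell
# Staged image rollout: a few canary nodes first, then everyone else.
# Assumptions (not from the plan): nodes are reachable as root over ssh,
# and /root/force-update.sh on each node pulls the image and reboots.
# Commands are only printed unless DRY_RUN=0.
DRY_RUN=${DRY_RUN:-1}

# run_on NODE COMMAND...: run (or print) COMMAND on NODE over ssh.
run_on() {
    node=$1
    shift
    if [ "$DRY_RUN" = 1 ]; then
        echo "ssh root@$node $*"
    else
        ssh "root@$node" "$@"
    fi
}

# update_nodes NODE...: force an image update on each listed node.
update_nodes() {
    for n in "$@"; do
        run_on "$n" /root/force-update.sh
    done
}

# remaining_nodes "CANARIES" NODE...: print the nodes that are not canaries.
remaining_nodes() {
    canaries=" $1 "
    shift
    for n in "$@"; do
        case "$canaries" in
            *" $n "*) ;;                 # already updated in stage 1
            *) printf '%s\n' "$n" ;;
        esac
    done
}
```

With made-up node names, stage 1 would be: update_nodes c1 c2. After F@C looks good on c1 and c2, stage 2 would be: update_nodes $(remaining_nodes "c1 c2" c0 c1 c2 c3 c4). Keeping the F@C check manual between the two stages is deliberate; it is the "first-order test" from step 4.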

    Posted by charliep at 05:35 PM | Comments (0)
    July 25, 2004

    WeatherDuck Calibration

    From JoshH on Saturday July 24th - Just some numbers from the Oregon Scientific (Left) and WeatherDuck (Right)
    Before Reboot:
    76.1 -> 79.2
    29 -> 49
    After Reboot:
    74.1 -> 76.8
    38 -> 41
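    To turn these pairs into calibration offsets, the WeatherDuck reading can be subtracted from the Oregon Scientific one. The post does not say what each row measures, so the sketch treats them as plain numbers; treating the Oregon Scientific as the reference instrument is an assumption, and duck_offsets is a hypothetical name.

```shell
# duck_offsets: read the calibration pairs as posted ("LEFT -> RIGHT",
# Oregon Scientific on the left, WeatherDuck on the right) and print
# RIGHT - LEFT for each pair. Other lines (headings) are skipped.
duck_offsets() {
    awk '$2 == "->" { printf "%+.1f\n", $3 - $1 }'
}
```

    Fed the numbers above, this prints +3.1 and +20.0 for the readings before the reboot and +2.7 and +3.0 after, so the second reading agreed much more closely once the WeatherDuck had been rebooted.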

    Posted by charliep at 08:16 AM | Comments (87)

    July 16, 2004

    IPMI and SOL in image

    I wanted to start a discussion on this so I placed this entry here. Please feel free to comment.

    We mentioned this set of software that we might be able to harness with our equipment. Mostly we need something to allow us to reboot a stalled/failed node over the Ethernet connection or some interface that we can access remotely. This way we don't have to travel to campus to reboot a compute node if it stops responding.

    Below are two links regarding IPMI and one user's experience setting it up under Debian:
    OpenIPMI
    Debian HowTo Document

    IPMI needs a specialized Linux kernel (or kernel driver) in many cases, but it is an Intel-led standard that gives access to the system's baseboard management controller (BMC) even when the system has been shut down, as long as power is still reaching the power supply.
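    For the remote-reboot use case, the usual companion tool is ipmitool, which talks to a node's BMC over the network; that path does not go through the node's running kernel, so the kernel driver matters mainly for managing the BMC from the node itself. The sketch below wraps the ipmitool commands for power status, a hard power cycle, and a Serial over LAN console. The BMC hostnames (<node>-ipmi), the user name, the password file path, and IPMI 2.0 support ("lanplus", which SOL needs) are all assumptions about hardware we have not checked. It prints the commands unless DRY_RUN=0.

```shell
# Assumptions (hypothetical, not from the post): each node's BMC answers
# on the LAN as <node>-ipmi, it speaks IPMI 2.0 ("lanplus", needed for
# SOL), and the BMC password sits in a root-only file.
IPMI_USER=${IPMI_USER:-admin}
IPMI_PASSFILE=${IPMI_PASSFILE:-/root/.ipmi-pass}
DRY_RUN=${DRY_RUN:-1}   # print commands unless DRY_RUN=0

# ipmi NODE ARGS...: run (or print) an ipmitool command against NODE's BMC.
ipmi() {
    node=$1
    shift
    set -- ipmitool -I lanplus -H "$node-ipmi" -U "$IPMI_USER" \
        -f "$IPMI_PASSFILE" "$@"
    if [ "$DRY_RUN" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

node_status()  { ipmi "$1" chassis power status; }
node_cycle()   { ipmi "$1" chassis power cycle; }   # hard power cycle
node_console() { ipmi "$1" sol activate; }          # serial console; exit with ~.
```

    Reading the password from a file with -f keeps it off the command line, where other users could see it in ps output.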

    I know that I would find this useful. This is just one solution, but it is gaining popularity. If I recall correctly, there is also a piece of hardware that would let us do this.

    Thoughts?

    Posted by hursejo at 04:10 PM | Comments (212)

    June 03, 2004

    Linux Distributions for x86

    Distribution   Latest stable   Kernel   Notes
    Red Hat        9.0             2.4.26   Dead, and the 2.6 kernel upgrade options look sketchy.
    Fedora Core    2               2.6      Rumors of instability persist.
    Gentoo         2004.1          2.6      Updated often; supports many kernels (Hurd, Windows compatibility kernel).
    SUSE           9.1             2.6      Updated often; extremely large package library. Becoming less "free-friendly".
    Debian         3.0             2.2.20   Can be upgraded to a 2.6 kernel, but that is not a Debian-branded stable release.

    As things stand, SUSE and Gentoo look the most promising, followed by Debian. Red Hat 9.0 seems like a really bad idea. Fedora could be a real wild card: it has good update times and the kernel we need, but it is still considered flaky.
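    Since the deciding factor is a 2.6 kernel, a small helper can check whether a node's running kernel (uname -r) already meets that bar. This is a sketch; kernel_at_least is a hypothetical name, and it only compares the major and minor numbers, ignoring vendor suffixes such as "-1.358smp".

```shell
# kernel_at_least MAJ MIN [RELEASE]: succeed if RELEASE (default: the
# running kernel from uname -r) is version MAJ.MIN or newer.
# Only major.minor are compared; patch levels and suffixes are ignored.
kernel_at_least() {
    want_maj=$1
    want_min=$2
    rel=${3:-$(uname -r)}
    maj=${rel%%.*}             # "2.4.26" -> "2"
    rest=${rel#*.}             # "2.4.26" -> "4.26"
    min=${rest%%[!0-9]*}       # "4.26"   -> "4"
    [ "$maj" -gt "$want_maj" ] ||
        { [ "$maj" -eq "$want_maj" ] && [ "$min" -ge "$want_min" ]; }
}
```

    For example, kernel_at_least 2 6 succeeds on a Fedora Core 2 node and fails on the Red Hat 9.0 and Debian 3.0 kernels listed above.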

    Posted by mccoyjo at 11:46 AM | Comments (41)

    May 12, 2004

    Swap domain pointing

    Per our conversation on Monday, I changed how our domain points so that it is more intuitive. Now http://cluster.earlham.edu/ points to the HTML pages we have created; to reach the details, click the Browse Files link. The old link, http://cluster.earlham.edu/html, still works as before.
    In the process I had to move /cluster/icons to /cluster/html/icons, and they are not in CVS. Should they be?
    I also noticed that cvsweb was not working, so I fixed that.
    I fixed the links in the HEADER file as well.
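    If we decide the icons should be in CVS, the one catch is that image files need to be added as binary (cvs add -kb) so CVS does not expand keywords or convert line endings inside them. The sketch below, with the hypothetical helper name cvs_add_binary_dir, prints the commands for a directory such as icons when run from /cluster/html inside a checked-out working copy; set DRY_RUN=0 to run them.

```shell
# cvs_add_binary_dir DIR: put a directory of binary files (e.g. icons)
# under CVS. Run from the parent directory inside a CVS working copy.
# Commands are only printed unless DRY_RUN=0.
DRY_RUN=${DRY_RUN:-1}
cvs_do() { if [ "$DRY_RUN" = 1 ]; then echo "cvs $*"; else cvs "$@"; fi; }

cvs_add_binary_dir() {
    dir=$1
    cvs_do add "$dir"                         # the directory goes in first
    for f in "$dir"/*; do
        [ -f "$f" ] && cvs_do add -kb "$f"    # -kb: binary, no keyword expansion
    done
    cvs_do commit -m "Add $dir" "$dir"        # files need a commit; the directory does not
}
```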

    Posted by hursejo at 05:33 PM | Comments (0)

    May 04, 2004

    Imaging Cairo from OSX

    So here are some of my notes from imaging the Cairo nodes that were running OSX.


    1. Install a Minimal Yellow Dog Linux from a boot CD
    2. Copy a tarred copy of the SystemImager client and the force-update.sh script (found in /root on all nodes) to the node you want to image.
    3. cd to systemimager-client directory
    4. ./installclient
    5. Don't run the prepareclient when asked
    6. cd ..
    7. ./force-update.sh
    8. Allow it to reboot
    9. To get ypbind to work, set NISDOMAIN=cairo.cluster.earlham.edu in /etc/sysconfig/network.
    10. Then restart the network and ypbind.
    11. Do a reboot and everything should be running.
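    Step 9 is easy to get wrong when it is repeated by hand (duplicate or stale NISDOMAIN lines), so here is a sketch of a helper that can be run any number of times. The name set_nisdomain is hypothetical. It rewrites the file in place so the owner and permissions stay the same.

```shell
# set_nisdomain FILE DOMAIN: leave exactly one NISDOMAIN= line in FILE
# (normally /etc/sysconfig/network), keeping every other line as is.
# Safe to run repeatedly; rewrites in place so ownership and mode stay.
set_nisdomain() {
    file=$1
    domain=$2
    tmp="$file.tmp.$$"
    [ -f "$file" ] || : > "$file"
    grep -v '^NISDOMAIN=' "$file" > "$tmp" || true   # grep exits 1 if nothing is left
    echo "NISDOMAIN=$domain" >> "$tmp"
    cat "$tmp" > "$file"
    rm -f "$tmp"
}
```

    On a freshly imaged node this would be: set_nisdomain /etc/sysconfig/network cairo.cluster.earlham.edu, followed by step 10. Assuming Yellow Dog's Red Hat-style init scripts, that is /etc/init.d/network restart and then /etc/init.d/ypbind restart.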

    Posted by hursejo at 04:26 PM | Comments (0)

    April 28, 2004

    Bazaar Annex Nodes

    While I was waiting for some GROMACS code to finish compiling, I took the liberty of finishing the imaging of the new bazaar nodes (b16-20). They are all running now, waiting eagerly for work.
    Since you need a special lam-bhost.conf file to work with these nodes, I have created one that we can all use. Instructions on how to use it are at the top of the file.
    http://cluster.earlham.edu/home/joshh/src/lam-mpi/bazaar-annex.conf
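    For reference, a LAM boot schema is a plain host list, one host per line, optionally followed by cpu=N, and it is passed to lamboot (for example, lamboot -v bazaar-annex.conf). The sketch below generates such a list for a numbered range of nodes. The helper name make_bhost, the host names b16 through b20, and any cpu counts are assumptions; the post does not say what makes the annex file special, so the file at the URL above is the one to use for real runs.

```shell
# make_bhost PREFIX FIRST LAST [CPUS]: print a LAM boot schema listing
# PREFIX<FIRST> .. PREFIX<LAST>, one host per line, adding cpu=CPUS
# to each line when CPUS is given.
make_bhost() {
    prefix=$1
    first=$2
    last=$3
    cpus=$4
    echo "# LAM boot schema generated by make_bhost"
    i=$first
    while [ "$i" -le "$last" ]; do
        if [ -n "$cpus" ]; then
            echo "$prefix$i cpu=$cpus"
        else
            echo "$prefix$i"
        fi
        i=$((i + 1))
    done
}
```

    For example: make_bhost b 16 20 > my-annex.conf, then lamboot -v my-annex.conf.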

    Posted by hursejo at 03:59 PM | Comments (0)

    April 12, 2004

    installation of cflow

    cflow is installed on bazaar in /cluster/bazaar/bin/cflow
    and on cairo in /cluster/cairo/bin/cflow

    The RPMs failed silently, so I had to grab the source. A PPC version seems to be hard to locate, but I am still looking. *Update: just after posting, I found a diff for PPC and installed cflow on cairo.*

    After reading usage documents online, it seems that cflow is unstable on larger programs. I have had no success after 40 minutes of trying it on GROMACS, but the provided examples and smaller C programs work fine.
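    One possible workaround for the larger programs, untested on GROMACS: run cflow on one source file at a time, so a crash on one file does not hide the results for the rest and the failing files are easy to list. The trade-off is that calls between files no longer appear in a single graph. cflow_each is a hypothetical helper name; the CFLOW variable lets you point it at /cluster/bazaar/bin/cflow or /cluster/cairo/bin/cflow.

```shell
# cflow_each FILE...: run cflow on each C file separately and report the
# files it fails on. Returns non-zero if any file failed.
CFLOW=${CFLOW:-cflow}
cflow_each() {
    failed=0
    for src in "$@"; do
        echo "== $src"
        if ! "$CFLOW" "$src"; then
            echo "cflow failed on $src" >&2
            failed=$((failed + 1))
        fi
    done
    [ "$failed" -eq 0 ]
}
```

    For example, from a source directory: cflow_each *.c > callgraph.txt 2> failures.txt.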

    documentation:
    http://www.opengroup.org/onlinepubs/007904975/utilities/cflow.html
    http://www.freealter.org/doc_distrib/cflow-2.0/#sect6

    Posted by mccoyjo at 10:54 AM | Comments (0)

    March 31, 2004

    GCC Upgrade on Hopper

    I was not able to fully upgrade gcc on hopper through the ports collection. Instead, it installed the new compiler under a different name:

    $ gcc --version
    2.95.4

    $ gcc33 --version
    gcc33 (GCC) 3.3.1 20030707 (prerelease) [FreeBSD]

    Is there a way to drop the 2.95.4 release of gcc and move the gcc33 binary name to something more common (e.g. gcc)?
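    One common answer, sketched here rather than tested on hopper: leave gcc 2.95.4 in /usr/bin alone, since it is part of the FreeBSD base system, and make the name gcc resolve to gcc33 through a symlink in a directory that comes before /usr/bin in PATH. For ports builds, setting CC=gcc33 in the environment is another option. The helper name alias_compiler and the choice of $HOME/bin are assumptions.

```shell
# alias_compiler BINDIR NAME TARGET: make NAME in BINDIR run TARGET
# (e.g. gcc -> gcc33). BINDIR must come before /usr/bin in PATH.
alias_compiler() {
    bindir=$1
    name=$2
    target=$3
    target_path=$(command -v "$target") ||
        { echo "$target not found in PATH" >&2; return 1; }
    mkdir -p "$bindir"
    ln -sf "$target_path" "$bindir/$name"
}
```

    For example: alias_compiler "$HOME/bin" gcc gcc33, then put PATH="$HOME/bin:$PATH"; export PATH in your shell startup file. After that, gcc --version should report 3.3.1, while anything that calls /usr/bin/gcc directly still gets 2.95.4.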

    Posted by hursejo at 08:11 AM | Comments (1)