November 17, 2005
[RAHU] Primary Hard Drive Replaced

The first hard drive (c1t0d0) failed a few days ago, and Sun shipped out a replacement today. I have replaced the drive, and the mirrors are currently resyncing.

The replacement procedure is as follows:

  • Install the new drive in the spare drive slot, then run devfsadm so that Solaris recognizes it.
  • Copy the partition table to the new drive by running dd if=/dev/rdsk/c#t#d#s2 of=/dev/rdsk/c#t#d#s2 and interrupting it shortly after it starts (the input is the remaining good drive, the output is the new drive).
  • Remove the new drive from the spare drive slot.
  • For each mirror in the mirror set, run metadetach -f MIRROR SUBMIRROR followed by metaclear SUBMIRROR. This removes the failed drive from the mirror set.
  • If there are any state database replicas on the failed drive, run metadb -d c#t#d#s# to remove them.
  • Remove the failed drive.
  • Insert the new drive.
  • Run devfsadm to rebuild the devices.
  • Run metadevadm -u c#t#d# to tell SVM about the new drive information.
  • Recreate the state database replicas on the new drive with metadb -a -c # c#t#d#s#
  • For each mirror, run metainit SUBMIRROR 1 1 c#t#d#s# for the new drive, followed by metattach MIRROR SUBMIRROR to attach the new submirror and begin resyncing.
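
For reference, the steps above can be sketched as a dry-run script that only prints the commands in order, so the plan can be reviewed before anything is run on the server. This is a rough sketch, not what was actually typed on RAHU: every device name, metadevice name, slice, and replica count below is a hypothetical placeholder, and the physical drive swaps appear only as comment lines.

```shell
#!/bin/sh
# Dry-run sketch: prints the SVM commands for swapping one failed mirror disk.
# Nothing here touches a disk. All names below are ASSUMED placeholders.

GOOD=c1t1d0        # surviving disk (assumed)
NEW_SPARE=c1t2d0   # new disk while it sits in the spare slot (assumed)
FAILED=c1t0d0      # failed disk; the new disk takes over this name after the swap
REPLICA_SLICE=s7   # slice holding the state database replicas (assumed)
REPLICA_COUNT=2    # replicas to recreate on that slice (assumed)
# One "mirror:submirror:slice" entry per mirror (assumed layout)
MIRRORS="d0:d10:s0 d1:d11:s1"

plan() {
    echo "devfsadm"
    # Interrupt this dd shortly after it starts; only the label is needed.
    echo "dd if=/dev/rdsk/${GOOD}s2 of=/dev/rdsk/${NEW_SPARE}s2"
    echo "# (manual) remove the new drive from the spare slot"

    # Detach and clear every submirror that lives on the failed disk.
    for entry in $MIRRORS; do
        mirror=${entry%%:*}
        rest=${entry#*:}
        sub=${rest%%:*}
        echo "metadetach -f $mirror $sub"
        echo "metaclear $sub"
    done

    echo "metadb -d ${FAILED}${REPLICA_SLICE}"
    echo "# (manual) remove the failed drive, insert the new drive"
    echo "devfsadm"
    echo "metadevadm -u $FAILED"
    echo "metadb -a -c $REPLICA_COUNT ${FAILED}${REPLICA_SLICE}"

    # Recreate each submirror on the new disk and attach it to start the resync.
    for entry in $MIRRORS; do
        mirror=${entry%%:*}
        rest=${entry#*:}
        sub=${rest%%:*}
        slice=${rest#*:}
        echo "metainit $sub 1 1 ${FAILED}${slice}"
        echo "metattach $mirror $sub"
    done
}

plan
```

Printing rather than executing is deliberate: several of these commands are destructive, and the drive swaps have to happen by hand between them.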
Posted by littejo at 04:04 PM, updated 08:02 PM November 17, 2005
[EYEWI] DLT Drive Replaced

The standalone DLT4000 (Dell PowerVault 110T) tape drive that we used for backing up the catalog was exhibiting problems yesterday and EYEWI was refusing to recognize its presence on the SCSI chain. I replaced the drive with a DLT1 drive, reconfiguring NetBackup in the process (NetBackup doesn’t directly use the drive, but it does know about it).

I also removed the Overland DLT library from the system and powered it off, again telling NetBackup about the changes. I’ve removed most of the DLT media from the media manager except for a few tapes that seem to be “assigned” and can’t be deleted.

Posted by littejo at 12:58 PM
November 10, 2005
[ROJ] CPU Error

CPU1 on ROJ experienced an error last night which caused the system to record a crash dump and reboot. I opened a support case with Sun, and this is apparently a problem with the UltraSPARC II chip that is acceptable if it happens once, but further occurrences would warrant a replacement of the CPU.

Sun has recorded this case in their Best Practices Database; the case number is 64806912. They have system logs for reference, indicating the affected CPU. If this happens again, we should reference this case number so that support can determine whether a CPU replacement is necessary.

Posted by littejo at 03:24 PM
November 04, 2005
[RAHU] Update: Groups Home

The other two 500GB Xserve RAID volumes were added together to make another 1TB Veritas volume for groups directory storage; we are in the process of migrating this data from PAX to RAHU.

Posted by littejo at 12:50 PM
[PAX] Update: Migrating to RAHU

Most data on PAX is slowly migrating to RAHU. Home directories were completely migrated just prior to fall semester, 2004. We are in the process of migrating groups directories to RAHU as well.

Posted by littejo at 12:49 PM
[KE] Update: Mailman, Baleen

KE has undergone two major changes: migration of Mailman and introduction of Baleen.

  • Mailman has moved to BARIS (see notes there), along with an upgrade to Mailman.
  • Baleen, a mail scanning system based on the concept of an e-mail gateway or firewall, has been introduced (running on TAIKA) as a replacement for SpamAssassin running on KE, offloading most of the heavy processing that KE had been experiencing.

Posted by littejo at 12:48 PM
[HETEP] Update: Production

There has been no significant change to Sun ONE Calendar; however, we are now using it in production, and RCM has long since been turned off. Apple iCal can also access Sun ONE using the icald service running on HEIWA.

Posted by littejo at 12:46 PM
[HEDD] Update: New Versions of Moodle

HEDD continues as it always has. We have upgraded Moodle periodically.

Posted by littejo at 12:45 PM
[EYEWI] Update: Tape Library

The primary change to EYEWI was the introduction of a SpectraLogic T120 tape library in fall, 2004.

The T120 (Superfrog) is SCSI-attached and contains two LTO2 tape drives. It currently has the default of 30 slots available. The Overland DLT library is still attached but unused. The standalone DLT drive is still the local backup drive. We are in the process of erasing all previous DLT tapes for resale.

Posted by littejo at 12:44 PM
[BARIS] Update: Mailman, LDAP, General

BARIS has had a number of alterations to it, although its role is still entirely e-mail related.

  • In February, 2005, we migrated Mailman mailing lists to BARIS, upgrading the version of Mailman in the process and introducing searchable archives for all mailing lists.
  • When we migrated Mailman, we also added LDAP mail routing functionality, moving all mailing list aliases into LDAP, where KE reads them every hour and BARIS and TAIKA access them immediately. New and deleted Mailman lists are also updated every hour.
  • BARIS is currently running SquirrelMail version 1.4.4.

Posted by littejo at 12:42 PM
[SITH] Update: Mail Testing, LDAP

SITH migrated through a variety of roles before coming to its current life as a member of the master LDAP cluster.

It started out life as a last-minute replacement for a SunFire V210 that we had ordered for WebDB -- the V210s were back-ordered due to a hardware bug that Sun was correcting, and our supplier got us the V120 instead. It served WebDB for a year or so.

After serving WebDB, I recommissioned it as a test system. I tested both Cyrus IMAP and Sun ONE Messaging Server on it as part of a mail system migration (still in progress).

In early fall, 2005, I rebuilt it to be a nearly identical clone of ASHTI for the purpose of serving in a dual master LDAP cluster. The differences between the two machines are, at this point, their names, their disk sizes (SITH has 36G), and their primary network interfaces (SITH has a PCI gigabit card).

Posted by littejo at 12:41 PM
[ASHTI] Update: Disk Failure, Dual Master

ASHTI performed relatively well throughout the 2003-2004 and 2004-2005 years. It suffered a hard disk failure in spring 2005. Since it was not under a service plan, I opted to repurpose a pair of old (but compatible) 18G drives from KE to bring it back into service. While it was down, SITH acted as the primary LDAP server. In conjunction with bringing ASHTI back into service, I upgraded to the latest hotfix for Sun ONE Directory Server.

In early fall semester, 2005, I rebuilt SITH to be a mirror LDAP server using the same Jumpstart image as ASHTI. Both are currently online in dual-master mode.

Posted by littejo at 12:40 PM
[MIR] Update: MIR Decommissioned

MIR was decommissioned in spring, 2005 after 4 years of varied service.

MIR’s services were migrated to new machines as follows:

  • Print and file (Public, Labs, and Temp shares) serving to MINA, with the introduction of LPKiosk.
  • DNS and NTP to EIRENE.

Posted by littejo at 12:38 PM
[General] Return

It's time to get this log back on track. What follows are some updates on a number of the servers. See also the TWiki reference for each server on its archive page. TWiki should contain the latest general description of each server, while this log will contain more historical markers of significant changes to it.

Posted by littejo at 12:37 PM