Changed tapes on Friday, and again on Sunday. Weekend backups didn’t quite work properly.
Something about the weekend backups didn’t pan out. I re-ran the manual backups for paco and roj this afternoon on a fresh set of tapes and used just over one tape. Pax’s backups mostly failed, and I’m not sure what to make of that. Compression seems to be working, though, since I can get 120 GB on a tape from Banner data. Need to check on pax’s backup history.
Yesterday I did more tweaking to the alumni search CGI for Bryan.
Added the ability to search by first, middle, and last names, class year, and e-mail address. Added URL field to database. Made HTML output use the Earlham stylesheet.
See previous post
Shared memory and SYSV IPC settings need to be tweaked in /etc/system on Solaris, and buffer parameters need to be increased for NetBackup for performance tuning.
Also, before I forget, gigabit ethernet is absolutely required here.
See this document on Solaris kernel tuning for settings to /etc/system to make NetBackup run better.
This document on network buffer size and data buffer size and number has good information as well. The NET_BUFFER_SZ file needs to be present on all clients.
Playing around this morning looking at various backup things which run off of the rsync method.
Spurred by a posting in sage-members, I was looking at
The shaper and Garibaldi seemed to make the main campus connection go this evening.
Found some bug reports on FreeBSD that indicate that the quota system doesn’t like files that are owned by unknown users.
This bug indicates, among other things, that quotacheck doesn’t like files that are owned by unknown users. I’ve done some searching on PAX, and it looks like I deleted the accounts of a bunch of summer 2002 students, but never deleted their home directories. I’m currently in the process of deleting these dangling files. We’ll try quotas later this week, perhaps, and see whether it works or not.
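For finding the dangling files, something along these lines would do the job. This is a minimal sketch, not the procedure actually used on PAX: it walks a tree and reports every entry whose owning UID has no passwd entry. The function names and the `/home` root are assumptions for illustration, and the root directory itself is not checked.

```python
import os
import pwd


def uid_is_known(uid):
    """Return True if the uid resolves to an account on this machine."""
    try:
        pwd.getpwuid(uid)
        return True
    except KeyError:
        return False


def find_orphaned(root, is_known=uid_is_known):
    """Yield (path, uid) for each entry under root owned by an unknown uid."""
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            try:
                # lstat so that symlinks are judged by their own owner
                uid = os.lstat(path).st_uid
            except OSError:
                continue  # entry vanished or is unreadable; skip it
            if not is_known(uid):
                yield path, uid


if __name__ == "__main__":
    # "/home" is a placeholder root; review the list before deleting anything
    for path, uid in find_orphaned("/home"):
        print(uid, path)
```

The lookup is passed in as a parameter so the walk can be exercised without touching the real account database.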
Discovered at about 5 AM on Saturday morning that NetBackup kills NFS on HEIWA when it’s reading the deep directory tree of Wusage reports.
Somehow, deep into the reports tree for Wusage, NetBackup’s tar program interacts badly with NFS and makes NFS from HEIWA to PAX die, thus killing the machine. It takes about 1.5 hours to get to that point, making a backup that starts at midnight kill HEIWA at about 1:30. I got in at 2:30 and manually kicked off another backup to watch it. At about 4:30 it died and I got to see where.
Have disabled backups on HEIWA until I figure out what to do about this.
Wrote a short little CGI perl script for Bryan and the alumni e-mail directory.
I’m not sure why there is such a disconnect between what I’m trying to suggest and what Bryan seems to be understanding. So I wrote a 30 minute perl script to demonstrate just exactly what I’m talking about. The script merely searches for case-insensitive regular expression matches in a CSV file of first name, middle name, last name, class year, and e-mail address entries. It does a little bit of formatting for the output.
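The idea is simple enough to sketch in a few lines. The original is a perl CGI; what follows is a Python approximation of the same search, not the script itself. The field names, the output format, and the sample rows are made up for illustration.

```python
import csv
import io
import re

# Column order described in the post; the key names are assumptions
FIELDS = ["first", "middle", "last", "year", "email"]


def load(fileobj):
    """Read CSV rows into dicts keyed by FIELDS."""
    return list(csv.DictReader(fileobj, fieldnames=FIELDS, restval=""))


def search(rows, pattern):
    """Return rows where any field matches the regex, ignoring case."""
    rx = re.compile(pattern, re.IGNORECASE)
    return [row for row in rows
            if any(rx.search(row.get(f) or "") for f in FIELDS)]


def format_row(row):
    """A little bit of formatting for the output."""
    return "{last}, {first} {middle} ({year}) <{email}>".format(**row)


# Hypothetical sample data, not real directory entries
SAMPLE = """Ann,Marie,Smith,1998,asmith@example.org
Robert,J,Jones,2001,rjones@example.org
"""

if __name__ == "__main__":
    for row in search(load(io.StringIO(SAMPLE)), "SMITH"):
        print(format_row(row))
```

Matching against every field keeps the demo short; searching a single named field would only mean passing one key instead of looping over `FIELDS`.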
Catalog backups on NetBackup are going to disk and then getting written to tape with a semi-manual dump command.
NetBackup was having problems with the catalog backup directly to tape, so I set it to back up to disk instead. Then I modified the dbbackup_notify script to run a little shell script that takes a snapshot of the /home filesystem and dumps it to tape. This has the added benefit of backing up the entire NetBackup installation as well.
The dump script uses fssnap to take the snapshot, and then uses ufsdump to perform the dump to tape.
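The sequence is snapshot, dump, then delete the snapshot. The real script is shell; this is a rough Python sketch of the same three steps, assuming the usual Solaris pattern where `fssnap -o raw` prints the raw snapshot device and `ufsdump` reads from that device. The backing-store path, the tape device, and the dump level are placeholders, not the values used on the backup server.

```python
import subprocess


def snapshot_and_dump(mount_point, backing_store, tape_device,
                      run=subprocess.run):
    """Snapshot a UFS filesystem, dump the snapshot to tape, then clean up."""
    # With the "raw" option, fssnap prints the raw snapshot device name
    created = run(
        ["fssnap", "-F", "ufs", "-o", "raw,bs=" + backing_store, mount_point],
        check=True, capture_output=True, text=True,
    )
    snap_device = created.stdout.strip()
    try:
        # Level 0 dump of the frozen snapshot rather than the live filesystem
        run(["ufsdump", "0uf", tape_device, snap_device], check=True)
    finally:
        # Always delete the snapshot, even if the dump failed
        run(["fssnap", "-d", mount_point], check=True)
    return snap_device


if __name__ == "__main__":
    # Placeholder paths; adjust for the actual server before running
    snapshot_and_dump("/home", "/var/tmp/home.snap", "/dev/rmt/0n")
```

Putting the delete in a `finally` block means a failed dump does not leave a stale snapshot behind to trip up the next run.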
NetBackup is much more finicky about tape drive errors and cleaning requests than AMANDA ever was. It actually pays attention to them and suspends the jobs if they happen.
Currently working on finding a solution to this problem - fairly often when changing tapes the cleaning light will come on, suspending the current job. There are some scripts that will check for downed drive states and reset them, or up them as appropriate. I’ll be modifying one of these to do that. We’ll see if that eases the problem.
Also trying to get automatic tape cleaning going, but that may not be necessary and may be tricky.
Use vmoprcmd to get status of tape drives and set them up, down, etc. Use tpclean to get info on drive cleanings and to initiate a cleaning.
Working on a revamp of the policies in NetBackup.
Don’t be afraid of policies. There will be at most three policies for every host: business, user, and system (data types). The schedules for full and incrementals on each will be the same, but the differences will be in the file lists. This is probably the easiest way to separate out the file lists for each host in a scalable manner. It leads to a proliferation of policies, but I think that’s manageable, at least within the framework here. I’ll post a description of the policies in more detail tomorrow.
Tonight I’ve turned NetBackup off, so it won’t attempt to back up anything.
Update, 11/11/03:
As promised, policy descriptions. There are three templates, which are applicable to almost any host. Not every host will have all three of the policies, depending on its mix of user data, business data, and whether the OS is easily re-installable (Jumpstart).
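The template scheme can be modelled as a small data structure. This is only an illustrative sketch: the `host-datatype` naming convention, the schedule labels, and the way paths are sorted into data types are assumptions, not the actual NetBackup configuration. A host gets a policy only for the data types it actually has.

```python
DATA_TYPES = ("business", "user", "system")

# Every policy shares the same schedules; only the file list differs
SCHEDULES = ("full", "incremental")


def build_policies(host_file_lists):
    """Expand {host: {data_type: [paths]}} into one policy per host and type."""
    policies = {}
    for host, by_type in host_file_lists.items():
        for data_type in DATA_TYPES:
            paths = by_type.get(data_type)
            if not paths:
                continue  # host has no data of this type, so no policy
            policies[host + "-" + data_type] = {
                "client": host,
                "schedules": SCHEDULES,
                "file_list": list(paths),
            }
    return policies


# Hypothetical classification of paths, for illustration only
EXAMPLE = {
    "sith": {"business": ["/home/db", "/home/webdb"]},
    "ashti": {"system": ["/home/ldap"]},
    "pax": {"user": ["/home/groups", "/home/classes"], "system": ["/etc"]},
}

if __name__ == "__main__":
    for name, policy in sorted(build_policies(EXAMPLE).items()):
        print(name, policy["file_list"])
```

The number of policies grows with the number of hosts, but each one is just a template plus a file list, which is what keeps it manageable.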
Web sites with Veritas NetBackup information
Getting started with NetBackup.
Yesterday we discovered that the AMANDA dumps on ROJ were really not working well at all, so I went on a crash install of Veritas on the new backup server. I’ve put clients on PAX, ASHTI, and SITH so far, and I did a trial backup of /home/ldap on ASHTI last night. Nothing there at that time, so it didn’t prove much. I set the policies to have /home/db and /home/webdb on SITH backed up starting at 8 this morning, and it kicked off and finished with what looks like a happy ending. Also in the mix were most of the /home partitions on PAX, and again /home/ldap on ASHTI. PAX is currently working on /home/groups, after having done /home/classes. I won’t really know how it all goes until the whole backup finishes or fails, and hopefully that’ll work ok.
Things to note so far: