I tried replacing the temporary disk on ASHTI this morning with one of the old 18 G drives from KE. I think in the end it’ll work fine, but in the short term I toasted some filesystems and essentially killed ASHTI.
I managed to install the Sun ONE directory server on SITH and copy over all the data properly, so we at least have LDAP service restored (after updating the DNS records for directory.earlham.edu). SITH is now part of the regular NetBackup rotation again.
When I return I’ll salvage anything else off of ASHTI that I need and then jumpstart it. I’m going to try making it and SITH into a dual-master LDAP cluster so that hardware failures don’t take us out quite this way again.
Attached to this post is a working M4 file for SITH’s sendmail installation.
Sanitized, here is the M4 config that is working on SITH. This should do it all.
I got the aliases problem sussed out. It wasn’t a Sendmail or Cyrus problem but an LDAP problem.
Sendmail was happily looking up alias@domain, rewriting it to alias, and not getting any farther — and then Cyrus said it didn’t know anything about alias.
The problem was that it was looking up alias@domain in LDAP, saying, “yup, that’s local, so it’s just alias,” and then trying to look up alias in LDAP and failing (since only alias@domain is listed as a mail attribute). The fix is to put alias as another mail attribute, and it looks it up just fine.
Previous alias LDAP entry:
dn: cn=ALIAS, ou=Aliases, dc=earlham, dc=edu
objectClass: top
objectClass: groupOfUniqueNames
objectClass: mailrecipient
objectClass: mailGroup
cn: ALIAS
mail: ALIAS@earlham.edu
mailHost: sith.earlham.edu
mgrpRFC822MailMember: recipient
The new one adds one mail attribute:
dn: cn=ALIAS, ou=Aliases, dc=earlham, dc=edu
objectClass: top
objectClass: groupOfUniqueNames
objectClass: mailrecipient
objectClass: mailGroup
cn: ALIAS
mail: ALIAS@earlham.edu
mail: ALIAS
mailHost: sith.earlham.edu
mgrpRFC822MailMember: recipient
This lets sendmail search on the LHS part of the alias@domain and find a match.
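To make the failure mode concrete, here is a toy model of the two-step lookup in Python. It is only a sketch of the behavior described above, not sendmail's actual LDAP routing code: the directory is a plain list of dicts, and the names find_entry and resolve are made up for illustration.

```python
# Toy model of the two-step lookup described above; not sendmail's real code.
LOCAL_DOMAIN = "earlham.edu"  # domain taken from the LDIF entries in this post


def find_entry(directory, key):
    """Return the first entry whose mail values include key (case-insensitive)."""
    wanted = key.lower()
    for entry in directory:
        if wanted in (value.lower() for value in entry["mail"]):
            return entry
    return None


def resolve(directory, address):
    """Look up alias@domain, strip the local domain, then look up the bare alias."""
    if find_entry(directory, address) is None:
        return None
    local_part, _, domain = address.partition("@")
    if domain.lower() != LOCAL_DOMAIN:
        return None
    # This second lookup is the one that failed with the old entry.
    return find_entry(directory, local_part)


old_directory = [{"cn": "ALIAS", "mail": ["ALIAS@earlham.edu"]}]
new_directory = [{"cn": "ALIAS", "mail": ["ALIAS@earlham.edu", "ALIAS"]}]

print(resolve(old_directory, "alias@earlham.edu"))  # no match on the bare alias
print(resolve(new_directory, "alias@earlham.edu"))  # entry found via "mail: ALIAS"
```

With the old entry the second lookup comes back empty, which matches Cyrus being handed a bare alias nobody knows about. With the extra mail value it resolves.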
See this entry for how I thought it should have been done last fall (I was closer then than with what I ended up doing).
Apparently Samba 3 still relies on unix-side password and group files for group membership information, not just LDAP.
In particular, the Domain Admins group needs to have its members in /etc/group, otherwise those folks don’t get admin rights. Aaron sussed this out. See the following:
The test setup on SITH seems to be almost entirely working.
The following all work properly:
I don’t have alias expansion working yet. If I point an alias in LDAP to SITH, sendmail and Cyrus don’t expand the alias once we get there. Will have to look at that more.
I figured out the rulesets for forcing authentication a while back, but they weren’t working on Solaris. I finally traced the problem to an incorrect Berkeley DB version for the access map.
I ended up rebuilding Sendmail with the proper include flags to access the new Berkeley DB install for everything (makemap hadn’t compiled earlier), reinstalled the new Sendmail (and makemap), remade access with the new makemap, and it all seems to work now.
I’m currently writing a minimal Makefile for Sendmail control like *BSD has in /etc/mail, and we should be good.
Just ran a small message through a few thousand recipients on the Cyrus test box.
The test box is a SunFire V120. The number of recipients was 5132. The total delivery took about 3 minutes, during which the load average hit 17 (and sendmail refused a few connections). The original message was 1 K, and a single delivered message was about 2.5 K. Disk space used after delivery was approximately 133 K.
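For a sense of scale, the arithmetic behind those numbers can be worked out in a few lines of Python. This assumes "about 3 minutes" means 180 seconds and that every figure is in the same K unit. The "private copy" total is a hypothetical comparison, not something I measured.

```python
recipients = 5132
elapsed_seconds = 3 * 60      # "about 3 minutes", taken as exactly 180 s
delivered_size_k = 2.5        # size of one delivered copy, in K
disk_used_k = 133             # disk space used after delivery, in K

rate = recipients / elapsed_seconds
naive_total_k = recipients * delivered_size_k   # if each recipient got a full copy
ratio = naive_total_k / disk_used_k

print(f"{rate:.1f} deliveries per second")
print(f"{naive_total_k:.0f} K if every recipient stored a private copy")
print(f"roughly {ratio:.0f} times more than the {disk_used_k} K actually used")
```

The gap between the two disk figures is presumably the effect of Cyrus storing the message body once rather than per mailbox.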
I think my V240 will be plenty beefy, particularly with the Xserve RAID.
EYEWI’s /home partition was full again this weekend, causing backups to fail. I eked out another 12 GB of spare space by doing some partition shuffling and moved NetBackup to the new partition.
When I installed the two extra drives for the RAID 5 data partition, I had to use partitions the same size as the two partitions on the first two drives. That meant I had several slices left over, as well as about 12 GB on each drive. I rearranged the slices and made slice 0 a 12 GB slice on each of those drives. I then mirrored them and formatted them. Then I copied the NetBackup installation and catalog backup area onto the new partition and let things go. This should hold NetBackup for a while, although we may want to replace those drives with 72 or 143 GB drives at some point (or get an actual hardware RAID LUN for the data).
Looks like /usr/local was indeed toast, but I recreated the filesystem, reinstalled the packages, and restarted some services, and it seems better now.
Forcibly umounting /usr/local, fscking the partition, and remounting still generated I/O errors on remount, so I simply newfs’d the partition. I generated a list of SMC and ECS packages installed into /usr/local, and reinstalled them from ROJ (and three from my home directory on ASHTI: Net-SNMP and its dependencies, OpenSSL and libgcc).
I had to recreate the Net-SNMP config file, which also meant changing the path to perl in /usr/local/bin/snmpconf. Whatever.
I restarted Net-SNMP using the script in /etc/rc3.d, and it was fine. I restarted /etc/rc2.d/S72inetsvc to restart inetd, and NetBackup is now fine.
I’m probably going to reboot the whole machine at some point just to make sure things are peachy, but for now it’s running well.
Now we just need to see if we can scare up an 18 GB replacement hard drive. Sun doesn’t seem to want to sell me one; wouldn’t I much rather have 36 or 73 GB?
You’d think we had an air conditioning failure recently, the way disks are melting down. ASHTI’s first disk died and I had to scramble to get a replacement in.
ASHTI was still running ok, but swap was unhappy and thus not allowing logins.
I took one of the hot spare 18G disks out of PACO’s A1000 unit, stuck it in SITH for partitioning, and then swapped it for the dead drive in ASHTI. I had the partitions slightly wrong, so I had to repartition (I’d swapped the 0 and 1 partitions), but it at least was running well enough to let me log in and do that.
Right now, the mirrors are still rebuilding. /usr/local has some errors and complains that it needs to be fsck’d. This may be true, or it may be fallout from the incorrect initial partition. I’ll wait until we’re done rebuilding, and then see if the errors are still there. If so, I’ll have to see about recovery there.
In the meantime, I’ve fired off a request for a new 18G drive.
Congratulations, class of 2005. A lot has happened in the past four years.
I remember the first big batch of accounts I created — the class of ‘05. Looking for collisions, checking them by hand.
This class saw the expansion of the general server pool from one to eleven, plus or minus. That first server is still in service, although it’s probably headed for retirement soon. They’ve gone from a single server which crashed frequently to a pool that normally gets restarted only when we have power shutdowns.
This class saw the shift from the single general server running Linux to the elimination of Linux from the server pool (in favor of FreeBSD and Solaris).
This class has always seen servers with their names taken from various languages’ words for “peace”.
This class has always had a Windows 2000 domain, run by Samba acting as the domain controller.
This class saw the replacement of TWIG with SquirrelMail, but probably barely remembers the former. They saw the introduction of e-mail quotas, and later home directory quotas.
The disk space used by this class’s home directories has grown by an order of magnitude since the end of their first semester. The number of e-mail messages they’ve sent and received has increased by a factor of five. They have always had their mail scanned for malicious attachments.
This class has gone from 10 Mbps shared hubs to 100 Mbps switched networks in their dorm rooms. When they were able to live in campus houses, wireless was there for them.
I’m sure there’s more, but that gives a taste of what’s happened in four years.
I made some improvements to the previous local rule sets that have it use the access map and allow for classed IP subnets.
I changed the noauth_map ruleset so that it goes as follows:
Snoauth_map
R$* $: $&{client_addr}
R$+ $: $>A <$1> <?> <+ Connect> <$1>
R<RELAY> $* $@ RELAY relayable client IP address
R$* $#error $@ 5.7.0 $: "530 Local authentication required."
This takes the client IP address, puts it through the A ruleset (which looks for an IP match — or classed subnet — in the access database and returns the RHS of that match). If the result of the A access lookup is RELAY, then we allow an unauthenticated connection, otherwise we fail to the standard authentication required error.
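The classed-subnet matching is easier to see outside of cf syntax. Below is a rough Python sketch of the idea: try the full address, then progressively shorter dotted prefixes, checking the tagged key before the untagged one. It is a simplification of what the A ruleset does, the access map is a plain dict, and the sample entries are invented for illustration.

```python
def access_lookup(access_map, client_addr, tag="Connect"):
    """Return the RHS for the most specific match, or None if nothing matches."""
    octets = client_addr.split(".")
    while octets:
        prefix = ".".join(octets)
        # Tagged key first, then the bare key (the "+" in <+ Connect>).
        for key in (f"{tag}:{prefix}", prefix):
            if key in access_map:
                return access_map[key]
        octets.pop()  # drop the last octet and retry: 10.0.5.3 -> 10.0.5 -> 10.0 -> 10
    return None


def noauth_map(access_map, client_addr):
    """Mirror the ruleset: RELAY if the access map says so, otherwise the 530 error."""
    if access_lookup(access_map, client_addr) == "RELAY":
        return "RELAY"
    return "530 Local authentication required."


# Hypothetical access map entries, for illustration only.
access = {"Connect:10.0.5": "RELAY", "192.168.2.4": "REJECT"}

print(noauth_map(access, "10.0.5.3"))     # matched through the 10.0.5 subnet entry
print(noauth_map(access, "192.168.2.4"))  # found, but not RELAY, so auth is required
```

Anything other than RELAY, including no match at all, falls through to the authentication error, which is the behavior the last rule of the ruleset gives us.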
I installed the full ruleset on both BARIS and KE this evening, and we’ll see how things progress tomorrow. This should force all incoming general mail through TAIKA — or make the spammers stop using KE and BARIS.
Ah, here we go. Turns out that the auth_type macro doesn’t quite respond to the R$+ rule like I thought it did.
I had been trying to do it with just one rule that checked to see whether auth_type had been set. I’ve had to expand that to two rules to check to see whether auth_type is either LOGIN or PLAIN (the two auth types I allow). I’m sure there’s a slicker way to do this, but I’m still a little green at raw cf hacking.
So, for background:
The 64 dollar question is: “How do I keep port 25 open to the world, but require SMTP AUTH for anything outside of a particular network or host list?”
We want KE (or future Cyrus box, or BARIS, for that matter), to be available to clients on and off campus for use — off campus use must be authenticated. But we want to have this on port 25, since there are a number of broken clients out there that can’t deal with SMTP on any other port (or make it very difficult — Qualcomm, are you listening?). But I don’t want to have to set up TAIKA to use authentication when shoveling in the daily mail load, since there’s no good reason to require auth, and doing so would put a small but unnecessary strain on the auth services.
So ideally, we want Sendmail to say, “Ah, incoming from TAIKA — no need for authentication” and “Oh, incoming from 192.168.2.4 — better require auth.” Both on port 25.
It turns out that mucking with the LOCAL_RULESETS can get this accomplished. Here’s what I finally put in the mc file:
LOCAL_CONFIG
Knoauthmap hash -o /etc/mail/noauthmap
SLocal_check_mail
R$* $: $1 $| $>"local_auth_plain" $1
R$* $| $#$* $#$2
R$* $| $* $: $1 $| $>"local_auth_login" $1
R$* $| $#$* $#$2
R$* $| $* $: $1 $| $>"noauth_map" $1
R$* $| $#$* $#$2
Slocal_auth_plain
R$* $: $&{auth_type}
RPLAIN $# RELAY
Slocal_auth_login
R$* $: $&{auth_type}
RLOGIN $# RELAY
Snoauth_map
R$* $: $&{client_addr}
R$+ $: $(noauthmap $1 $)
RRELAY $# RELAY
R$* $#error $@ 5.7.0 $: "530 Authentication required"
Now put the IP addresses of those hosts that you want to allow to send without authentication in the /etc/mail/noauthmap file and don’t forget to makemap it:
10.0.5.3 RELAY
Then 10.0.5.3 can send without requiring authentication. Everything else gets the “authentication required” error — unless, of course, they’ve authenticated properly, in which case they’re allowed relay.
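Stripped of the cf syntax, the order of checks in Local_check_mail boils down to the following Python sketch. It only illustrates the decision order. The function name check_mail and the dict standing in for the noauthmap database are made up for the example.

```python
ALLOWED_AUTH_TYPES = {"PLAIN", "LOGIN"}  # the two auth types the rulesets accept


def check_mail(auth_type, client_addr, noauthmap):
    """Return "RELAY" or the error text, checking in the same order as the rulesets."""
    # local_auth_plain / local_auth_login: an authenticated client may relay.
    if auth_type in ALLOWED_AUTH_TYPES:
        return "RELAY"
    # noauth_map: an unauthenticated client must be listed in the map.
    if noauthmap.get(client_addr) == "RELAY":
        return "RELAY"
    return "530 Authentication required"


noauthmap = {"10.0.5.3": "RELAY"}  # same entry as the example map above

print(check_mail("", "10.0.5.3", noauthmap))          # listed host, no auth needed
print(check_mail("PLAIN", "192.168.2.4", noauthmap))  # authenticated, so allowed
print(check_mail("", "192.168.2.4", noauthmap))       # neither, so rejected
```

An empty string stands in for an unset auth_type here.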
Working on hacking the Sendmail cf file to preferentially require authentication on port 25.
I want to require authentication on port 25 for all client addresses coming in to KE except TAIKA — thus allowing people off campus to continue relaying after authenticating but not requiring TAIKA/Baleen to authenticate.
I’m getting somewhere with local rule sets — I’ve got it looking up the client IP in a map and if it’s there allowing it to relay, but I don’t yet have it to the point of being able to bail out and allow relay before that if the client has authenticated. I clearly need some work on the first part of the rule set, but I’m not sure quite how to rehabilitate it yet.
It seems that the comment spam floods in MT are primarily caused by web crawlers downloading the comments that are there rather than attempts at posting new ones.
We had another brief meltdown from comment spam (happens every couple of days now), and I checked the Apache log file immediately afterward. Most of the mt-comments.cgi requests immediately before the incident were GET requests for specific comments, usually with a referrer off site.
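The log check can be repeated with a short Python script along these lines. It is a sketch that assumes Apache's combined log format. The sample lines, the example.edu host name, and the function name tally are invented for illustration and are not taken from our logs.

```python
import re
from collections import Counter

# Pulls the method, path and referrer out of a combined-format log line.
REQUEST = re.compile(
    r'"(?P<method>[A-Z]+) (?P<path>\S+) [^"]*" \d{3} \S+ "(?P<referrer>[^"]*)"'
)


def tally(lines, script="mt-comments.cgi", site_host="example.edu"):
    """Count request methods for the script, plus requests with an off-site referrer."""
    methods = Counter()
    offsite = 0
    for line in lines:
        match = REQUEST.search(line)
        if not match or script not in match.group("path"):
            continue
        methods[match.group("method")] += 1
        referrer = match.group("referrer")
        if referrer != "-" and site_host not in referrer:
            offsite += 1
    return methods, offsite


# Invented sample lines, for illustration only.
sample = [
    '192.0.2.10 - - [09/May/2005:10:00:01 -0500] "GET /cgi-bin/mt/mt-comments.cgi?entry_id=12 HTTP/1.0" 200 5120 "http://spam.example.com/" "crawler"',
    '192.0.2.11 - - [09/May/2005:10:00:02 -0500] "GET /cgi-bin/mt/mt-comments.cgi?entry_id=13 HTTP/1.0" 200 4096 "http://spam.example.com/page" "crawler"',
    '192.0.2.12 - - [09/May/2005:10:00:03 -0500] "POST /cgi-bin/mt/mt-comments.cgi HTTP/1.0" 200 300 "http://blog.example.edu/entry" "browser"',
    '192.0.2.13 - - [09/May/2005:10:00:04 -0500] "GET /index.html HTTP/1.0" 200 100 "-" "browser"',
]

methods, offsite = tally(sample)
print(dict(methods), "off-site referrers:", offsite)
```

A high GET count with mostly off-site referrers is the crawler pattern described above. A high POST count would point at actual posting attempts instead.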
I don’t think we’ve had any new comment spam since installing the Scode plugin.
So the trick now is to remove the comment spam from blogs where it’s gotten to. Nasty here, since most of these blogs are long dead class projects which could probably be safely deleted. But can’t do that, so I’ll have to just delete the comment spam.
I changed the MX for our domain to TAIKA and removed the other MXs this morning.
We had a bit of a bull ride after that as I tried to get the mail systems under control from the Monday morning onslaught of faculty list messages. The Baleen filter was mostly fine, although I had to rewrite the greylist configuration generator slightly — it wasn’t happy with the regex syntax I’d used for exceptions.
The other main problem was solved by making BARIS also do Sendmail LDAP routing. I had to tweak the default LDAP spec for BARIS to take out the v3 syntax flag and point the root at dc=earlham,dc=edu rather than into the Aliases OU. List mail works fine now and goes directly to KE from BARIS, bypassing TAIKA and Baleen as it should (and I will love this when we have Cyrus’ single instance message store).
I have a few testers in Baleen now, and everything is working well. Most of the spam seems to be stopped outright by greylisting and DNSBLs.
ROJ’s A1000 had a disk failure this morning.
It failed over to the hot spare, and I have a case open with Sun to get me a replacement part.