2015/10/18

Server Administration, LARTC, and All The Pain Money Can Buy - Fast Networks, Backups and Lots of Disks

Hi Interwebs, long time no post.  But I'm feeling creative today.  Since my last post I've changed companies to take a job as a software engineer across the country, and I now live in the Twin Cities.  A lot has changed, but my home server rack is essentially the same as it was a few years ago, although it has had a couple of upgrades.

Right now I'm sitting around, surrounded by tools and parts, waiting for drives to finish zeroing out so I can make a second ZFS pool.  I'll get back to that in a minute.  First I should break down how I currently have my server set up.
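(For anyone who hasn't done it: "zeroing out" a drive is just an unglamorous dd run.  Here's a minimal sketch - pointed at a scratch file instead of a real /dev/sdX so copy-pasting it can't eat a disk; every path in it is made up.)

```shell
#!/bin/sh
# Zero out a "drive" before adding it to a pool.  TARGET is a scratch
# file here for safety -- swap in the real device at your own risk,
# and triple-check which /dev/ node you point it at.
TARGET=/tmp/fake-disk.img

# bs=1M keeps the writes large and fast; conv=fsync flushes at the end.
dd if=/dev/zero of="$TARGET" bs=1M count=16 conv=fsync 2>/dev/null

# Sanity check: dump the bytes as hex, strip every zero nibble,
# and count what's left.  Anything non-zero means the wipe missed.
od -An -v -tx1 "$TARGET" | tr -d ' 0\n' | wc -c   # should print 0
```

On a real multi-terabyte disk this takes hours, which is why I'm sitting around waiting.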

My VM server is being rebuilt at the moment, so I've offloaded running virtual machines to my desktops; they're beefy enough to handle the job for now.  That leaves my core switch, router and storage server in the rack.  You can skip the breakdown and jump to the tl;dr section after the list.

  • Switch
    • TP-Link TL-SG2424 - it's what you'd expect from a ~$200 switch with higher end abilities: kind of junky, and it doesn't support the higher end features well, but it mostly works most of the time.  Has VLANs, LAGG, SSH, QOS, SNMP, etc.  A couple of features are buggy and the documentation is terrible, but once again, what did you expect for ~$200?
    • Connects to the core router and servers via the patch panel.

  • Router
    • PFSense box (dual core Celeron w/ 2 GB RAM) with 7 NICs, 6 of which have drivers for BSD, 5 of which work flawlessly.  One interface on the PCIE bus is a backup-backup wireless AP; the driver tends to wedge from time to time and the AP goes MIA.  Not a huge deal, but I wish it worked without issue.  The onboard NIC doesn't work at all, which forces me to run the WAN link over the PCI bus.
    • The NICs are spread over the buses (PCI and PCIE) such that one NIC on each bus makes up a bond - 1 PCI and 1 PCIE NIC - to spread the traffic and interrupts.  The WAN is on the PCI bus (lower bandwidth requirements - but I'll get to bus flooding later) and the WLAN is on the PCIE bus.  1 bonded pair goes to the LAN subnet/VLAN and another bonded pair goes to the storage network/VLAN.
    • Uses LACP to the switch for bonding.
  • Storage Server
    • Slackware Linux box (AMD 6 core, 32 GB RAM) with a hand-compiled kernel + ZFS on Linux patches, 4 NICs and 10 disks.
    • The NICs form 2 bonded interfaces, each from a PCI/PCIE NIC pair to spread the bus load, and connect to the "storageGateway" subnet on the router, via the switch, using adaptive load balancing (ALB) on one bond and LACP on the other.
    • Disks are spread over the PCIE and SATA buses into 4 mirrored ZFS vdevs.  3 vdevs pair 1 internal disk with 1 external hot-swappable disk; the internal drives hang off a PCIE controller and the external drives are wired to the primary onboard SATA controller.  The 4th vdev has both disks in external hot-swap sleds so the vdev can be upgraded without opening the rig - this mirror uses the motherboard's second disk controller, since the primary one only has 4 SATA channels.  Of the remaining 2 disks, 1 is the OS drive and the other is an SSD dedicated to read caching and write buffering.
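Spelled out as commands, that pool layout would look roughly like the sketch below.  This is not what I actually ran - every device path is a made-up placeholder (I'd use /dev/disk/by-id names for real disks), and I'm assuming the SSD is split into two partitions to cover the read-cache (L2ARC) and write-buffer (log) duties:

```shell
# Sketch of the pool layout described above; all device names are
# placeholders, not my actual disk IDs.
# 4 mirrored vdevs: 3 pair an internal disk (PCIE controller) with an
# external hot-swap sled (onboard SATA); the 4th has both legs in sleds
# so it can be upgraded without opening the case.
zpool create tank \
  mirror /dev/disk/by-id/internal-0 /dev/disk/by-id/sled-0 \
  mirror /dev/disk/by-id/internal-1 /dev/disk/by-id/sled-1 \
  mirror /dev/disk/by-id/internal-2 /dev/disk/by-id/sled-2 \
  mirror /dev/disk/by-id/sled-3a    /dev/disk/by-id/sled-3b

# The SSD doing double duty: one partition as a read cache (L2ARC),
# one as a write buffer (log vdev).
zpool add tank cache /dev/disk/by-id/ssd-part1
zpool add tank log   /dev/disk/by-id/ssd-part2

zpool status tank
```

Losing the cache or log device doesn't lose the pool, which is why I'm comfortable putting both on a single SSD.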

Whew!  That's a lot to take in.  I know because I designed it, and I still have a bit of trouble remembering it all off the top of my head.  The basic design principles: all loads should be spread over the available buses and disk controllers for performance, and one of the mirrors in the ZFS pool should have both drives externally accessible so the storage server doesn't have to be opened for an emergency storage upgrade.

Here's the logical and physical breakdown of the wiring.  (My server closet is a mess - I'm in the middle of bundling the wires today.  Also, sorry that none of the pictures can be lined up side by side; Google decided to make them exactly half the width of the page and then padded them for some reason.  After a good 20 minutes of trying to make this look nice I gave up - you're lucky I left any pictures at all.)

Switch logical view : Rows are LAG #, VLAN #, Port # (colors are bond association on other pictures)

Router NICs by bus : "Wire #" is the physical wire ID per box going to the patch panel and "Port" is the terminating port on the switch (row 3 of the switch pic above, 1 is bottom left and 2 is above it, etc.).
Router logical view

Storage Server logical view

Big Picture logical view



Yes, my server closet is literally a closet.