Andrew Que

May 03, 2020

Happy Birthday Tara

   The reason I discovered my problems with ECC memory is that the Data-Dragon has been very unstable.  It crashes after a few hours, sometimes resetting, sometimes just locking up.  There are no messages in the kernel log, although I sometimes get a message on screen just before the lockup.  At first I looked at the power supply.  I am running 14 hard drives, after all.  But even if every drive drew 1.5 amps (which seems to be what the SATA spec allows for a drive), that only comes to 21 amps, and my power supply provides 35 amps at 12 VDC.
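   The power-supply reasoning above is simple arithmetic.  Here is a minimal sketch using the figures from the text: 14 drives, 1.5 A per drive, and 35 A at 12 VDC.  It only counts the drives' 12 V draw.  It ignores the 5 V rail and any other loads on the 12 V rail, so real headroom is smaller:

```python
# Rough 12 V power budget for the drive array, using the figures from the post.
DRIVES = 14               # hard drives in the Data-Dragon
AMPS_PER_DRIVE = 1.5      # assumed worst-case 12 V draw per drive
PSU_12V_AMPS = 35.0       # 12 VDC rating of the power supply
VOLTS = 12.0


def drive_load_amps(drives: int, amps_per_drive: float) -> float:
    """Total 12 V current if every drive draws the same current."""
    return drives * amps_per_drive


load = drive_load_amps(DRIVES, AMPS_PER_DRIVE)
headroom = PSU_12V_AMPS - load
print(f"Drive load: {load:.1f} A ({load * VOLTS:.0f} W)")
print(f"Headroom:   {headroom:.1f} A ({headroom * VOLTS:.0f} W)")
```

   With these numbers the drives use 21 A (252 W), leaving 14 A (168 W) of 12 V headroom.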
   I had just installed a second 16 GB stick of RAM, bringing the total to 40 GB.  A quick memory test didn't show any problems with the RAM.  Nonetheless, I removed the 8 GB stick just to see what would happen.  So far the system hasn't had any problems.  We'll see if that holds.
   Temperatures were around 75°F/24°C when I set out on my ride this morning.  My goal was 30 miles, and I started by cycling out to Indian Lake County Park.  I'm not sure if it reopened or was never closed, but there were a lot of people at the park as I cycled by.  When I arrived, I noticed my GPS tracking software wasn't actually working.  I couldn't calculate the number of miles I biked, but I'm pretty sure it was around 30.  After Indian Lake, I headed west to Waunakee.  A good 10-15 MPH tailwind pushed me along, and this portion of the ride went quickly.  The last leg of the ride along County Q had a bit of a side wind, but I held about 15 MPH most of the way.
   The other day while doing a loop through Martinsville I stopped at this church.  There is a road that leads behind the church to a cemetery, where I was hoping to get a view of the countryside.  Sadly there isn't much of a view, but it was something new to explore.  Shortly after I took this picture it began to sprinkle.  The rain forecast for later in the day had again arrived early, and my ride would be cut short for a second time.  Luckily, a fairly strong wind from the northwest made quick work of the trek back home.
Life Returns

   Enabled nightly backups of the Data-Dragon.  This is done through rsync to a Raspberry Pi 4 with 4x 10 TB USB hard drives in a RAID-5 configuration.  Initially I thought I would use wake-on-LAN, but the Pis do not support it.  However, the drives spin down after a few minutes of being idle.  This is pretty much perfect, as an idle Raspberry Pi barely consumes any power.  So the drives spend most of the day powered down and wake in the evening for backups.  The Data-Dragon runs the backup from a cron job.  I had debated having the Pi run the cron job instead, but with the Data-Dragon in charge, if the Data-Dragon isn't online the cron job simply won't run.  I've had issues in the past with rsync removing all files because a mount failed.  This avoids that ever being a possibility.
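   Below is a minimal sketch of a push-style nightly job of the kind described, written in Python.  Several parts are assumptions.  The source and destination paths are hypothetical, since the post doesn't give them.  The `rsync -a --delete` flags are a guess at how the mirror is made.  The empty-source check is an extra safeguard that the post does not mention:

```python
import os
import shutil
import subprocess

# Hypothetical locations: the post doesn't say where the pool is mounted
# or what the backup volume on the Pi is called.
SOURCE = "/tank/"
DESTINATION = "backup-pi:/mnt/backup/"


def source_looks_mounted(path: str) -> bool:
    """Return True only if path is an existing, non-empty directory.

    Extra safeguard (not from the post): if a mount silently failed, the
    mount point is usually an empty directory, and rsync --delete would
    then wipe the backup to match it.
    """
    return os.path.isdir(path) and bool(os.listdir(path))


def build_rsync_command(source: str, destination: str) -> list[str]:
    # -a: archive mode (recursive; keeps permissions, times, symlinks, ...)
    # --delete: remove files from the destination that are gone from the source
    return ["rsync", "-a", "--delete", source, destination]


def run_backup(source: str = SOURCE, destination: str = DESTINATION) -> int:
    """Push source to destination; return rsync's exit code, or 1 if skipped."""
    if not source_looks_mounted(source):
        print(f"Skipping backup: {source} is missing or empty")
        return 1
    if shutil.which("rsync") is None:
        print("Skipping backup: rsync is not installed")
        return 1
    return subprocess.run(build_rsync_command(source, destination)).returncode
```

   A crontab entry on the Data-Dragon would call `run_backup()` from a small script each evening.  The schedule and script location are up to you.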
   I noticed in the kernel logs the following message:
[   16.908469] EDAC amd64: Node 0: DRAM ECC disabled.
[   16.908473] EDAC amd64: ECC disabled in the BIOS or no ECC capability, module will not load.
                Either enable ECC checking or force module loading by setting 'ecc_enable_override'.
                (Note that use of the override may cause unknown side effects.)
   This means the error-correcting RAM I have isn't actually being used.  My buddy Pluvius helped me look into this issue, and we believe ECC support was removed in a recent BIOS update.  The description uses some clever but misleading verbiage:
Support for ECC Un-buffered DIMM 1Rx8/2Rx8 memory modules (operate in non-ECC mode)
   So you can buy ECC RAM and it will work; it just won't have any error correction.  This is more than irritating.  Pretty much every forum discussing ZFS strongly recommends using ECC RAM, which I did.  However, it isn't actually working.
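   One way to catch this sooner is to check the kernel log for the EDAC driver's ECC line.  Here is a minimal sketch that parses text in the format quoted above.  It assumes that when ECC is active, the driver prints the same line with `enabled` in place of `disabled`.  It only handles the `amd64` EDAC driver:

```python
import re

# Matches the per-node line from the amd64 EDAC driver, e.g.
# "EDAC amd64: Node 0: DRAM ECC disabled."
ECC_LINE = re.compile(r"EDAC amd64: Node (\d+): DRAM ECC (enabled|disabled)")


def ecc_status(log_text: str) -> dict[int, bool]:
    """Map each node number to True if ECC was reported enabled."""
    status: dict[int, bool] = {}
    for match in ECC_LINE.finditer(log_text):
        status[int(match.group(1))] = match.group(2) == "enabled"
    return status


sample = """\
[   16.908469] EDAC amd64: Node 0: DRAM ECC disabled.
[   16.908473] EDAC amd64: ECC disabled in the BIOS or no ECC capability, module will not load.
"""
print(ecc_status(sample))  # {0: False}
```

   Feeding it the output of `dmesg` (which may require root) gives a per-node answer.  An empty result means nothing matched, which is itself worth investigating.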
Horse has a Look at the Cyclist

The Data-Dragon has undergone 14 drive tests, and most have tested all 13 of the 4 TB disks. A total of 648 TB (648,127,498,862,592 bytes actually) of data was written and verified as a result. Despite some setbacks caused by controllers, I have fairly high confidence these drives are functional.
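The byte count converts to the units involved as follows.  This quick check treats the 4 TB disks as nominal decimal terabytes; real drives hold slightly more:

```python
TOTAL_BYTES = 648_127_498_862_592   # bytes written and verified, from the post
DISKS = 13                           # 4 TB disks under test
DISK_TB = 4                          # nominal (decimal) terabytes per disk

total_tb = TOTAL_BYTES / 10**12              # decimal terabytes, as drives are sold
total_tib = TOTAL_BYTES / 2**40              # binary tebibytes, as many tools report
full_passes = total_tb / (DISKS * DISK_TB)   # equivalent full writes of all 13 disks

print(f"{total_tb:.1f} TB = {total_tib:.1f} TiB, about {full_passes:.1f} full passes")
```

That is about 648.1 TB, or roughly 589.5 TiB.  This equals about 12.5 complete passes over all 13 disks, which fits with 14 tests that mostly covered every disk.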

So what caused the problems with the original Data-Dragon setup? Unknown. For now we will have to live with this answer. The Data-Dragon is set up with ZFS in RAIDz2, meaning it can survive two drive failures. There is also a backup server whose drives are normally spun down. Thus there should be little chance of such a failure occurring again.
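RAIDz2 trades capacity for redundancy: each parity drive's worth of space buys tolerance for one more failed disk.  This rough sketch ignores ZFS metadata, padding, and reserved space.  The single 13-disk RAIDz2 vdev in the example is a hypothetical layout, since the pool's actual vdev arrangement isn't described here:

```python
def usable_capacity_tb(drives: int, drive_tb: float, parity: int) -> float:
    """Approximate usable space of one parity array (RAID-5/RAIDz1: parity=1,
    RAIDz2: parity=2).  Such an array survives up to `parity` failed drives.

    Ignores ZFS metadata, padding and reserved space, so real usable space
    is somewhat lower.
    """
    if drives <= parity:
        raise ValueError("need more drives than parity drives")
    return (drives - parity) * drive_tb


# The backup server: 4x 10 TB in RAID-5, matching the 30 TB RAID-5 array in the post.
print(usable_capacity_tb(4, 10, parity=1))   # 30
# Hypothetical: all 13 of the 4 TB disks in a single RAIDz2 vdev.
print(usable_capacity_tb(13, 4, parity=2))   # 44
```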

Playground Closed

   I ordered another 16 GB of RAM for the Data-Dragon, bringing the total to 40 GB.  While 24 GB had seemed to be doing alright, I noticed RAM usage was around 75%, and I still don't have all the virtual machines running.  This should take care of that problem.
   Pictured is Parisi Park in Middleton with a sign stating the playground is closed.  All the parks I bike past have similar signs.
   YouTube at some point started displaying more irrelevant video suggestions.  I didn't mind the suggestions when they were related to the topic I was searching for, but now they seem purely ad-driven.  To help filter out some of the chaff, I've added two filters that block YouTube's "Recommended for you" and "Free with Ads" listings.
# Block YouTube's "Recommended for you" video listings.
youtube.com##span:has-text(/Recommended for you/)

# Block YouTube's "Free with Ads" video listings.
youtube.com##span:has-text(/Free with Ads/)
   Neither of these categories has ever suggested anything I need to see, and they are rarely related to my search.  These filters work in uBlock Origin.
   After I finished the restore from the Backup-Dragon, I needed to move the system to a new location.  Once it was moved, I proceeded to destroy the LUKS header with a bad pipe command.  Now I had an inaccessible 30 TB RAID-5 array.  I have since learned how to back up the LUKS header, but that came too late: my backup is, for all practical purposes, gone.
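   For reference, `cryptsetup` has a command for exactly this.  Here is a minimal sketch that wraps `cryptsetup luksHeaderBackup`.  The device name and backup path are hypothetical, and the command needs root:

```python
import subprocess


def luks_header_backup_cmd(device: str, backup_file: str) -> list[str]:
    # cryptsetup luksHeaderBackup copies the LUKS header, including key
    # slots, from device into backup_file.
    return ["cryptsetup", "luksHeaderBackup", device,
            "--header-backup-file", backup_file]


def backup_luks_header(device: str, backup_file: str) -> None:
    # Must run as root; check=True raises CalledProcessError on failure.
    subprocess.run(luks_header_backup_cmd(device, backup_file), check=True)


# Hypothetical device and destination path.
print(" ".join(luks_header_backup_cmd("/dev/md0", "/root/md0-luks-header.img")))
```

   The matching `cryptsetup luksHeaderRestore` command, with the same `--header-backup-file` option, puts the header back.  Keep the backup file somewhere other than the encrypted array, or it is lost along with it.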
   I don't feel too bad about this.  The data has been restored, so I won't be missing anything.  I just need to repeat the backup process.  That will take a while, but isn't a problem.  It has been running for 4 days now and has transferred 9.0 of 12.1 TB, slightly faster than the first round.  The Raspberry Pi 4 has 4x USB ports and this system has 4x 10 TB drives, so every drive can have its own port.  However, when Zach built the server he wanted to be able to plug in a keyboard, so he added a USB hub.  I don't need a keyboard and got rid of the hub, which seems to have improved speeds some.
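   The progress numbers give a rough rate and time to completion.  This back-of-the-envelope sketch assumes exactly 4 days elapsed, decimal terabytes, and that the average rate so far holds:

```python
SECONDS_PER_DAY = 86_400


def transfer_stats(done_tb: float, total_tb: float, days: float) -> tuple[float, float]:
    """Return (average rate in MB/s, estimated days remaining).

    Assumes the average rate so far holds for the rest of the copy.
    """
    rate_bytes_per_s = done_tb * 10**12 / (days * SECONDS_PER_DAY)
    remaining_days = (total_tb - done_tb) / (done_tb / days)
    return rate_bytes_per_s / 10**6, remaining_days


rate_mb_s, eta_days = transfer_stats(done_tb=9.0, total_tb=12.1, days=4)
print(f"~{rate_mb_s:.0f} MB/s, ~{eta_days:.1f} more days")
```

   That works out to roughly 26 MB/s, with about 1.4 days to go.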
   My original plan had been to set up the backup server to actually do periodic backups.  Those scripts will have to wait a few more days.