
April 30, 2014

Data recovery, part 2

Now that I had the offsets of my missing article data, I could begin the process of data extraction. The first thing I did was pull all the missing data out of the 137 GB image into a much smaller file, since the data was grouped into a region under 200 MB in size.

Using the search program, I was able to reproduce the same list of offsets and dates from the smaller data file. While the search was running, I tried to figure out how to parse MySQL MyISAM files. I made some progress, but there were records I failed to parse correctly. Although parsing would have been the best approach, it wasn't going to work easily, so I decided to simply export the article records as a kind of hack CSV file. Writing a utility to do this was pretty easy: use the list of offsets to start each row of the CSV file, then dump all the data between the current offset and the next offset into that row. This captured more than I needed, but gave me a CSV file to work with.
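
The utility was little more than a loop over the offset list. Below is a minimal PHP sketch of the idea, assuming the search program saved its results as "offset,date" lines; the file names are hypothetical.

<?php
// Load the "offset,date" list produced by the search program.
$offsets = array();
foreach ( file( "offsets.txt" ) as $line )
{
  list( $offset, $date ) = explode( ",", trim( $line ) );
  $offsets[] = array( intval( $offset ), $date );
}

$data = fopen( "articleRegion.bin", "rb" );
$csv  = fopen( "articles.csv", "wb" );

// Each row runs from its offset to the start of the next record,
// which grabs more than needed but never cuts a record short.
for ( $index = 0; $index < count( $offsets ) - 1; ++$index )
{
  list( $offset, $date ) = $offsets[ $index ];
  fseek( $data, $offset );
  $body = fread( $data, $offsets[ $index + 1 ][ 0 ] - $offset );

  // fputcsv quotes the body so embedded commas don't break the file.
  fputcsv( $csv, array( $date, $body ) );
}

fclose( $csv );
fclose( $data );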

After that it was time to clean the file. I had to remove the garbage at the end of each line, which amounted to the control characters MySQL uses to delimit records. Once that was done, turning the file into a real CSV file was just a little grunt work.
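
The cleanup pass can be as simple as a regular expression that strips the trailing control characters from each row; a sketch only, since the exact delimiter bytes depend on the table format:

<?php
// Strip MySQL's record-delimiter control characters from the end of
// each row.  (Sketch: assumes one record per line.)
foreach ( file( "articles.csv" ) as $line )
  print preg_replace( '/[\x00-\x1F]+$/', "", $line ) . "\n";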

I ended up with a CSV file of 84 rows: articles from June 29 to September 21. Now I needed to insert these into the database, but before that I wanted to display the articles and read through them. Storage on hard drives isn't always linear, and sometimes data is moved around. A quick PHP script gave me the ability to view the articles on screen, read the content, and see each article's pictures. A few articles' picture links were messed up, but I managed to figure out which pictures were actually needed. I also managed to sort out all but one article's mismatched body. Only the article for August 30 had missing data; I simply could not locate the rest of the article body.
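
The viewer was a throwaway script. A sketch of the idea, assuming the two-column (date, body) layout from the extraction step:

<?php
// Render each recovered article as HTML for proofreading.
$csv = fopen( "articles.csv", "rb" );
while ( ( $row = fgetcsv( $csv ) ) !== false )
{
  list( $date, $body ) = $row;
  print "<h2>" . htmlspecialchars( $date ) . "</h2>\n";
  print "<div>" . $body . "</div>\n";  // Bodies contain their own markup.
}
fclose( $csv );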

Once I had the articles cleaned up and ready to go, it was time to put them back into the database. Some quick SQL on my sandbox server showed everything worked. After a quick copy to the main server, articles that had been offline for close to 4 years were visible again.
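
Scripting the insertion is just as short. A sketch using PDO, with hypothetical table and column names since the real schema is whatever the site uses:

<?php
// Re-insert the recovered rows.  Table/column names are made up.
$database = new PDO( "mysql:host=localhost;dbname=website", "user", "password" );
$insert = $database->prepare( "INSERT INTO articles ( date, body ) VALUES ( ?, ? )" );

$csv = fopen( "articles.csv", "rb" );
while ( ( $row = fgetcsv( $csv ) ) !== false )
  $insert->execute( $row );
fclose( $csv );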

It was a lot of work, but I am happy with the results. I recovered 84 articles, including one math article. Many of the articles are about my early flight lessons, and I was happy to read through them again.

Pictured is my data hard drive plugged into my main computer with a USB adapter Pluvius gave to us. It is a great device, and I get a lot of use from it.

April 29, 2014

Data recovery, part 1

Back when the hard drive on the Micro-Dragon failed, it took down the server. What's worse, I had recently moved back to the Garage but had not yet restored the scripts that did automatic backups of the web server. I don't recall what I did for data recovery, but I managed to restore all the data up to June 29, 2010. So there existed a gap between July and October.

I had tried putting the failed drive from the Micro-Dragon into my main machine to see if I could recover anything. However, the file system was too damaged to mount. I made a hard disk image of as much data as I could recover and more or less left things at that. At the time I was taking both college and flight classes, and living with the most roommates I had ever had. There was no free time to look into data recovery. That was 3½ years ago.

Today I started looking into the format of a MySQL database, and more specifically the format of MyISAM files. I found that DATETIME fields are stored as a 64-bit big-endian integer made by concatenating the parts of the date. For example, 2014-04-29 at 1:37:15 pm would be stored as the number 20140429133715. The most significant non-zero byte happens to always be 0x12. In addition, several bytes always appear around the date. This gave me a search pattern. I had already started a project like this in October of 2010, but that search algorithm could only find an exact binary string. I needed to modify it to search with a mask: that is, try to find a pattern while ignoring some bits of it. This would let me search for the time stamps that appear in a MySQL MyISAM database.
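
My actual search utility isn't shown here, but the masked search is simple to sketch in PHP: a byte matches when the bits selected by the mask agree with the pattern. The pattern and mask bytes below are placeholders, not the real bytes that surround a MyISAM date.

<?php
// Masked binary search: report every offset where the file matches
// $pattern on the bits set in $mask.  Placeholder pattern/mask.
$pattern = "\x12\x00\x00\x00\x00\x00";
$mask    = "\xFF\x00\x00\x00\x00\x00"; // 0xFF = must match, 0x00 = ignore

$length = strlen( $pattern );
$file   = fopen( $argv[ 1 ], "rb" );
$offset = 0;   // File position of $chunk[ 0 ].
$carry  = "";  // Tail of the previous chunk so matches can span chunks.

while ( ! feof( $file ) )
{
  $chunk = $carry . fread( $file, 1 << 20 );
  $last = strlen( $chunk ) - $length;
  if ( $last < 0 )
  {
    $carry = $chunk;
    continue;
  }

  for ( $index = 0; $index <= $last; ++$index )
  {
    // A position matches when every masked bit of the chunk equals
    // the corresponding bit of the pattern.
    $match = true;
    for ( $byte = 0; $byte < $length; ++$byte )
      if ( ord( $mask[ $byte ] )
         & ( ord( $chunk[ $index + $byte ] ) ^ ord( $pattern[ $byte ] ) ) )
      {
        $match = false;
        break;
      }

    if ( $match )
      printf( "%d\n", $offset + $index );
  }

  $carry = substr( $chunk, $last + 1 );
  $offset += $last + 1;
}
fclose( $file );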

The failed hard drive was probably 250 GB, but I only managed to get an image of 137 GB. That image has been faithfully sitting on an unused hard drive, which I found after some searching around. I'm really anal-retentive about keeping data, and I never get rid of anything. Despite keeping the silliest things, there are times like this when I am happy about my inability to use the delete button on my hard drives.

The search took several hours to run, but it began to list the offsets into the hard drive image where the pattern was found, along with the time stamp of each record. As expected, most of the offsets were grouped close together, and the missing article dates were among those found. I now have the information I need to begin extracting the missing data.

April 28, 2014

Postfix and rejecting email to specified users

I've run the Postfix e-mail server for many years, and one thing I do not like is that the default install allows e-mail to all system users, even the ones that are not actual people. Spammers love this fact and happily send message after message to inboxes no one ever reads. It takes up space and serves no purpose, so I wanted to disable it.

I had done this once in the past with rewrite rules, but those worked poorly. When the server's hard drive died a few years ago, I lost the configuration files for the e-mail server and never bothered to set up the rewrite rules again. Now that I've been getting back to housekeeping, I decided to have a look at my options.

It took a lot of messing around, but I discovered smtpd_recipient_restrictions with check_recipient_access. This turned out to be what I needed, although it was hard to figure out why things were not initially working. Several sites lay out the steps, but this is basically the process:

First, create a file that will contain the list of addresses to be banned. I put mine in /etc/postfix/sender_checks. Add a line for each e-mail address to be blocked:

root@drque.net REJECT Root cares not.
webpages@drque.net REJECT

Second, turn this file into a database by running postmap hash:sender_checks from the /etc/postfix directory. This compiles the plain-text list into an indexed lookup table, sender_checks.db, that Postfix can query quickly. It needs to be rerun any time sender_checks is modified.

Third, edit /etc/postfix/main.cf and modify the line for smtpd_recipient_restrictions (or add the line if it doesn't exist):

smtpd_recipient_restrictions = check_recipient_access hash:/etc/postfix/sender_checks

If there were other parameters on the line, place them after this value. The trick seems to be having check_recipient_access come first.
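
For example, a combined line might look something like this; the two restrictions after the access check are common defaults, not necessarily what my configuration uses:

smtpd_recipient_restrictions = check_recipient_access hash:/etc/postfix/sender_checks, permit_mynetworks, reject_unauth_destination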

Lastly, make these changes take effect by reloading the Postfix configuration: /etc/init.d/postfix reload.

That does the trick. I made a list of all the addresses that don't need to receive e-mail and added them to the blocked file. Attempting to send e-mail to one of those addresses now results in an immediate rejection. It's not that I ever saw any of the spam being dumped into these addresses, but I don't want to give the spammers even the courtesy of accepting their snake oil.

April 27, 2014

SSH file system

For most of DrQue.net's life, the server sat directly next to my console computer. This changed in the summer of 2011 when DrQue.net moved to Zen's workplace. Since I had often worked remotely, I was used to the procedure of not being on the same network. I have used FileZilla to transfer data to and from the server over an SSH connection. The other day I was considering something a little more direct. I used to share the server's drive using NFS and Samba, and I thought I might be able to use NFS again for this. But I wondered whether NFS was encrypted or could be tunneled through SSH. Before answering that question fully, I found the file system sshfs. It allows me to mount a path over SSH, which was precisely what I was looking for.
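
Mounting is a single command; the host and paths here are hypothetical:

sshfs user@drque.net:/var/www /mnt/server

The mount point then behaves like a local directory until it is unmounted with fusermount -u /mnt/server.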

For my non-Linux machines I found there is win-sshfs. It doesn't work well at all. After some period of time the connection to the drive is lost and never comes back, which requires unmounting and remounting the drive. That doesn't work for long and often results in a blue screen of death.

One of the things I like about having the server mounted as a drive is the scripting it enables. For example, the scripts that resize the images in my photo gallery can now do everything in one step. Also, having GUI browsing of the server directories makes cleanup much easier. It is a little slow since the drive is remote, but still very functional.

Log rotation has been set up on the Micro-Dragon. There are several steps involved: first, run an update on all the statistics; then compress the log files for each site and place them in an archive directory; and lastly, give Apache a restart command so the logs start empty. With this setup I should be able to ignore the log directory without risking giant log files when I remember them 6 months later.

Although the log rotation works manually, I initially set things up to rotate the logs daily. So at midnight the logs should rotate again. If all goes well, I will have logs tomorrow morning. If not, I will have to see what I broke.

What is nice is that the designers of logrotate seem to have thought of everything one would want to do. The tricky part was figuring out all of what I wanted, and a way to verify I got it all.
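
A configuration along these lines covers the steps above; the paths are hypothetical, and the statistics and restart commands stand in for whatever the Micro-Dragon actually runs:

/var/log/apache2/*.log {
    daily
    compress
    olddir /var/log/apache2/archive
    missingok
    notifempty
    sharedscripts
    prerotate
        /usr/local/bin/updateStatistics
    endscript
    postrotate
        /etc/init.d/apache2 restart
    endscript
}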

After I installed the new router the other day, most of the house computers just worked. However, a couple of non-Linux machines decided they would refuse to connect to the wireless network. My guess is that they saw the same wireless access point but a different MAC address. The computers didn't specify why they refused to connect or display any kind of error; they just wouldn't connect. Removing the wireless access point and adding it fresh fixed the problem, but what a stupid issue to have to work around.

I have several custom-framed pictures, but I had not hung any of them at Elmwood Park yet. The reason is that our living room has picture rail, a type of molding designed to suspend picture frames. This eliminates the need to put holes in the walls, and I've been meaning to utilize it since we moved in. I needed two items for this: some decorative chain and picture rail brackets. The chain I ordered before Xmas, but the order was quietly canceled. So I ordered it again more recently, and after getting the brackets I finally got around to hanging the pictures.

Although I have had the replacement router for around 6 months now, I finally got around to installing it. The old router was a Linksys running DD-WRT, clocked at 200 MHz, and a special model containing 16 MB of RAM, quite a large amount for a router at the time. The replacement router has a 450 MHz processor and 64 MB of RAM. This is a significant improvement, and the hope is it will help with high-traffic periods at Elmwood Park.

The Ratrap has been running with its updated list of "ban me" URLs and now has over 330 IP addresses banned. It is working, too, as my error log file is far less polluted than before. Now I see actual errors: links I made incorrectly, missing robots.txt files, and requests for CSS images I overlooked. Since the changeover, traffic has remained about the same. The initial drop in traffic recovered quickly and probably wasn't anything unusual, just a normal lull rather than a result of the changes I made. April looks to be on track for 9,000 to 10,000 visits for the month. I am pleased with that number because even if only 1% find something useful, that is still around 100 people every month who found what they were looking for.

I added some running statistics from the Blue-Dragon to its about page. This links the Blue-Dragon, and in particular the networks of Elmwood Park and the Micro-Dragon. While there isn't yet a grand plan for this, it does return the network to a state it has not been in since it was set up in the Dragon's Den at Park Place.