Now that I had the offsets of my missing article data, I could begin extracting it. The first thing I did was pull all the missing data from the 137 GB file into a much smaller data file. The missing data was all grouped in a region under 200 MB in size.
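For anyone wanting to do the same, copying a byte range out of a huge file can be sketched in a few lines of Python. This is only an illustration, not the tool I used: the function name, the file paths, and the start and length values are all made up for the example.

```python
def extract_region(src_path, dst_path, start, length, chunk_size=1024 * 1024):
    """Copy up to `length` bytes, beginning at byte `start` of src_path, into dst_path.

    Reads in chunks so the large source file never has to fit in memory.
    Returns the number of bytes actually copied.
    """
    copied = 0
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        src.seek(start)
        while copied < length:
            chunk = src.read(min(chunk_size, length - copied))
            if not chunk:  # hit the end of the source file early
                break
            dst.write(chunk)
            copied += len(chunk)
    return copied
```

A call such as `extract_region("disk.img", "region.bin", start, 200 * 1024 * 1024)` would then produce the smaller working file, with `start` taken from the lowest offset the search reported.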
Using the search program, I was able to make the same list of offsets and dates with the smaller data file. While the search was running, I tried to figure out how to parse MySQL MyISAM files. I made some progress, but there were some records I failed to parse correctly. Parsing the files properly would have been my best bet, but it wasn't going to work easily. So I decided to simply export the article records as a kind of hack CSV file. Writing a utility to do this was pretty easy. I could use the list of offsets to start each row of the CSV file, and then dump all the data between the current offset and the next offset into the row. This was more than I needed, but it gave me a CSV file to work with.
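The slicing idea is simple enough to sketch in Python. This is a rough illustration under my own assumptions, not the utility I actually wrote: the function names are invented, the whole region is assumed to fit in memory (reasonable at under 200 MB), and bytes are decoded as Latin-1 purely so that every byte value maps to some character.

```python
import csv

def slice_records(data, offsets):
    """Split `data` (bytes) into (offset, chunk) pairs.

    Each chunk runs from one offset to the next; the last chunk runs
    to the end of the data.
    """
    ordered = sorted(offsets)
    bounds = ordered[1:] + [len(data)]
    return [(start, data[start:end]) for start, end in zip(ordered, bounds)]

def write_rough_csv(data, offsets, out_path):
    """Write one CSV row per record: the offset, then the raw chunk as text."""
    with open(out_path, "w", newline="", encoding="latin-1") as f:
        writer = csv.writer(f)
        for start, chunk in slice_records(data, offsets):
            # Latin-1 maps each byte to exactly one character, so nothing is lost.
            writer.writerow([start, chunk.decode("latin-1")])
```

Each row ends up holding more than the article itself, since everything up to the next offset comes along for the ride.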
After that it was time to clean this file. I had to remove garbage at the end of each line that amounted to the control characters used by MySQL to delineate records, but after that turning the file into a real CSV file was just a little grunt work.
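A cleanup pass like this could look something like the following Python sketch. It assumes the trailing garbage is made up only of ASCII control characters (codes below 0x20, plus 0x7F), which is a simplification for illustration; the helper name is made up, and real record trailers may contain other bytes that need separate handling.

```python
def strip_trailing_control(field):
    """Remove ASCII control characters from the end of a string.

    Assumption: the garbage after each record is only control characters.
    Note this also trims trailing tabs and newlines, since they are
    control characters too. Characters in the middle are left alone.
    """
    end = len(field)
    while end > 0 and (ord(field[end - 1]) < 0x20 or ord(field[end - 1]) == 0x7F):
        end -= 1
    return field[:end]
```

Only the end of the field is trimmed, so line breaks inside an article body survive.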
I ended up with a CSV file with 84 rows: articles from June 29 to September 21. Now I needed to insert these into the database. But before that, I wanted to display the articles and read through them. Storage on hard drives isn't always linear, and sometimes data is moved around, so a recovered article could contain pieces that didn't belong to it. A quick PHP script let me view the articles on screen, read the content, and view each article's pictures. A few articles' picture links were messed up, but I managed to figure out which picture was actually needed. I also managed to sort out all but one article's mismatched bodies. Only the article for August 30 had missing data; I simply could not locate the rest of the article body.
Once I had the articles cleaned up and ready to go, it was time to put them back into the database. Some quick SQL, and my sandbox server showed everything working. A quick copy to the main server, and articles which had been offline for close to 4 years are visible again.
It was a lot of work, but I am happy with the results. I recovered 84 articles including one math article. Many of the articles are about my early flight lessons, and I was happy to read through them again.
Pictured is my data hard drive plugged into my main computer with a USB Pluvius gave to us. It is a great device, and I get a lot of use from it.