Error Correcting Codes, or ECC is a technique in RAM used to ensure data integrity. ECC on a PC is single-error-correcting and double-error-detecting (SECDED). So how does it work? I asked this question many years ago, and a co-worker of mine said it probably had a parity check for both rows and columns. Where the parity didn't match up showed the row and column of a single bit error. Curious, I tried to implement this.
Parity (for you not so hardware types) is basically an 1-bit, base 2 adder. Base 2 is binary. 1 bit means we ignore overflow. Lets try it: 0 + 0 = 0. 0 + 1 = 1. Still with me. 1 + 1 = 0. Lost now? In binary, 1 + 1 = 10. But since we only have a 1-bit adder, we only save the first bit, 0. Parity is useful because if you have a small amount of data and tack on a parity bit, you can tell if the data was somehow messed up. A 1-bit, base 2 adder is actually equivalent to what is known as an exclusive or, or XOR. It's easy to implement in both hardware and software, and it's fast to calculate. The draw back is that it isn't very thorough. If you have 8 bits of data and 1 bit of parity, you are guaranteed to catch a 1-bit error. But you will not catch a 2-bit error.
You can have 2-bit parities, or even larger. Keep in mind, however, each additional bit of parity adds additional data you have to keep track of.
Now applying parity to ECC. Let's start with 64-bits of data. I picked 64 because it is 8x8, and 8-bits of data is a single byte. We can lay out our 64-bits in rows and columns--each row is 8-bits, and there are 8 rows. Now for each row and column, we have a parity bit. You can probably calculate the parity in your head just looking across the row/column, but if you are not use to it simple add up all the bits in the row or column and do a modulus remainder by 2--you will need a calculator with a "MOD" button.
Here is a little java script to demonstrate.
The blue area is the 8x8 bit matrix--64 total bits. The yellow areas to the sides are the parity bits. There are 2 sets of parity bits for this demonstration. The top/left set is the calculated based off the actual data in the matrix. The bottom/right set is the "latched" or expected parity. When they don't match, the row or column is highlighted.
Click in any of the blue area to toggle a bit. Notice that the row or column will get highlighted. This is because the parity for this row/column no longer matches. When there is just a single error, the row and columns converge on the bit that has been changed. We now have a single-bit error, and because we know where the bad bit is located, it can be corrected. This works for ANY single bit error, including a single bit error in the parity bits. You can change the latched parity by clicking on it too. Notice how only a single row or single column is highlighted. This shows up the error bit is in the parity and the data is just fine.
Now try changing two bits of data. There are two types of results. Either you will have two rows or two columns highlighted, or 2 rows and 2 columns highlighted. For any of the cases, you won't know which bits had the error. If only rows or columns are highlighted, any of those bits in any of the selection could be wrong. If 2 rows and 2 columns are highlighted, there are 4 possible locations for an error--again, no way to know which. So we can detect any two errors in the data.
So how good is my parity system? Well, for 64-bits of data, an additional 16-bits of parity are needed--so an overhead of 25%. Also, there is one 2-bit error this system always gets wrong. If you change two parity bits, one in the rows parity and one in the columns parity, it will look like a single bit correctable error--which is wrong--this would be a double bit uncorrectable error. Computer memory doesn't use this system for ECC, but a method called Hamming Codes. Maybe I'll write about them one day, but for now, you can see how it is possible to have 1-bit error correction, 2 bit error detection.