Andrew Que

January 06, 2021

E-mails for failed backups

   I have backup scripts that run periodically.  The Backup Dragon moved off-site and is now running.  Every hour it makes a request to DrQue.net that registers the IP address.  SSH on a non-standard port allows the Data Dragon to connect to it and do backups.  With the backups running, one last item I wanted to make sure was working was e-mails in case of a failure.  I found the program msmtp which simply allows me to send e-mail from the command line.
#!/bin/bash
ip=`ssh used@webServer "cat /directory/ipAddress.txt"`
now=`date`
echo "Sync date: $now" > /path/backupReport.txt
echo "Remote IP: $ip" >> /path/backupReport.txt

start=`date +%s.%N`
rsync -arv -e 'ssh -p <port>' --exclude-from /path/exclude-list /path/to/backup/ used@$ip:/path/to/destination/ >> /path/backupReport.txt 2> /path/backupReportErrors.txt
if [ $? != 0 ]; then
  # If there was an error, send an e-mail detailing what went wrong.
  errorMessage=`cat /path/backupReportErrors.txt`
  echo "ERROR: Backups failed"
  echo -e "Subject: ERROR: Backups failed.\n\n$errorMessage" | msmtp username@domain
fi

end=`date +%s.%N`
delta=`awk '{print $1-$2}' <<< "$end $start"`
now=`date`
echo "Finished: $now ($delta seconds)" >> /path/backupReport.txt
   This script first logs into the webserver to fetch the saved IP address of the remote machine.  Then it creates a backup report file, notes the current time and the IP address of the remote machine.  Then rsync is used over SSH to synchronize the two servers.  Output is logged to the backup report, and errors are logged to a separate log file.  If rsync runs into any problems, an e-mail is composed with the contents of the error report and sent using msmtp.  Lastly the time it took to run the backup is calculated and written to the backup report.
   The script should run on any Linux system and only requires rsync and msmtp.  I use a similar script for the daily in-house backups from the Snow Dragon to the Red Dragon.

December 17, 2020

Variance Stability Check and Overflow

In the last two articles I explored the use of variance as a method to check for signal stability. The implementation shared in the last article has one drawback: integer overflow. This is not so much a problem as a design consideration.

The overflow issue is present in both the array-based and pure sum-based implementations. Let’s begin with the sum variables: sum and sumSum. The size of these variables depends on the input data and the window size. In the example provided, the input data was 8-bit. In the example using an array, the window size is 256, which takes 8 bits. For accumulating sum we have 8 bits for data and 8 more bits for the window, for a total of 16 bits. For accumulating sumSum we have 8 bits squared (16 bits) for data and 8 bits for the window, for a total of 24 bits. Thus a 32-bit integer is needed. Now if 12-bit data were to be used, sum would need 12+8=20 bits and sumSum 12+12+8=32 bits. If, in this last example, the window size were increased from 256 to 300, the window would take 9 bits, sumSum would need 33 bits, and a 32-bit integer could overflow.
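The bit counting above can be sketched as a few helper functions. This is only an illustration of the arithmetic in the paragraph; the function names windowBits, sumBits and sumSumBits are made up for this example and do not appear in the stability code.

```c
#include <stdint.h>

//-----------------------------------------------------------------------------
// Uses:
//   Bits taken by the window size, i.e. ceil( log2( windowSize ) ).
//   A window of 256 takes 8 bits, a window of 300 takes 9.
//-----------------------------------------------------------------------------
uint8_t windowBits( uint16_t windowSize )
{
  uint8_t bits = 0;
  while ( ( (uint32_t)1 << bits ) < windowSize )
    bits += 1;

  return bits;
}

//-----------------------------------------------------------------------------
// Uses:
//   Bits needed by the running sum: data bits plus window bits.
//-----------------------------------------------------------------------------
uint8_t sumBits( uint8_t dataBits, uint16_t windowSize )
{
  return (uint8_t)( dataBits + windowBits( windowSize ) );
}

//-----------------------------------------------------------------------------
// Uses:
//   Bits needed by the running sum of squares: twice the data bits (for the
//   square) plus window bits.
//-----------------------------------------------------------------------------
uint8_t sumSumBits( uint8_t dataBits, uint16_t windowSize )
{
  return (uint8_t)( 2 * dataBits + windowBits( windowSize ) );
}
```

Calling sumSumBits( 12, 256 ) gives 32, while sumSumBits( 12, 300 ) gives 33, which is the overflow case described above.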

That takes care of the running sums. For the calculation of variance there are two or three variables: variance, denominator and maybe sumSquared. The size of variance needs to be twice the data width plus twice the window bits. This is because variance multiplies sumSum by the window size. For our base example using 8 data bits and 8 window bits, this is 8+8+8+8=32 bits. However, if we use 12-bit data this becomes 12+12+8+8=40 bits. So while 32 bits is enough for sumSum, the variance calculation would require 64 bits. The size of sumSquared follows variance, as the two are typically similar in size, especially near zero variance. It always needs twice as many bits as sum since it is the square of sum.

The denominator for variance, denominator, is smaller and is simply twice the number of window bits.
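The same counting can be applied to the variance calculation. The sketch below is only a restatement of the rules in the last two paragraphs; the helper names (windowBits, varianceBits, denominatorBits, storageBits) are hypothetical and chosen for this example.

```c
#include <stdint.h>

// Bits taken by the window size: ceil( log2( windowSize ) ).
uint8_t windowBits( uint16_t windowSize )
{
  uint8_t bits = 0;
  while ( ( (uint32_t)1 << bits ) < windowSize )
    bits += 1;

  return bits;
}

// Bits needed by variance (and sumSquared): twice the data bits plus twice
// the window bits.
uint8_t varianceBits( uint8_t dataBits, uint16_t windowSize )
{
  return (uint8_t)( 2 * dataBits + 2 * windowBits( windowSize ) );
}

// Bits needed by denominator: twice the window bits.
uint8_t denominatorBits( uint16_t windowSize )
{
  return (uint8_t)( 2 * windowBits( windowSize ) );
}

// Smallest standard integer width (8, 16, 32 or 64) that holds neededBits.
uint8_t storageBits( uint8_t neededBits )
{
  if ( neededBits <= 8 )
    return 8;

  if ( neededBits <= 16 )
    return 16;

  if ( neededBits <= 32 )
    return 32;

  return 64;
}
```

For 12-bit data and a window of 256, storageBits( varianceBits( 12, 256 ) ) comes out as 64, matching the 40-bit case above.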

Since this algorithm is all about speed, the variable size is a consideration when implemented on an 8-bit platform. 32-bit operations are expensive, especially multiplications and divisions, and 64-bit operations are worse still. In our example using 12-bit data, the variance calculation would require 64 bits. There are a couple of ways around this. The window data could be truncated to 8 bits. Or the variance calculation could drop some precision before the multiplications.
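A minimal sketch of the first option, truncation, assuming a right-justified 12-bit reading and that "truncated" means keeping the 8 most significant bits of each sample. The names ADC_BITS, WINDOW_DATA_BITS and truncateSample are hypothetical. (The listing after it takes the second option.)

```c
#include <stdint.h>

enum { ADC_BITS         = 12 };  // Assumed ADC resolution.
enum { WINDOW_DATA_BITS = 8 };   // Width of the data fed to the sums.

//-----------------------------------------------------------------------------
// Uses:
//   Reduce a 12-bit sample to its 8 most significant bits.
// Input:
//   sample - Raw 12-bit sample, right-justified.
// Output:
//   Upper 8 bits of the sample.
//-----------------------------------------------------------------------------
uint8_t truncateSample( uint16_t sample )
{
  return (uint8_t)( sample >> ( ADC_BITS - WINDOW_DATA_BITS ) );
}
```

The truncated value could then be passed to an 8-bit addSample. One consequence is that each count now represents 16 ADC counts, so a variance threshold would be expressed in those coarser units.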

#include <stdbool.h>
#include <stdint.h>

enum { SHIFT = 7 };
enum { SHIFT_SIZE  = 1 << SHIFT };

enum { DROP_BITS = 4 };

static uint16_t count = 0;

static uint32_t sum;
static uint32_t sumSum;

//-----------------------------------------------------------------------------
// Uses:
//   Add a sample to stability check.
// Input:
//   sample - New sample.
//-----------------------------------------------------------------------------
void addSample( uint16_t sample )
{
  // Is the sample buffer not full?
  if ( count < SHIFT_SIZE )
    count += 1;
  else
  {
    // Remove old sample from sums.
    sum    -= sum >> SHIFT;
    sumSum -= sumSum >> SHIFT;
  }

  // Accumulate new sample.
  sum    += sample;
  sumSum += (uint32_t)sample * sample;
}

//-----------------------------------------------------------------------------
// Uses:
//   Check to see if the signal in the buffer is stable.
// Input:
//   varianceThreshold - Maximum variance.
// Output:
//   True if the signal is stable.
//-----------------------------------------------------------------------------
bool getStability( uint16_t varianceThreshold )
{
  // Assume signal is stable if there are not enough samples.
  bool isStable = true;

  uint16_t denominator = ( (uint16_t)( count - 1 ) * count ) >> DROP_BITS;

  // No divide-by-zero?
  if ( denominator > 0 )
  {
    uint32_t variance = ( sumSum >> DROP_BITS ) * ( count >> DROP_BITS );
    uint32_t sumSquared = ( sum >> DROP_BITS ) * ( sum >> DROP_BITS );
    if ( sumSquared < variance )
      variance -= sumSquared;
    else
      variance = 0;

    variance /= denominator;

    //varianceThreshold <<= DROP_BITS / 2;
    isStable = ( variance <= varianceThreshold );
  }

  return isStable;
}
 

Naturally some precision is lost by dropping the least significant bits. This will be reflected in the granularity of varianceThreshold, as the results are always counted in increments of the dropped bits.
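To make the granularity concrete: by my reading of the listing above, the numerator terms each lose twice DROP_BITS and the denominator loses DROP_BITS once, so the quotient comes out roughly 2^DROP_BITS times smaller than the true variance. Assuming that reading is right, a threshold expressed as true variance could be converted as sketched below. The function name scaleVarianceThreshold is hypothetical and is not part of the listing.

```c
#include <stdint.h>

enum { DROP_BITS = 4 };

//-----------------------------------------------------------------------------
// Uses:
//   Convert a threshold given as true variance into the reduced units that
//   result from dropping DROP_BITS of precision (assumption: the reduced
//   result is about the true variance divided by 2^DROP_BITS).
// Input:
//   trueVarianceThreshold - Threshold in units of true variance.
// Output:
//   Threshold in reduced units.
//-----------------------------------------------------------------------------
uint16_t scaleVarianceThreshold( uint16_t trueVarianceThreshold )
{
  return (uint16_t)( trueVarianceThreshold >> DROP_BITS );
}
```

With DROP_BITS at 4, a true variance threshold of 160 becomes 10, and any threshold below 16 collapses to 0, which is the loss of granularity described above.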

December 16, 2020

Improving the use of Standard Deviation to Determine Signal Stability

I wrote about how standard deviation can be used to determine if a signal is stable (i.e. not significantly changing). There are two problems when putting this into practice in an embedded system. Standard deviation is typically defined using a two-pass compute process, and standard deviation requires a square root. If speed is required, as it often is with an embedded application, these two items can be an issue. Consider doing such a check in an interrupt with the 8-bit ATMega microcontroller of an Arduino Nano. There is not a lot of processing power available, especially if there is other work to be done in the foreground. Luckily, there are fairly simple workarounds that will improve the speed.

The first workaround is to use variance instead of standard deviation. The overly simplified definition of variance is that it is standard deviation without the final square root.

Here, σ is standard deviation, and σ² is variance. Since we are just comparing against a target standard deviation, we can simply modify the target so we can use variance instead.

So squaring our target allows us to use variance instead of standard deviation. That takes care of removing the square root.
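Written out, the relationship looks like the following. This assumes the sample (N − 1) form, since that is what the code later in this article divides by, and borrows Δ for the target deviation:

```latex
\sigma = \sqrt{\frac{1}{N-1}\sum_{i=1}^{N}\left(x_i-\bar{x}\right)^2},
\qquad
\sigma^2 = \frac{1}{N-1}\sum_{i=1}^{N}\left(x_i-\bar{x}\right)^2
```

Because neither side can be negative, comparing the squares gives the same answer as comparing the values themselves:

```latex
\sigma \le \Delta \iff \sigma^2 \le \Delta^2
\qquad (\sigma \ge 0,\ \Delta \ge 0)
```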

Now the two-pass compute. At first this might not seem like much of an issue, but consider an interrupt-driven ADC where the Interrupt Service Routine (ISR) needs to be kept short. The time it takes to loop through the sampled values twice might be longer than the time budgeted. I’ve written in the past about how to do a moving average and standard deviation. This method uses the naïve algorithm with running sums. Let’s review the algorithm:

We can break this down:

In this form, a and b are running sums that can be modified each time a new sample is available. Variance can then be computed from these sums so the stability check can be performed. Since the divisor is constant once the buffer is full, the divide could be replaced with a pre-computed fixed-point multiplier.
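For reference, the naïve algorithm in the form the code below computes is, as best I can reconstruct it:

```latex
\sigma^2 = \frac{N\sum_{i=1}^{N}x_i^2-\left(\sum_{i=1}^{N}x_i\right)^2}{N\,(N-1)}
```

Broken down into running sums, where taking a as the sum of samples and b as the sum of squared samples is an assumption about which letter is which:

```latex
a = \sum_{i=1}^{N}x_i,
\qquad
b = \sum_{i=1}^{N}x_i^2,
\qquad
\sigma^2 = \frac{N\,b-a^2}{N\,(N-1)}
```

In the code, a corresponds to bufferSum and b to bufferSumSum.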

#include <stdbool.h>
#include <stdint.h>
#include <limits.h>

enum { BUFFER_SIZE = 255 };

static uint8_t buffer[ BUFFER_SIZE ];
static uint8_t bufferIndex = 0;
static uint8_t bufferCount = 0;

static uint16_t bufferSum;
static uint32_t bufferSumSum;

//-----------------------------------------------------------------------------
// Uses:
//   Add a sample to stability buffer.
// Input:
//   sample - New sample.
//-----------------------------------------------------------------------------
void addSample( uint8_t sample )
{
  // Is the sample buffer not full?
  if ( bufferCount < BUFFER_SIZE )
    bufferCount += 1;
  else
  {
    // Remove old sample from sums.
    uint8_t oldSample = buffer[ bufferIndex ];
    bufferSum    -= oldSample;
    bufferSumSum -= (uint16_t)oldSample * oldSample;
  }

  // Save new sample.
  buffer[ bufferIndex ] = sample;

  // Accumulate new sample.
  bufferSum    += sample;
  bufferSumSum += (uint16_t)sample * sample;

  // Advance buffer index.
  bufferIndex += 1;
  if ( bufferIndex >= BUFFER_SIZE )
    bufferIndex = 0;
}

//-----------------------------------------------------------------------------
// Uses:
//   Check to see if the signal in the buffer is stable.
// Input:
//   varianceThreshold - Maximum variance.
// Output:
//   True if the signal is stable.
//-----------------------------------------------------------------------------
bool getStability( uint16_t varianceThreshold )
{
  // Assume signal is stable if there are not enough samples.
  bool isStable = true;

  // Enough samples to compute variance?  (Need at least 2 samples)
  if ( bufferCount > 1 )
  {
    uint32_t variance    = bufferSumSum * bufferCount - (uint32_t)bufferSum * bufferSum;
    uint16_t denominator = (uint16_t)( bufferCount - 1 ) * bufferCount;

    variance /= denominator;
    isStable = ( variance <= varianceThreshold );
  }

  return isStable;
}

This implementation has two functions: addSample and getStability. It is currently set up for 8-bit samples, such as the upper 8 bits of the Arduino Nano's 10-bit ADC. Integer sizes would need to be adjusted to work with a larger ADC.
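The earlier remark about replacing the constant divide with a pre-computed fixed-point multiplier could look something like the sketch below. It only applies once the buffer is full, so that bufferCount equals BUFFER_SIZE. The names DENOMINATOR, RECIPROCAL and divideByDenominator are hypothetical, and the 0.32 fixed-point format is my choice for illustration. Because the reciprocal is rounded up, the result can be one higher than a true integer divide.

```c
#include <stdint.h>

enum { BUFFER_SIZE = 255 };

// Denominator once the buffer is full: N * ( N - 1 ).
#define DENOMINATOR  ( (uint32_t)BUFFER_SIZE * ( BUFFER_SIZE - 1 ) )

// Reciprocal of the denominator in 0.32 fixed-point, rounded up.
#define RECIPROCAL   ( ( ( 1ULL << 32 ) + DENOMINATOR - 1 ) / DENOMINATOR )

//-----------------------------------------------------------------------------
// Uses:
//   Approximate numerator / DENOMINATOR using a multiply and a shift.
// Input:
//   numerator - Value to divide.
// Output:
//   Quotient, either exact or one higher than an integer divide would give.
//-----------------------------------------------------------------------------
uint32_t divideByDenominator( uint32_t numerator )
{
  return (uint32_t)( ( (uint64_t)numerator * RECIPROCAL ) >> 32 );
}
```

This trades a 32-bit divide for a 64-bit multiply. Whether that is actually faster on an 8-bit target is something that would need to be measured.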

Note how as long as we keep a history of the samples, each new sample can keep the sums up-to-date without the need to loop through all the history and recalculate. This means the function adding the new sample can also check stability and do so very quickly.

If memory is an issue, the history can be replaced by subtracting off a single average sample. This turns the roll-off from something that is exact to a low-pass filter. However, this might be just fine depending on the application.

#include <stdbool.h>
#include <stdint.h>

enum { SHIFT = 7 };
enum { SHIFT_SIZE  = 1 << SHIFT };

static uint16_t count = 0;

static uint16_t sum;
static uint32_t sumSum;

//-----------------------------------------------------------------------------
// Uses:
//   Add a sample to stability check.
// Input:
//   sample - New sample.
//-----------------------------------------------------------------------------
void addSample( uint8_t sample )
{
  // Is the sample buffer not full?
  if ( count < SHIFT_SIZE )
    count += 1;
  else
  {
    // Remove old sample from sums.
    sum    -= sum >> SHIFT;
    sumSum -= sumSum >> SHIFT;
  }

  // Accumulate new sample.
  sum    += sample;
  sumSum += (uint16_t)sample * sample;
}

//-----------------------------------------------------------------------------
// Uses:
//   Check to see if the signal in the buffer is stable.
// Input:
//   varianceThreshold - Maximum variance.
// Output:
//   True if the signal is stable.
//-----------------------------------------------------------------------------
bool getStability( uint16_t varianceThreshold )
{
  // Assume signal is stable if there are not enough samples.
  bool isStable = true;

  // Enough samples to compute variance?  (Need at least 2 samples)
  if ( count > 1 )
  {
    uint32_t variance = sumSum * count;
    uint32_t sumSquared = (uint32_t)sum * sum;
    if ( sumSquared < variance )
      variance -= sumSquared;
    else
      variance = 0;

    uint16_t denominator = (uint16_t)( count - 1 ) * count;

    variance /= denominator;
    isStable = ( variance <= varianceThreshold );
  }

  return isStable;
}
 

The code is slightly smaller. Notice, however, the stability equation now has a check. In the pure form, the sum of squares times the number of samples is always larger than the sum squared. The approximate roll-off means these values don’t share this relationship. Thus, a check has to be done to avoid underflow.

This form of stability checking is very fast and uses very little memory. It can be used fairly easily on a small microcontroller. Care must be taken when using larger inputs. The sums can get large quickly and overflow. That will be the subject of the next article in this series.

December 15, 2020

Using Standard Deviation to Determine Signal Stability

In embedded programming one sometimes needs to check when an analog signal is stable. For example, consider an adjustable pressure regulator where the resulting pressure is read by an Analog-to-Digital Converter (ADC). A user could change the regulator and over time the pressure will reach a new equilibrium.

In the graph above we have a simulation of the pressure set point on the dial changing, and the actual measured pressure changing as a result. Clearly while the dial is moving the pressure reading will be changing. There will also be a period after the dial stops moving during which the pressure continues to change until a new equilibrium is reached. How can we say the new equilibrium has been obtained?

Let’s start with a simplified setup by assuming you know the intended value to be obtained. The most basic check would be simply to wait until the ADC reads the precise value desired. Hitting an exact value with an ADC is difficult because there is always noise in the ADC reading. To deal with this a range of values can be used—a center point plus/minus a tolerance band.

This graph shows a band within which the new pressure value is considered to have been obtained. Checking for a band works to see that a value has been reached, but doesn’t say the value is staying in that band. One option is to require the ADC value to stay within this band for some period of time. Another, similar option is to require that the average of the last n samples be within the band. The average essentially builds in time. In either case, time is now a parameter. A signal is stable when it is within some tolerance of nominal for some period of time.
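The first option, staying inside the band for some period of time, might be sketched as follows. This assumes samples arrive at a fixed rate so that a count of samples stands in for time. The set point, tolerance and hold count are made-up values, and the name isSettled is hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

enum { TARGET       = 512 };  // Hypothetical set point in ADC counts.
enum { TOLERANCE    = 8 };    // Hypothetical half-width of the band.
enum { HOLD_SAMPLES = 100 };  // Consecutive in-band samples required.

static uint16_t inBandCount = 0;

//-----------------------------------------------------------------------------
// Uses:
//   Check a new sample against the tolerance band.
// Input:
//   sample - New sample.
// Output:
//   True if the last HOLD_SAMPLES samples were all within the band.
//-----------------------------------------------------------------------------
bool isSettled( uint16_t sample )
{
  bool inBand = ( sample >= TARGET - TOLERANCE )
             && ( sample <= TARGET + TOLERANCE );

  if ( inBand )
  {
    // Count up, but stop at the limit so the counter cannot wrap.
    if ( inBandCount < HOLD_SAMPLES )
      inBandCount += 1;
  }
  else
    inBandCount = 0;  // Any excursion restarts the wait.

  return ( inBandCount >= HOLD_SAMPLES );
}
```

A single sample outside the band resets the counter, so the signal has to remain inside the band for the full hold period before it is reported as settled.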

Now what happens if you do not know the target value? In our pressure example we might just want to know whether the pressure is stable, but have no knowledge of what the target pressure should be. There are various makeshift options, but there is a statistical approach that makes this fairly easy: standard deviation.

In a nutshell, standard deviation is a measure of how much change a set of data has from the mean average. If this set of data is samples taken at regular time intervals, then standard deviation is a direct measurement of the stability over time of that sample set. If a signal is changing over time, the standard deviation will be higher than if it is not, and the faster the change, the higher the standard deviation. So to use standard deviation, one simply needs to pick a standard deviation under which the signal is considered stable. Why this is particularly useful here is that while standard deviation is a measure of deviation from the average, the value of the actual average is irrelevant. Thus, this method works even when the target is unknown.

In the graph above we have added a plot of standard deviation with the scale exaggerated to make the effects easier to see. There are two transition lines, one at the left side denoting the pressure has become unstable, and on the right when the pressure has again stabilized. The green stabilization line represents the standard deviation under which the signal is considered stable.

The standard deviation works just like the windowed average used above. Some number of samples are kept in a rotating buffer from which the average and standard deviation can be computed. When a new sample arrives, the oldest sample in the set is discarded and the new sample added in its place. Stability is now just defined as whenever the standard deviation of some number of previous samples is below a threshold.
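A minimal sketch of that rotating buffer and check, using the textbook two-pass calculation in floating point. The window size and all names here are hypothetical. To keep the sketch free of a math library, the comparison is done on squared values, which gives the same answer as comparing the standard deviation itself because neither side can be negative.

```c
#include <stdbool.h>
#include <stdint.h>

enum { WINDOW_SIZE = 16 };  // Hypothetical number of samples to keep.

static uint16_t window[ WINDOW_SIZE ];
static uint8_t  windowIndex = 0;
static uint8_t  windowCount = 0;

// Store a new sample, replacing the oldest one once the window is full.
void pushSample( uint16_t sample )
{
  window[ windowIndex ] = sample;
  windowIndex = (uint8_t)( ( windowIndex + 1 ) % WINDOW_SIZE );
  if ( windowCount < WINDOW_SIZE )
    windowCount += 1;
}

// True if the standard deviation of the window is at or below the limit.
bool isStable( float stdDevLimit )
{
  // Assume signal is stable if there are not enough samples.
  if ( windowCount < 2 )
    return true;

  // Pass 1: average of the samples.
  float mean = 0;
  for ( uint8_t index = 0; index < windowCount; index += 1 )
    mean += window[ index ];
  mean /= windowCount;

  // Pass 2: sum of squared deviations from the average.
  float squares = 0;
  for ( uint8_t index = 0; index < windowCount; index += 1 )
  {
    float delta = window[ index ] - mean;
    squares += delta * delta;
  }

  // sqrt( squares / ( n - 1 ) ) <= limit, rearranged to avoid the root.
  return ( squares <= stdDevLimit * stdDevLimit * ( windowCount - 1 ) );
}
```

Each new sample goes through pushSample, after which isStable can be called with the chosen threshold.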

What’s nice about standard deviation is we can quantify what it means to be stable. For our pressure example, let’s say the pressure can range from 0 to 100 PSI. It might be said that the pressure is stable if the pressure over 1 second deviates less than 1 PSI. A standard deviation of 1 on this 1 second data set means that about 68% of the samples were within ±1 PSI of the average if the distribution is Gaussian; for other distributions there is no such guarantee. One could tighten the tolerance to two standard deviations, which would mean that between 75% and 95% of samples are within ±1 PSI (at least 75% for any distribution, about 95% for a Gaussian one). This just requires dividing the target standard deviation in half. In fact, any tolerance band can be defined by the number of standard deviations. Two standard deviations is pretty good though, since for roughly Gaussian data it says that 95% of all data is within the tolerance band.

Above is the equation for stability checking using standard deviation. Standard deviation is typically represented with the Greek lowercase letter sigma, σ. We check this against our desired deviation value, represented with Δ. Sdevs is the number of standard deviations to use, N is the number of samples in the windowed average, x the array of data points and x̄ the average of x.
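Pieced together from the symbols just described, the check would read something like the following. The sample (N − 1) form and the strict inequality are assumptions on my part:

```latex
S_{devs}\,\sigma
  = S_{devs}\sqrt{\frac{1}{N-1}\sum_{i=1}^{N}\left(x_i-\bar{x}\right)^2}
  < \Delta
```

Dividing both sides by Sdevs gives σ < Δ / Sdevs, which is the "dividing the target standard deviation in half" step when Sdevs is 2.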

One consideration for using standard deviation for stability measurement is to make sure the window size is large enough. If the window is too small and the signal is noisy the standard deviation can jump around too much to be useful.

In the graph above, the window is too small and the noise causes the standard deviation to jump above a fairly conservative stability threshold. The solution is to increase the window size to include more samples. This slows down the response of stability detection and requires more memory. Another option is to filter the noise.

Here a simple low-pass filter using a windowed average is used (shown in cyan), and the stability check is run on the filtered signal.
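A windowed-average filter of this kind could be sketched as below. The window of 8 samples is an arbitrary choice for illustration (a power of two so the divide becomes a shift), and the names are hypothetical.

```c
#include <stdint.h>

enum { FILTER_SHIFT = 3 };
enum { FILTER_SIZE  = 1 << FILTER_SHIFT };  // 8 samples.

static uint16_t filterBuffer[ FILTER_SIZE ];
static uint8_t  filterIndex = 0;
static uint32_t filterSum   = 0;

//-----------------------------------------------------------------------------
// Uses:
//   Add a raw sample to the filter.
// Input:
//   sample - New raw sample.
// Output:
//   Average of the last FILTER_SIZE samples.  Until the buffer has filled,
//   the missing samples count as zero.
//-----------------------------------------------------------------------------
uint16_t filterSample( uint16_t sample )
{
  // Swap the oldest sample out of the running sum for the new one.
  filterSum -= filterBuffer[ filterIndex ];
  filterSum += sample;
  filterBuffer[ filterIndex ] = sample;

  // Advance index, wrapping at the (power of two) buffer size.
  filterIndex = (uint8_t)( ( filterIndex + 1 ) & ( FILTER_SIZE - 1 ) );

  return (uint16_t)( filterSum >> FILTER_SHIFT );
}
```

The value returned by filterSample, rather than the raw reading, would then be what is fed to the stability check.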

This system can work for detecting instability as well. However, a gradual signal change will have low standard deviation and thus would not be detected. Of course, that is the difference between stability and change. Stability can allow for gradual change and as long as that change isn’t too great the system can be considered stable. So as with the selection of any algorithm, it is important to understand the needs of your project and to make sure you understand the algorithms you want to use to handle these needs.