NTP Server Monitoring Daemon Alarming System Design and Implementation

One of the things I really wanted for my Raspberry Pi 4 NTP Server project was a visual indicator that something is wrong. I hate having things break and not realizing it until it’s a problem.

For example, the GPS interface might be malfunctioning, and the NTP peer might change away from the GPS PPS signal. Or CPU utilization might become too high.

To do that, I created an alarming package for the Golang daemon. Its first component is the alarm record: an in-memory structure that services fill in when they want to raise (or clear) an alarm.

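type Alarm struct {
    Key         string // unique name of the condition, e.g. "UsedSatellites"
    Description string // human-readable detail, reported when the alarm is raised
    Raised      bool   // true raises the alarm; false clears it
}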

The next element gives system monitors a way to raise an alarm. The SendAlarm() function accepts the alarm record, checks that the alarm channel has been initialized and isn’t full, and then sends the alarm on the channel. If the channel is full, the alarm is dropped. This keeps the sending goroutine from blocking when the alarm processor isn’t running or is busy.

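func SendAlarm(alarm Alarm) {
    if alarmChannel == nil {
        log.Printf("Alarm channel not initialized. Dropping alarm: %+v", alarm)
        return
    }
    // A select with a default case never blocks: if the buffer is full
    // (ProcessAlarms() is busy or not running), the alarm is dropped.
    select {
    case alarmChannel <- alarm:
    default:
        log.Printf("Alarm channel full. Possible error in ProcessAlarms(). Dropping alarm: %+v", alarm)
    }
}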
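A `select` with a `default` case is the idiomatic way to do a non-blocking channel send in Go: it attempts the send and falls through to `default` only when the buffer is full, so the "is there room?" check and the send happen as one step, with no window for another goroutine to take the last slot in between. The self-contained sketch below illustrates the idea; the `trySend` helper and the two-slot buffer are hypothetical and exist only for the demo.

```go
package main

import (
	"fmt"
	"log"
)

// Alarm mirrors the alarm record described in this post.
type Alarm struct {
	Key         string
	Description string
	Raised      bool
}

// trySend is a hypothetical helper: it queues the alarm if ch has room
// in its buffer, otherwise logs and drops it. It never blocks.
func trySend(ch chan Alarm, alarm Alarm) bool {
	select {
	case ch <- alarm:
		return true
	default:
		log.Printf("Alarm channel full. Dropping alarm: %+v", alarm)
		return false
	}
}

func main() {
	// A deliberately tiny buffer with no receiver, so the third send overflows.
	ch := make(chan Alarm, 2)
	for i := 1; i <= 3; i++ {
		ok := trySend(ch, Alarm{Key: fmt.Sprintf("Test%d", i), Raised: true})
		fmt.Printf("Test%d queued: %v\n", i, ok)
	}
}
```

The trade-off is that a monitor is never stalled, but alarms can be lost under load; the log line is the only trace of a dropped alarm.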

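Here’s an example of how the GPSD monitor raises an alarm when the number of satellites used for the fix is too low. The monitor sends the alarm on every check while the condition persists, and sends a clear only once, after the condition goes away: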

alarm := alarms.Alarm{Key: "UsedSatellites", Raised: (usedSats < minSatellites)}
if alarm.Raised {
    /*
        The number of used satellites for our fix/position is less
        than the desired number. For a standard GPS, non-fixed
        position/non-timing mode, 4 satellites are required for a
        3D fix.
    */
    alarm.Description = fmt.Sprintf("Used Sats = %d < %d", usedSats, minSatellites)
    alarms.SendAlarm(alarm)
    raisedAlarms[alarm.Key] = struct{}{}
} else {
    if _, present := raisedAlarms[alarm.Key]; present {
        // The alarm was previously raised. Clear it.
        alarms.ClearAlarm(alarm)
        delete(raisedAlarms, alarm.Key)
    }
}
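The raise/clear bookkeeping in that snippet is the same for every threshold a monitor watches, so it could be factored out. Below is a minimal sketch of one way to do that. The `monitor` type and its `check` method are hypothetical and not part of the package, and `ClearAlarm()` (not shown in this post) is assumed to send the same record with `Raised` set to false; here a plain `send` callback stands in for both calls.

```go
package main

import "fmt"

type Alarm struct {
	Key         string
	Description string
	Raised      bool
}

// monitor is a hypothetical wrapper around the raise/clear pattern.
// send stands in for alarms.SendAlarm and (by assumption) alarms.ClearAlarm.
type monitor struct {
	raised map[string]struct{}
	send   func(Alarm)
}

func newMonitor(send func(Alarm)) *monitor {
	return &monitor{raised: make(map[string]struct{}), send: send}
}

// check raises the alarm every time bad is true (the alarm processor
// ignores re-raises) and clears it only on the first good sample after a raise.
func (m *monitor) check(key string, bad bool, desc string) {
	if bad {
		m.send(Alarm{Key: key, Description: desc, Raised: true})
		m.raised[key] = struct{}{}
		return
	}
	if _, present := m.raised[key]; present {
		m.send(Alarm{Key: key, Raised: false})
		delete(m.raised, key)
	}
}

func main() {
	var sent []Alarm
	m := newMonitor(func(a Alarm) { sent = append(sent, a) })
	minSatellites := 4
	for _, usedSats := range []int{6, 3, 3, 5, 7} {
		m.check("UsedSatellites", usedSats < minSatellites,
			fmt.Sprintf("Used Sats = %d < %d", usedSats, minSatellites))
	}
	for _, a := range sent {
		fmt.Printf("%s raised=%v %q\n", a.Key, a.Raised, a.Description)
	}
}
```

For the sample sequence 6, 3, 3, 5, 7 this sends a raise, a re-raise, and one clear; the final good sample sends nothing.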

I wanted to support different kinds of listeners that respond to alarm events. For example, one listener prints alarms to the system log; another might send an SMS message when an alarm is raised. Here’s the listener type, how listeners are registered, and the log listener:

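// AlarmListener is called once for each alarm that is newly raised or cleared.
type AlarmListener func(Alarm)

var listeners []AlarmListener

func AddListener(lstnr AlarmListener) {
    listeners = append(listeners, lstnr)
}

// LogListener writes raise and clear events to the system log.
func LogListener(alarm Alarm) {
    if alarm.Raised {
        log.Printf("Alarm Raised: Key: %s, Description: %s",
            alarm.Key, alarm.Description)
    } else {
        log.Printf("Alarm Cleared: Key: %s", alarm.Key)
    }
}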

func main() {
    alarms.AddListener(alarms.LogListener)
}
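Because `AlarmListener` is just a function type, a listener can be any closure, and small adapters can wrap one listener in another. The sketch below shows a hypothetical `OnlyRaised` adapter, the kind of filter an SMS listener might use so that clears don't trigger a message. The `notify` function is only a print statement standing in for a real SMS sender.

```go
package main

import (
	"fmt"
	"log"
)

type Alarm struct {
	Key         string
	Description string
	Raised      bool
}

// AlarmListener matches the listener type above.
type AlarmListener func(Alarm)

// OnlyRaised is a hypothetical adapter: the returned listener forwards
// raise events to next and ignores clears.
func OnlyRaised(next AlarmListener) AlarmListener {
	return func(a Alarm) {
		if a.Raised {
			next(a)
		}
	}
}

func main() {
	var listeners []AlarmListener
	// Stand-in for an SMS sender; a real one would call a messaging service.
	notify := func(a Alarm) { fmt.Printf("notify: %s: %s\n", a.Key, a.Description) }
	listeners = append(listeners, OnlyRaised(notify))
	listeners = append(listeners, func(a Alarm) { log.Printf("log: %+v", a) })

	for _, a := range []Alarm{
		{Key: "UsedSatellites", Description: "Used Sats = 3 < 4", Raised: true},
		{Key: "UsedSatellites", Raised: false},
	} {
		for _, l := range listeners {
			l(a)
		}
	}
}
```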

To drive the flashing LED, I created an indicator. Unlike a listener, an indicator doesn’t see individual alarms; it’s only told the overall state. Is any alarm raised or not?

type AlarmIndicator interface {
    SetState(state bool)
    Close()
}

var indicators []AlarmIndicator

func AddIndicator(indicator AlarmIndicator) {
    indicators = append(indicators, indicator)
}

func main() {
    // Now, we can add the LED indicator
    alarms.AddIndicator(ledIndicator)
}
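Any type with `SetState()` and `Close()` methods can serve as an indicator, which also makes it easy to run the daemon on a machine without the LED. The sketch below is a hypothetical `ConsoleIndicator` that prints state changes. It reports only transitions, because `SetState()` can be called with the same value several times in a row (for example, when a second alarm is raised while the first is still active).

```go
package main

import "fmt"

// AlarmIndicator matches the interface above.
type AlarmIndicator interface {
	SetState(state bool)
	Close()
}

// ConsoleIndicator is a hypothetical indicator for running without LED
// hardware. It prints only when the state actually changes.
type ConsoleIndicator struct {
	state       bool
	transitions int
}

func (c *ConsoleIndicator) SetState(state bool) {
	if state == c.state {
		return
	}
	c.state = state
	c.transitions++
	if state {
		fmt.Println("ALARM: at least one alarm is raised")
	} else {
		fmt.Println("OK: no alarms raised")
	}
}

// Close has nothing to release here; an LED indicator would presumably
// stop its flasher and switch the LED off.
func (c *ConsoleIndicator) Close() {}

// Compile-time check that ConsoleIndicator satisfies AlarmIndicator.
var _ AlarmIndicator = (*ConsoleIndicator)(nil)

func main() {
	var indicators []AlarmIndicator
	indicators = append(indicators, &ConsoleIndicator{})
	for _, s := range []bool{true, true, false} {
		for _, ind := range indicators {
			ind.SetState(s)
		}
	}
	for _, ind := range indicators {
		ind.Close()
	}
}
```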

The alarm processing code invokes the indicators:

for _, indicator := range indicators {
    // For indicators, we set the state
    //
    // True = Any Alarm is Raised
    // False = No Alarms are Raised.
    indicator.SetState(len(mRaised) > 0)
}

The LED indicator code is shown below. When the state is set to true, the indicator starts a goroutine that toggles the LED every 500ms (BLINK_INTERVAL). When the state is false, a message is sent on the termination channel, ch. That send unblocks the flasher goroutine and causes it to terminate, and the LED is left off.

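func (led *LEDIndicator) SetState(state bool) {
    if state {
        if led.running {
            return // already flashing
        }
        // led.running is only read and written here, in the caller's
        // goroutine, so the flasher goroutine never races on it.
        led.running = true
        go func() {
            // One ticker for the whole flash, rather than a new timer
            // from time.After on every pass.
            ticker := time.NewTicker(BLINK_INTERVAL)
            defer ticker.Stop()
            on := false
            for {
                select {
                case <-ticker.C:
                    // Toggle the LED.
                    on = !on
                    if on {
                        led.pin.Out(led.enabled)
                    } else {
                        led.pin.Out(led.disabled)
                    }
                case <-led.ch:
                    // Stop request from SetState(false): leave the LED off and exit.
                    led.pin.Out(led.disabled)
                    return
                }
            }
        }()
    } else {
        if led.running {
            // Unblocks the flasher goroutine, which turns the LED off and exits.
            led.ch <- 0
            led.running = false
        }
        led.pin.Out(led.disabled)
    }
}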
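The flasher loop itself doesn't depend on GPIO, so it can be pulled out into a function that takes the "set the LED" action as a parameter and be exercised without hardware. The sketch below assumes a hypothetical `flash` function and a `recordFlash` test harness; neither is part of the daemon. It uses a `time.Ticker`, which keeps one timer for the whole flash instead of creating a new one on every pass as `time.After` does.

```go
package main

import (
	"fmt"
	"time"
)

// flash is a hypothetical, hardware-independent flasher: every interval it
// calls set(true), set(false), set(true), ... until stop is closed (or
// receives a value), then calls set(false) so the output is left off.
func flash(interval time.Duration, set func(on bool), stop <-chan struct{}) {
	ticker := time.NewTicker(interval)
	defer ticker.Stop()
	on := false
	for {
		select {
		case <-ticker.C:
			on = !on
			set(on)
		case <-stop:
			set(false)
			return
		}
	}
}

// recordFlash runs flash for roughly runFor against a fake output and
// returns the sequence of states it was driven to.
func recordFlash(interval, runFor time.Duration) []bool {
	stop := make(chan struct{})
	done := make(chan struct{})
	var states []bool
	go func() {
		flash(interval, func(on bool) { states = append(states, on) }, stop)
		close(done)
	}()
	time.Sleep(runFor)
	close(stop)
	<-done // flash has returned, so states is safe to read here
	return states
}

func main() {
	fmt.Println(recordFlash(50*time.Millisecond, 175*time.Millisecond))
}
```

In the LED indicator, `set` would write `led.enabled` or `led.disabled` to the pin, and `stop` would play the role of `led.ch`.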

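The final piece that makes it all work is the alarm processor, ProcessAlarms(). It runs as its own goroutine for the life of the daemon and blocks reading alarmChannel until an alarm is sent. It then compares the alarm against mRaised, the map of currently raised alarms. Listeners are called only when an alarm is newly raised or cleared, and each indicator is then told whether any alarm is still raised. Because ProcessAlarms() creates alarmChannel, any alarm sent before it starts is dropped by the nil check in SendAlarm().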

func ProcessAlarms() {
    fmt.Println("ProcessAlarms() entered!")
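    alarmChannel = make(chan Alarm, alarmChannelSize)
    // alarmChannel is intentionally never closed: this loop runs for the
    // life of the daemon, and closing the channel while monitors can still
    // call SendAlarm() would make their sends panic.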
    mRaised = make(map[string]Alarm)
    for {
        alarm := <-alarmChannel
        _, present := mRaised[alarm.Key]
        propagate := false
        if present {
            if alarm.Raised {
                // This is a re-raise. Don't propagate it, but in case the
                // message has changed, update it.
                mRaised[alarm.Key] = alarm
            } else {
                propagate = true
                delete(mRaised, alarm.Key)
            }
        } else {
            if alarm.Raised {
                propagate = true
                mRaised[alarm.Key] = alarm
            }
        }
        if propagate {
            // For listeners, we only send when there's a new alarm,
            // or an existing one is cleared.
            for _, lstnr := range listeners {
                lstnr(alarm)
            }
            for _, indicator := range indicators {
                // For indicators, we set the state
                //
                // True = Any Alarm is Raised
                // False = No Alarms are Raised.
                indicator.SetState(len(mRaised) > 0)
            }
        }
    }
}
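The decision inside the loop (is this alarm new, a re-raise, or a clear?) doesn't depend on channels, so it can be pulled out into a pure function and unit tested. A minimal sketch, with a hypothetical `update` function that mirrors the branches of ProcessAlarms():

```go
package main

import "fmt"

type Alarm struct {
	Key         string
	Description string
	Raised      bool
}

// update is a hypothetical extraction of the ProcessAlarms() logic: it
// applies one alarm to the map of raised alarms and reports whether
// listeners and indicators should be notified.
func update(raised map[string]Alarm, alarm Alarm) bool {
	_, present := raised[alarm.Key]
	switch {
	case present && alarm.Raised:
		// Re-raise: keep the latest description, but don't notify.
		raised[alarm.Key] = alarm
	case present && !alarm.Raised:
		delete(raised, alarm.Key)
		return true
	case !present && alarm.Raised:
		raised[alarm.Key] = alarm
		return true
	}
	// Re-raises, and clears of alarms that aren't raised, notify no one.
	return false
}

func main() {
	raised := make(map[string]Alarm)
	for _, a := range []Alarm{
		{Key: "UsedSatellites", Description: "Used Sats = 3 < 4", Raised: true},
		{Key: "UsedSatellites", Description: "Used Sats = 2 < 4", Raised: true},
		{Key: "CPU", Description: "CPU busy", Raised: true},
		{Key: "UsedSatellites", Raised: false},
		{Key: "UsedSatellites", Raised: false},
	} {
		p := update(raised, a)
		fmt.Printf("%-14s raised=%-5v propagate=%-5v indicator=%v\n",
			a.Key, a.Raised, p, len(raised) > 0)
	}
}
```

With the loop reduced to "receive, call `update`, notify if it returns true", the tricky part can be tested without goroutines. In the sample run the indicator stays on after UsedSatellites clears, because CPU is still raised.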

Conclusion

Go made creating a generic alarm subsystem really easy. The entire alarm-handling package is 106 lines of Golang. More importantly, it’s easy to understand and easy for clients to use, and it supports multiple listeners and different kinds of indicators.

Looking at the whole package, one feature that would be useful is hysteresis, so that a flapping condition (one that keeps crossing its threshold) doesn’t raise and clear the alarm over and over. Most of the pieces are already present, and it would be pretty straightforward to add.
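One possible approach to hysteresis is a per-condition counter that sits between a monitor's threshold test and SendAlarm(): several consecutive bad samples are required to raise, and several consecutive good samples to clear. The sketch below is a hypothetical `debouncer`; the type, its thresholds, and where it would be wired in are assumptions, not part of the existing package.

```go
package main

import "fmt"

// debouncer is a hypothetical hysteresis filter for one alarm condition.
// The condition must be bad for raiseAfter consecutive samples before the
// alarm is raised, and good for clearAfter consecutive samples before it
// is cleared. A condition that flaps faster than that changes nothing.
type debouncer struct {
	raiseAfter, clearAfter int
	raised                 bool
	streak                 int // consecutive samples that disagree with raised
}

// sample feeds one observation and returns the alarm state and whether it
// changed, i.e. whether the monitor should send a raise or a clear.
func (d *debouncer) sample(bad bool) (raised, changed bool) {
	if bad == d.raised {
		d.streak = 0 // observation agrees with the current state
		return d.raised, false
	}
	d.streak++
	limit := d.raiseAfter
	if d.raised {
		limit = d.clearAfter
	}
	if d.streak >= limit {
		d.raised = bad
		d.streak = 0
		return d.raised, true
	}
	return d.raised, false
}

func main() {
	d := &debouncer{raiseAfter: 3, clearAfter: 2}
	for i, bad := range []bool{true, false, true, true, true, false, true, false, false} {
		if raised, changed := d.sample(bad); changed {
			fmt.Printf("sample %d: raised=%v\n", i, raised)
		}
	}
}
```

For this sample sequence, the isolated bad sample at index 0 and the isolated good sample at index 5 are ignored; the alarm is raised at sample 4 and cleared at sample 8.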