Network Security: Using Wavelets to Extract Meaningful Information
Lately I’ve been working on information security and intrusion detection systems (IDS), and on reliable methods for extracting meaningful information from the torrent of reporting data. The two most interesting techniques I’ve found are wavelets and artificial neural networks (ANNs). Here we’ll discuss wavelets.

I’ll discuss ANNs at a later date.
There are a good number of rather sophisticated tools available to the educated user for all kinds of activity detection, collection, collation, and analysis. Unfortunately, most of these tools are number crunchers or simple regular-expression pattern matchers, and they are very poor at presenting their results in visually meaningful ways.
For example, standard RRD flow charts do a nice job of plotting time-sequenced data, but they require a well-trained eye to recognize anomalous events or non-obvious structure.

One interesting approach for extracting hidden meaning from time-sequenced data employs signal analysis of network traffic statistics with wavelet filters, which are effective at exposing and separating ambient and anomalous traffic patterns.
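Wavelet analysis organizes data into strata, or a hierarchy of component signals. This is analogous to a Fourier decomposition, but each component is localized in time as well as in frequency, so it shows when something happened and not only at what scale.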
The following picture shows four panes. At the top is the original data, followed by the high, middle and low band wavelets. Grey boxes indicate some anomalous behavior, which is visible in the original data and could be recognized by a trained eye. However, the anomalies become obvious even to the casual observer when looking at the high and middle band wavelets.
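To make the idea of bands concrete, here is a minimal sketch of a multi-level decomposition. It assumes the Haar wavelet, the simplest choice, which may not be the one used for the picture; the function names and the traffic counts are made up for illustration. Each level splits the signal into a smoother approximation and a detail band, so the first detail band holds the fastest changes and later ones hold progressively slower ones.

```python
import math

def haar_step(signal):
    """One Haar level: scaled pairwise sums (approximation) and differences (detail)."""
    if len(signal) % 2:
        raise ValueError("signal length must be even at every level")
    s = math.sqrt(2.0)
    approx = [(signal[i] + signal[i + 1]) / s for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) / s for i in range(0, len(signal), 2)]
    return approx, detail

def decompose(signal, levels):
    """Return (details, approx); details[0] is the finest, highest-frequency band."""
    details = []
    approx = list(signal)
    for _ in range(levels):
        approx, detail = haar_step(approx)
        details.append(detail)
    return details, approx

# Hypothetical packets-per-interval counts with a single burst at sample 5.
traffic = [10, 10, 10, 10, 10, 50, 10, 10]
details, approx = decompose(traffic, 2)

for level, band in enumerate(details, start=1):
    print("detail level", level, [round(x, 2) for x in band])
print("approximation", [round(x, 2) for x in approx])
```

In this toy series the burst shows up as the only non-zero coefficient in each detail band, while the quiet stretches contribute nothing. That is the effect the high and middle band panes rely on.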

One nice feature of wavelets is their relative insensitivity to the data source. They can often achieve the same result (with varying levels of error) from direct filtered packet capture or from SNMP statistics. Additionally, because wavelets are sensitive throughout the “frequency” spectrum, they can detect both obvious, frontal network attacks and long-term, subtle but unusual changes in behavior.
A principal advantage of wavelets is that they are portable (they can be installed into any network system without training) and automated (they do not require operator input to function).
Wavelets provide the means for signal decomposition and context, but they must be combined with a viable triggering or threshold algorithm to create meaning. There are several popular thresholding methods, and one I especially like uses a deviation score. The deviation score is determined by calculating the variability in the “high”, “medium” and “low” frequency signals within a moving window of sample data. Each frequency band has its own moving window size, which can be adjusted to tune the sensitivity of the algorithm.
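The exact formula isn't spelled out here, so the following is only one possible reading of a deviation score, not a reference implementation. It assumes that "variability" means the variance inside a trailing window, normalized by the variance of the whole band, and that the per-band values are simply summed. The function names, window sizes, threshold and sample data are all hypothetical, and the bands are assumed to have been resampled to the same length.

```python
from statistics import pvariance

def local_variability(band, window):
    """Variance of each trailing window, relative to the variance of the whole band."""
    total = pvariance(band)
    if total == 0:
        return [0.0] * len(band)  # a flat band never deviates
    scores = []
    for i in range(len(band)):
        chunk = band[max(0, i - window + 1): i + 1]
        scores.append(pvariance(chunk) / total)
    return scores

def deviation_score(bands, windows):
    """Sum the normalized variability across bands, sample by sample (an assumption)."""
    per_band = [local_variability(bands[name], windows[name]) for name in bands]
    return [sum(values) for values in zip(*per_band)]

def flag_anomalies(scores, threshold):
    """Indices whose combined score exceeds the threshold."""
    return [i for i, score in enumerate(scores) if score > threshold]

# Hypothetical band signals: a sharp swing in the high band, a flat low band.
bands = {
    "high": [0.0, 0.0, 0.0, 0.0, 8.0, -8.0, 0.0, 0.0],
    "low": [1.0] * 8,
}
windows = {"high": 2, "low": 4}  # each band gets its own window size

scores = deviation_score(bands, windows)
print(flag_anomalies(scores, threshold=2.0))
```

A shorter window reacts faster but is noisier, and a longer one smooths over brief spikes, which is why a separate window per band is a useful sensitivity knob.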