Do baselines and thresholds work to protect critical, unpredictable IP networks from DDoS attacks?
Baselines and thresholds have been the relied-upon method for DDoS attack protection for two decades. However, thresholds are very static and must be specified for a large set of protocol/port/traffic type combinations.
In the past, networks could be built deterministically to support a relatively small range of application and service types; network reliability and protection could be easily tied to known application traffic characteristics, whether they were voice, data or games. Today, not only has the criticality of these networks increased, but the range of applications and services has also grown massively. Traffic from these services and applications often fluctuates significantly due to new applications, updates, route changes controlled by the application vendors themselves, major events such as disasters, disease, political and economic conditions, and more.
In the case of solution architectures using distributed DDoS intelligence, baselines also need to be distributed network-wide and tailored to specific network interfaces. The definition of the tolerated deviation may vary widely across the different routers and network interfaces.
Demands on IP networks have never been higher, and those demands make traffic patterns ever more unpredictable. In this environment, where networks are highly critical and the services and applications using them are more dynamic, baselines and thresholds are no longer sufficient mechanisms for DDoS detection and mitigation decisions, and they are no longer the most effective way to thwart attacks.
Why are thresholds and traffic baselines no longer working?
Legacy threshold-based DDoS detection triggers are defined by configuring the amount of traffic (bandwidth) or packets (packet rate) at which the monitored object is considered to be under attack. Often, multiple thresholds are created to separate bandwidth levels (bps) and packet intensity levels (pps) for specific traffic types and protocols (e.g., TCP SYN packets). The main problem is that legacy systems had no other way of distinguishing between good and bad traffic.
More importantly, these thresholds were not adjusted automatically, meaning regular traffic growth had to be monitored continuously and thresholds adjusted by hand; otherwise, even normal customer traffic growth would trigger false positives.
It’s easy to see the challenge: With static thresholds, there is no way to automatically track how much traffic of a given type (e.g., TCP SYN) is “normal,” where a threshold should be set, or when the situation improves for the customer. Setting the thresholds too low results in false positives, while setting them too high lets DDoS traffic go undetected and unmitigated (false negatives). Both situations are bad for automation.
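The trade-off can be illustrated with a minimal Python sketch. The threshold value, the traffic figures and the 10% monthly growth rate are all assumed numbers chosen for illustration, and `is_attack` is a hypothetical helper, not part of any product.

```python
# Hypothetical static threshold for TCP SYN packets, in packets per second.
SYN_PPS_THRESHOLD = 50_000


def is_attack(observed_pps: float, threshold: float = SYN_PPS_THRESHOLD) -> bool:
    """Flag an attack whenever the observed rate exceeds the fixed threshold."""
    return observed_pps > threshold


# Assumed legitimate peak SYN rate: 40k pps today, growing 10% per month.
legit_peak = [40_000 * 1.10 ** month for month in range(6)]

# Months in which ordinary growth alone trips the threshold (false positives).
false_positives = [month for month, pps in enumerate(legit_peak) if is_attack(pps)]

# A low-rate attack adding 5k pps during a quiet 20k pps period stays
# under the threshold and is never flagged (false negative).
missed_attack = not is_attack(20_000 + 5_000)

print(false_positives, missed_attack)
```

With these assumed numbers, normal growth starts raising alerts from the fourth month onward unless someone retunes the threshold, while the small attack is never seen at all.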
This means that continuous tuning is an ongoing and challenging task for operational teams looking at DDoS protection. Their job is made harder because they may not have full and up-to-date network traffic information available.
To help them properly set thresholds and understand traffic patterns in the absence of DDoS traffic (peacetime regime), service operators have been using baselines, primarily measuring traffic volumes:
- transported by a network device (often on a specific interface or link)
- of a given protocol (e.g., UDP, TCP or ICMP)
- destined to a given monitored resource
Baselines are calculated selectively because they require at least a 7-day time range to allow values to be compared with those from the same day of the week. Combined with additional attributes like interface, router, and monitored object (network prefix), this calculation leads to high storage and processing requirements. For this reason, it cannot be done on a per-host basis. In addition, per-host traffic may vary too much, both from one host to another and over time, to build useful baselines.
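As a rough illustration of why this gets expensive, the sketch below builds a same-weekday, same-hour baseline and counts how many baseline slots a network would need. It assumes hourly slots, a mean-based baseline and a 50% tolerated deviation; the function names and network sizes are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def build_baseline(samples):
    """samples: iterable of (weekday, hour, bps) -> {(weekday, hour): mean bps}."""
    buckets = defaultdict(list)
    for weekday, hour, bps in samples:
        buckets[(weekday, hour)].append(bps)
    return {slot: mean(values) for slot, values in buckets.items()}


def deviates(baseline, weekday, hour, bps, tolerance=0.5):
    """True if bps differs from the same weekday/hour baseline by more than tolerance."""
    expected = baseline.get((weekday, hour))
    if expected is None:
        return False  # no history for this slot yet, so nothing to compare against
    return abs(bps - expected) > tolerance * expected


def baseline_slots(routers, interfaces_per_router, prefixes, slots_per_week=7 * 24):
    """Number of baseline values to store and refresh for one traffic type."""
    return routers * interfaces_per_router * prefixes * slots_per_week


# Three earlier Mondays (weekday 0) at 09:00, in bits per second (assumed values).
history = [(0, 9, 100_000_000), (0, 9, 120_000_000), (0, 9, 110_000_000)]
baseline = build_baseline(history)

print(deviates(baseline, 0, 9, 300_000_000))  # far above the Monday 09:00 average
print(baseline_slots(routers=100, interfaces_per_router=20, prefixes=1000))
```

Even this simplified model needs hundreds of millions of slots for a mid-sized network and a single traffic type, before multiplying by protocol or moving to per-host granularity.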
So, baselines are used to identify traffic anomalies and unusual patterns that represent a clear deviation from “normal.” But what is “normal”?
A link or a port outage, or a change in peering traffic, may cause dynamic shifts in network traffic patterns and could generate a lot of false positives, unnecessarily activating mitigations and impacting valid traffic. If the baseline calculation and evaluation is distributed (using decentralized telemetry collectors), the impact on DDoS detection and mitigation can be even greater: an individual collector would generate false alerts even when traffic merely shifts to a monitored router interface on a different collector.
The situation becomes even more complex if the network is already under attack, especially with the DDoS “noise floor” rising and dynamically changing. Finding valid thresholds or creating clean baselines is very challenging in the ever-changing conditions of dynamic and unpredictable network loads that we have today.
Another challenge with baseline-driven alerts is that they provide very little (or zero) attack context besides the fact that the current traffic level deviates from the baseline.
We need a better way of establishing “peacetime” traffic models in highly dynamic network environments where DDoS has become a constant “clear and present danger.”
What is the best approach to declare “peacetime” and monitor DDoS traffic before attack levels jump into the terabit-per-second territory?
Dynamic peacetime data models
Baselines and thresholds were largely reactive methods of dealing with DDoS attacks. This week’s baseline triggers may need to be tuned next week in the hopes of catching the needle in a haystack (even with a constantly changing haystack). Today’s networks demand a proactive approach to DDoS protection that combines an understanding of the entire Internet’s topology (e.g., main traffic sources, traffic distribution paths, transit and consumption) with a greater awareness of the endpoints involved (e.g., applications, cloud, distribution, transit, endpoints) and a detailed understanding of the attack supply chain (e.g., attack tools, amplification/reflection points, IP spoofing hosts, botnets).
Dynamic peacetime data models are quite different from baselines. They use a much broader dataset to identify abnormal traffic behavior. Examples of telemetry data examined by peacetime modeling are:
- packet attributes such as length, TTL (time-to-live), TCP-flags
- packet and protocol ratios like SYN/ACKs, NTP/total traffic
- source IP details (e.g., distribution, invariant, geolocation)
- BGP attributes (e.g., origin AS, AS path, ingress peer)
- source service types (e.g., CDN, gaming, DDoS reflector)
- destination service types (e.g., consumer, subscriber, server)
- conversation details such as source IP and port combined with destination IP and port
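To make a few of these attributes concrete, the following Python sketch derives some of them from a handful of flow records. The `Flow` record layout and the `summarize` helper are assumptions made for illustration (real flow telemetry carries many more fields), and treating UDP source port 123 as NTP is a simplification.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Flow:
    src_ip: str
    dst_ip: str
    protocol: str   # "tcp", "udp" or "icmp"
    src_port: int
    dst_port: int
    tcp_flags: str  # e.g. "S", "SA", "A"; empty for non-TCP flows
    packets: int
    bytes: int


def summarize(flows):
    """Derive a few peacetime-model attributes from a list of flows."""
    total_bytes = sum(f.bytes for f in flows)
    total_packets = sum(f.packets for f in flows)
    syn = sum(f.packets for f in flows if f.protocol == "tcp" and f.tcp_flags == "S")
    ack = sum(f.packets for f in flows if f.protocol == "tcp" and "A" in f.tcp_flags)
    ntp_bytes = sum(f.bytes for f in flows if f.protocol == "udp" and f.src_port == 123)
    return {
        "syn_to_ack_ratio": syn / ack if ack else float("inf"),
        "ntp_share": ntp_bytes / total_bytes if total_bytes else 0.0,
        "avg_packet_length": total_bytes / total_packets if total_packets else 0.0,
        "distinct_sources": len({f.src_ip for f in flows}),
    }


# Sample flows toward one protected destination (documentation address ranges).
flows = [
    Flow("198.51.100.1", "203.0.113.10", "tcp", 40000, 443, "S", 10, 600),
    Flow("198.51.100.1", "203.0.113.10", "tcp", 40000, 443, "A", 90, 90000),
    Flow("192.0.2.7", "203.0.113.10", "udp", 123, 5000, "", 100, 46800),
]
summary = summarize(flows)
print(summary)
```

Tracked over time per protected resource, values like these describe what peacetime traffic looks like in several dimensions at once, rather than as a single volume figure.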
Big data analytics, applied to much larger data sets that span many more dimensions, enhances our ability to understand peacetime traffic dynamically. It allows us to clearly identify outliers and to map out the DDoS attack supply and distribution chain. This is a major paradigm shift that results in much better protection against next-generation DDoS attacks.
Some of these data dimensions and attributes are not directly derived from flow records. Rather, a wider security context may be used to enhance these peacetime data models.
At Deepfield, we use our traffic and security intelligence feeds, Deepfield Cloud Genome™ and Deepfield Secure Genome™ (collectively referred to as Deepfield Genome): a Nokia-proprietary, internet-based and cloud-hosted software data feed that tracks, maps, and analyzes billions of internet endpoints and flows to provide a dynamic supply and security map of the internet. Deepfield Secure Genome™ complements the Deepfield Cloud Genome™ by maintaining a live data feed with up-to-date information on potential DDoS threats and secure and insecure (allow/block) internet sources, destinations and traffic patterns.
Using a dynamic and internet-based security context, we create peacetime models that do not require additional processing or storage because they are based on traffic data stored in the big data warehouse. This means that a peacetime data model is available for every protected resource and can be obtained via a real-time query.
For the first time, we have the capability to proactively stop DDoS attacks by using dynamic, global knowledge of the entire supply and distribution chain of DDoS attacks. Not only can we be more proactive, but we can also improve both the effectiveness and efficiency (lowering the cost) of protection by an order of magnitude using this approach.
How is the dynamic peacetime model applied?
Generic peacetime model attributes are periodically evaluated to identify and quantify normal network traffic behavior. Traffic attributes that fall outside normal behavior boundaries are treated as misbehavior (or at least as suspicious). This approach results in a comprehensive set of flow signatures that are applied to DDoS classification rules and detection policies.
Matching every network flow against this signature set allows for real-time DDoS flow classification of suspicious traffic on the network.
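A minimal Python sketch of this kind of signature matching is shown below. The `Signature` class, the two example signatures and their packet-length boundaries are illustrative assumptions, not the actual signature format or values used by any product.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Signature:
    """A hypothetical flow signature; unset fields match anything."""
    name: str
    protocol: str
    src_port: Optional[int] = None
    min_packet_length: Optional[int] = None

    def matches(self, flow: dict) -> bool:
        if flow["protocol"] != self.protocol:
            return False
        if self.src_port is not None and flow["src_port"] != self.src_port:
            return False
        if (self.min_packet_length is not None
                and flow["avg_packet_length"] < self.min_packet_length):
            return False
        return True


# Assumed boundaries: large UDP responses from NTP or DNS source ports
# fall outside what the peacetime model considers normal.
SIGNATURES = [
    Signature("ntp-amplification", "udp", src_port=123, min_packet_length=400),
    Signature("dns-amplification", "udp", src_port=53, min_packet_length=1000),
]


def classify(flow: dict, signatures=SIGNATURES) -> Optional[str]:
    """Return the name of the first matching signature, or None if the flow looks normal."""
    for signature in signatures:
        if signature.matches(flow):
            return signature.name
    return None


print(classify({"protocol": "udp", "src_port": 123, "avg_packet_length": 468}))
print(classify({"protocol": "udp", "src_port": 123, "avg_packet_length": 76}))
```

The first flow is tagged as suspicious while the second, an ordinary small NTP exchange, is left unclassified. Because the tag is attached per flow, it can be stored with the flow record and queried later.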
Using a real-time, signature-based approach, without involving any additional thresholds or baselines, delivers a very important new capability: we can look for suspicious traffic by simply querying the traffic database. The resulting reports on suspicious traffic help us answer questions like:
- How much suspicious traffic enters my network on any given peering link?
- Who (which destination or host) is receiving the most DDoS traffic?
- Which customers are potentially botnet infected?
- Have servers in my datacentre been hijacked and used for DDoS?
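The first two questions can be sketched as simple aggregate queries. The example below uses an in-memory SQLite database as a stand-in for the traffic data warehouse; the `flows` table, its columns and the `suspicious_by` helper are hypothetical and chosen only for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE flows (ingress_link TEXT, dst_ip TEXT, bytes INTEGER, classification TEXT)"
)
# classification is NULL for flows that matched no suspicious-traffic signature.
conn.executemany(
    "INSERT INTO flows VALUES (?, ?, ?, ?)",
    [
        ("peer-a", "203.0.113.10", 5000, "ntp-amplification"),
        ("peer-a", "203.0.113.10", 1200, None),
        ("peer-b", "203.0.113.10", 7000, "dns-amplification"),
        ("peer-b", "203.0.113.20", 800, "ntp-amplification"),
    ],
)


def suspicious_by(column: str):
    """Total suspicious bytes grouped by an allowed column, largest first."""
    if column not in {"ingress_link", "dst_ip"}:
        raise ValueError(f"unsupported grouping column: {column}")
    query = (
        f"SELECT {column}, SUM(bytes) AS total FROM flows "
        f"WHERE classification IS NOT NULL GROUP BY {column} ORDER BY total DESC"
    )
    return conn.execute(query).fetchall()


print(suspicious_by("ingress_link"))  # suspicious traffic per peering link
print(suspicious_by("dst_ip"))        # destinations receiving the most DDoS traffic
```

No threshold or baseline is involved in answering either question; the result is simply a view over flows that were already classified.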
With this approach, matching traffic flows against suspicious-traffic signatures provides detailed information about DDoS flows regardless of how distributed they are, where they enter the network, and what they target. It also acts as a “magnifying glass” to inspect the “white noise” of typically unseen or undetected DDoS traffic.
Another benefit of flagging some traffic as suspicious (before declaring it part of a DDoS attack) is separating DDoS from normal and “slightly unusual” traffic in the network. We do not need to take action on a single suspicious flow or a very low volume of suspicious flows, but we can continue to monitor this type of traffic – to see whether we are dealing with a DDoS attack.
Automatically identifying a DDoS attack becomes a DDoS policy operation as we continue to check DDoS events/alerts and trigger reactions if the nature and volume of classified DDoS traffic changes.
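One way to picture such a policy is as a small state decision applied to the already classified traffic for a protected resource. The sketch below is an assumption-laden illustration: the three states, the `evaluate` function and its default limits are hypothetical, and the limits apply only to traffic already classified as suspicious, not to total traffic volume.

```python
from enum import Enum


class State(Enum):
    PEACETIME = "peacetime"
    MONITOR = "monitor"      # suspicious flows seen, keep watching
    MITIGATE = "mitigate"    # classified traffic is significant, trigger a reaction


def evaluate(suspicious_bps: float, total_bps: float,
             min_bps: float = 10e6, min_share: float = 0.2) -> State:
    """Decide the policy state from classified (suspicious) traffic only.

    min_bps and min_share are assumed policy parameters for illustration.
    """
    if suspicious_bps <= 0:
        return State.PEACETIME
    share = suspicious_bps / total_bps if total_bps else 0.0
    if suspicious_bps >= min_bps and share >= min_share:
        return State.MITIGATE
    return State.MONITOR


print(evaluate(0, 1e9))       # no suspicious flows
print(evaluate(1e6, 1e9))     # a trickle of suspicious flows: keep monitoring
print(evaluate(500e6, 1e9))   # half of the traffic is classified as DDoS
```

The middle state is the useful part: low-volume suspicious traffic stays visible and tracked without triggering a mitigation that could affect valid traffic.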
Using a peacetime data model beyond detection
Peacetime modelling is a novel approach to improving DDoS detection accuracy. In addition, it can significantly improve the network-based mitigation capabilities (using network edge routers to block DDoS) and the overall mitigation efficiency.
Many attacks, such as known amplification attacks, are simple and easy to block if they match a known pattern. However, an increasing number of applications can be misused for DDoS amplification, along with an increasing number of amplifier hosts on the internet. Blocking these attacks today requires a comprehensive understanding of normal and abnormal application behavior, both globally and with regard to the protected network resource.
Reflection attacks are harder to block, especially without blocking too much valid traffic at the same time. For example, in TCP-reflection attacks, each server may be providing many TCP-based services (like web servers). In this case, a peacetime model-based analysis helps to understand which external servers or networks are regularly used, as opposed to those used only under DDoS attacks.
The dynamic peacetime model provides this information, but it can’t be leveraged directly to create filter entries (allow-lists, block-lists or mitigations) and send them to a router. The peacetime model needs to be translated into IP-filter entries compatible with routers’ match capabilities, ordering and syntax.
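The translation step can be sketched in Python as follows. This is a generic illustration, not router configuration: the `to_filter_entries` function, the entry dictionary layout, the sequence numbering and the 256-entry limit are all assumptions standing in for a real router's match capabilities, ordering and scale.

```python
import ipaddress


def to_filter_entries(allowed_sources, blocked_sources, protocol, src_port,
                      max_entries=256):
    """Turn model output (source prefixes) into ordered, generic filter entries.

    Prefixes are aggregated to save filter space, and allow entries are
    placed before block entries so regularly used sources are matched first.
    """
    entries = []
    for action, sources in (("accept", allowed_sources), ("drop", blocked_sources)):
        networks = ipaddress.collapse_addresses(
            ipaddress.ip_network(source) for source in sources
        )
        for network in networks:
            entries.append({
                "match": {
                    "src_prefix": str(network),
                    "protocol": protocol,
                    "src_port": src_port,
                },
                "action": action,
            })
    if len(entries) > max_entries:
        raise ValueError("model does not fit the assumed filter scale")
    # Number entries in steps of 10 to leave room for later insertions.
    return [dict(entry, sequence=10 * (index + 1)) for index, entry in enumerate(entries)]


entries = to_filter_entries(
    allowed_sources=["198.51.100.7/32"],
    blocked_sources=["192.0.2.0/25", "192.0.2.128/25"],
    protocol="udp",
    src_port=123,
)
for entry in entries:
    print(entry)
```

Here the two adjacent blocked /25 prefixes collapse into a single /24 entry, which shows why aggregation matters when a model containing many sources has to fit into limited filter resources.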
The Nokia Deepfield DDoS solution, based on Deepfield Defender, uses this peacetime modelling approach to create router-deployable allow-lists and block-lists. Additionally, the telemetry from advanced routers such as the Nokia 7750 Service Router (SR) portfolio based on FP4 and FP5 network processors, with built-in IP network security features, can be used as additional, dynamic mitigation countermeasure inputs – including match criteria such as packet length, specific payload signatures, and others.
Benefits of a peacetime data model for DDoS protection
There are many benefits of this approach over the legacy approaches (thresholds, baselines):
- Big data-based security analytics allows us to track the entirety of network traffic with real-time network information and a centralized, dynamic policy repository
- We obtain a holistic look at DDoS across the entire network
- We can capture attacks by identifying components of an attack, even when they do not create events initially
- We can capture multi-vector attacks
- Automatic policy adjustments can be enhanced with the use of ML and AI
- Mitigation can happen where it matters the most – at the network edge
- We can define smart countermeasures – sets and suites of filter entries to block many types of attacks
- DDoS automation can be achieved