Leakybuckets
Bucket concepts
The Leakybucket is used for decision making. Under certain conditions, enriched events are poured into these buckets. When a bucket is full, it raises a new event and is then destroyed. There are many types of buckets, and new useful bucket designs are welcome.
A single bucket configuration usually results in the creation of many buckets. They are differentiated by a field called stackkey. When two events arrive with the same stackkey, they go into the same matching bucket.
The purpose of these buckets is to detect clients that exceed a certain rate of attempts to do something (ssh connection, http authentication failure, etc.). Thus, the most commonly used stackkey field is the source_ip.
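As a rough illustration of how one configuration fans out into many buckets, the sketch below keeps one event list per stackkey value. It is a simplified model and not the project's code: the `pour` helper, the `buckets` dictionary and the dictionary-shaped events are assumptions made for the example.

```python
from collections import defaultdict

# Hypothetical in-memory model: one bucket (here a plain list) per stackkey value.
buckets = defaultdict(list)


def pour(event, stackkey="source_ip"):
    """Route an event to the bucket matching its stackkey value.

    A bucket is created the first time a value is seen.
    Returns the number of events now held in that bucket.
    """
    key = event[stackkey]
    buckets[key].append(event)
    return len(buckets[key])


pour({"source_ip": "192.0.2.1", "log_type": "ssh_failed-auth"})
pour({"source_ip": "192.0.2.1", "log_type": "ssh_failed-auth"})
pour({"source_ip": "198.51.100.7", "log_type": "ssh_failed-auth"})
```

Both events from 192.0.2.1 land in the same bucket, while the third event creates a second one.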
Standard leaky buckets
Default buckets have two main configuration options:
- capacity: number of events the bucket can hold. When the capacity is reached and a new event is poured, a new event is raised. We call this type of event overflow. This is an int.
- leakspeed: duration needed for an event to leak. When an event leaks, it disappears from the bucket.
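The interplay of capacity and leakspeed can be sketched as follows. This is a minimal model under stated assumptions, not the actual implementation: the `LeakyBucket` class is hypothetical, time is passed in as plain seconds, and leaks are accounted for lazily whenever an event is poured.

```python
class LeakyBucket:
    """Illustrative leaky bucket; times are plain seconds supplied by the caller."""

    def __init__(self, capacity, leakspeed):
        self.capacity = capacity    # number of events the bucket can hold
        self.leakspeed = leakspeed  # seconds needed for one event to leak
        self.level = 0              # events currently in the bucket
        self.last_leak = None       # reference time for leak accounting

    def pour(self, now):
        """Pour one event at time `now`; return True if the bucket overflows."""
        if self.last_leak is None:
            self.last_leak = now
        # One event leaks for every full leakspeed interval that has elapsed.
        leaked = int((now - self.last_leak) // self.leakspeed)
        if leaked > 0:
            self.level = max(0, self.level - leaked)
            self.last_leak += leaked * self.leakspeed
        if self.level == 0:
            # An empty bucket has nothing to leak, so restart the leak timer.
            self.last_leak = now
        if self.level >= self.capacity:
            return True  # the bucket is full: this event raises the overflow
        self.level += 1
        return False


# Six events within five seconds against capacity 5 and a 10 s leakspeed.
bucket = LeakyBucket(capacity=5, leakspeed=10)
results = [bucket.pour(t) for t in range(6)]  # only the sixth pour overflows
```

A client sending one event every 20 seconds never overflows the same bucket, because each event leaks before the next one arrives.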
Trigger
A Trigger is a special type of bucket with a capacity of zero. Thus, when an event is poured into a trigger, it always raises an overflow.
Uniq
A Uniq works like the standard leaky bucket with one difference: a filter returns a property for each event, and only one occurrence of each property value is allowed in the bucket, hence the name uniq.
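The deduplication step can be sketched like this. The `UniqBucket` class and its string return values are hypothetical, leaking is left out to keep the example short, and the filter is modelled as a plain Python callable rather than an expr expression.

```python
class UniqBucket:
    """Illustrative uniq bucket: at most one event per distinct filter value."""

    def __init__(self, capacity, uniq_filter):
        self.capacity = capacity
        self.uniq_filter = uniq_filter  # callable: event -> str
        self.seen = set()               # property values already in the bucket

    def pour(self, event):
        """Return 'dismissed', 'poured' or 'overflow' for the given event."""
        value = self.uniq_filter(event)
        if value in self.seen:
            return "dismissed"  # this property value is already in the bucket
        if len(self.seen) >= self.capacity:
            return "overflow"   # bucket is full and a new distinct value arrived
        self.seen.add(value)
        return "poured"


bucket = UniqBucket(capacity=2, uniq_filter=lambda event: event["user"])
outcomes = [bucket.pour({"user": u}) for u in ["root", "root", "admin", "guest"]]
```

Here the repeated "root" attempt is dismissed, so only distinct user names count toward the capacity.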
Counter
A Counter is a special type of bucket with an infinite capacity and an infinite leakspeed: it neither overflows nor leaks. Instead, its event is raised after a fixed duration, set by the duration option.
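A minimal sketch of this behaviour, assuming time is given in plain seconds. The `Counter` class, its `flush` method and the shape of the raised event are all hypothetical and only serve the illustration.

```python
class Counter:
    """Illustrative counter bucket: counts events, reports once `duration` has elapsed."""

    def __init__(self, created_at, duration):
        self.expires_at = created_at + duration
        self.count = 0

    def pour(self, now):
        """Count the event if the counter is still alive; return whether it was counted."""
        if now >= self.expires_at:
            return False
        self.count += 1
        return True

    def flush(self, now):
        """Return the event to raise once the duration has elapsed, else None."""
        if now < self.expires_at:
            return None
        return {"type": "counter", "count": self.count}  # hypothetical event shape


counter = Counter(created_at=0, duration=300)  # 300 s, like duration: 5m
counter.pour(10)
counter.pour(20)
```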
Bayesian
A Bayesian is a special bucket that runs Bayesian inference instead of counting events. Each event must have its likelihoods specified in the YAML file under prob_given_benign and prob_given_evil. The bucket will continue evaluating events until the posterior goes above the threshold (triggering the overflow) or the duration (specified by leakspeed) expires.
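The posterior update can be sketched with Bayes' rule as below. This is an illustration under assumptions, not the project's code: the function names are hypothetical, all probabilities are taken to lie strictly between 0 and 1, and treating an unmet condition as evidence through the complementary likelihoods is a modelling choice of this example.

```python
def bayes_update(prior, prob_given_evil, prob_given_benign, condition_met=True):
    """Apply one step of Bayes' rule; return the posterior probability of 'evil'."""
    if not condition_met:
        # Assumption: an unmet condition counts as evidence via the complements.
        prob_given_evil = 1.0 - prob_given_evil
        prob_given_benign = 1.0 - prob_given_benign
    evidence = prior * prob_given_evil + (1.0 - prior) * prob_given_benign
    return prior * prob_given_evil / evidence


def run_bucket(prior, threshold, observations):
    """Feed (prob_given_evil, prob_given_benign, condition_met) tuples in order.

    Returns (overflowed, posterior); stops as soon as the posterior
    goes above the threshold.
    """
    posterior = prior
    for p_evil, p_benign, met in observations:
        posterior = bayes_update(posterior, p_evil, p_benign, met)
        if posterior > threshold:
            return True, posterior
    return False, posterior


# Two matching events with likelihoods 0.8 / 0.2, starting from a prior of 0.5.
overflowed, posterior = run_bucket(0.5, 0.9, [(0.8, 0.2, True)] * 2)
```

With these numbers the first event moves the posterior from 0.5 to 0.8, and the second pushes it past the 0.9 threshold.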
Available configuration options for buckets
Fields for standard buckets
- type: mandatory field. Must be one of "leaky", "trigger", "uniq" or "counter".
- name: mandatory field. The value is free-form, but it tags the events raised by the bucket.
- filter: mandatory field. A filter that is run to decide whether an event matches the bucket or not. The filter has to return a boolean. Filters are implemented with https://github.com/antonmedv/expr.
- capacity: [mandatory for now, shouldn't be mandatory in the final version] the size of the bucket. When an event is poured into a bucket that already holds capacity events, the bucket overflows.
- leakspeed: a time duration (it must be parseable by https://golang.org/pkg/time/#ParseDuration). After each interval, one event leaks from the bucket.
- stackkey: mandatory field. It determines which instance of the bucket a matching event is poured into. When an event carries a stackkey value that has not been seen before, a new bucket is created.
- on_overflow: optional field that tells what to do when the bucket returns the overflow event. As of today, the possibilities are "ban,1h", "Reprocess" or "Delete". Reprocess sends the raised event back to the event pool to be matched against buckets.
Fields for special buckets
Uniq
- uniq_filter: an expression that must comply with the syntax defined in https://github.com/antonmedv/expr and must return a string. All strings returned by this filter within the same bucket must be distinct: if a string is seen twice, the event is dismissed.
Trigger
Capacity and leakspeed are not relevant for this kind of bucket.
Counter
- duration: the Counter is destroyed once this interval has elapsed since its creation. The duration must be parseable by https://golang.org/pkg/time/#ParseDuration. This kind of bucket is often used with an infinite leakspeed and an infinite capacity [capacity set to -1 for now].
Bayesian
- bayesian_prior: The prior to start with
- bayesian_threshold: The threshold for the posterior to trigger the overflow.
- bayesian_conditions: List of Bayesian conditions with likelihoods
Bayesian Conditions are built from:
- condition: The expr for this specific condition to be true
- prob_given_evil: The likelihood an IP satisfies the condition given the fact that it is a malicious IP
- prob_given_benign: The likelihood an IP satisfies the condition given the fact that it is a benign IP
- guillotine: Bool to stop the condition from getting evaluated if it has evaluated to true once. This should be used if evaluating the condition is computationally expensive.
Examples
# ssh bruteforce
- type: leaky
  name: ssh_bruteforce
  filter: "Meta.log_type == 'ssh_failed-auth'"
  leakspeed: "10s"
  capacity: 5
  stackkey: "source_ip"
  on_overflow: ban,1h

# reporting of src_ip,dest_port seen
- type: counter
  name: counter
  filter: "Meta.service == 'tcp' && Event.new_connection == 'true'"
  distinct: "Meta.source_ip + ':' + Meta.dest_port"
  duration: 5m
  capacity: -1

- type: trigger
  name: "New connection"
  filter: "Meta.service == 'tcp' && Event.new_connection == 'true'"
  on_overflow: Reprocess
Note on leakybuckets implementation
[This is not dry enough to have many details here, but:]
The bucket code is triggered by InfiniBucketify in main.go.
There is one struct called buckets which is for now a map[string]interface{} that holds all buckets. The key of this map is derived from the filter configured for the bucket and its stackkey. This looks complicated, but it allows us to use only one struct. This is done in buckets.go.
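The single-map idea can be illustrated as follows. The sketch is written in Python for brevity and is not taken from buckets.go: the `bucket_key` and `get_bucket` helpers are hypothetical, and using a SHA-1 digest of the filter and the stackkey value as the key is an assumption of the example.

```python
import hashlib


def bucket_key(bucket_filter, stackkey_value):
    """Derive one map key from a bucket's filter and an event's stackkey value."""
    raw = bucket_filter + "\x00" + stackkey_value
    return hashlib.sha1(raw.encode("utf-8")).hexdigest()


buckets = {}  # a single map holding every live bucket, whatever its definition


def get_bucket(bucket_filter, stackkey_value):
    """Return the bucket for this (filter, stackkey value) pair, creating it if needed."""
    key = bucket_key(bucket_filter, stackkey_value)
    return buckets.setdefault(key, [])


ssh = "Meta.log_type == 'ssh_failed-auth'"
tcp = "Meta.service == 'tcp' && Event.new_connection == 'true'"
get_bucket(ssh, "192.0.2.1").append("event-1")
get_bucket(ssh, "192.0.2.1").append("event-2")
get_bucket(tcp, "192.0.2.1").append("event-3")
```

The same source IP ends up in two separate buckets because the two filters differ, yet both buckets live in one map.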
On top of that the implementation defines only the standard leaky bucket. A goroutine is launched for every bucket (bucket.go). This goroutine manages the life of the bucket.
For special buckets, hooks are defined at initialization time in manager.go. Hooks are called when relevant by the bucket goroutine when events are poured and/or when a bucket overflows.
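The hook mechanism can be pictured with the sketch below, which layers uniq-style behaviour on top of a plain bucket. It is a conceptual model in Python, not a transcription of manager.go: the `Bucket` class, the hook signatures and the `make_uniq_hook` helper are all hypothetical, and leaking and goroutines are left out.

```python
class Bucket:
    """Plain bucket whose behaviour is extended through hooks."""

    def __init__(self, capacity, on_pour=(), on_overflow=()):
        self.capacity = capacity
        self.events = []
        self.on_pour = list(on_pour)          # hook(bucket, event) -> bool: keep the event?
        self.on_overflow = list(on_overflow)  # hook(bucket, event) -> None

    def pour(self, event):
        """Run pour hooks, then store the event or signal an overflow."""
        for hook in self.on_pour:
            if not hook(self, event):
                return "dismissed"
        if len(self.events) >= self.capacity:
            for hook in self.on_overflow:
                hook(self, event)
            return "overflow"
        self.events.append(event)
        return "poured"


def make_uniq_hook(uniq_filter):
    """Build a pour hook that rejects events whose filter value was already seen."""
    seen = set()

    def hook(bucket, event):
        value = uniq_filter(event)
        if value in seen:
            return False
        seen.add(value)
        return True

    return hook


raised = []
bucket = Bucket(
    capacity=2,
    on_pour=[make_uniq_hook(lambda event: event["user"])],
    on_overflow=[lambda b, event: raised.append(event)],
)
outcomes = [bucket.pour({"user": u}) for u in ["root", "root", "admin", "guest"]]
```

The base class stays unaware of uniqueness; the special behaviour lives entirely in the hook attached at construction time.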