mmetc 5cd4406f5e typos/grammar (#2561 )		2023-11-07 15:07:36 +01:00
tests	Add bayesian bucket type (#2290 )	2023-06-21 15:08:27 +02:00
bayesian.go	Add bayesian bucket type (#2290 )	2023-06-21 15:08:27 +02:00
blackhole.go	Lint (type inference): remove redundant type declarations (#2111 )	2023-03-09 11:56:02 +01:00
bucket.go	typos/grammar (#2561 )	2023-11-07 15:07:36 +01:00
buckets.go	[Rebased] fix races (#633 )	2021-02-25 11:26:46 +01:00
buckets_test.go	fix the ci by adding the ability to enforce event ordering (#2347 )	2023-07-20 11:41:30 +02:00
conditional.go	fix issue #2172 (#2177 )	2023-04-28 16:32:46 +02:00
manager_load.go	ligten bucket logger (#2523 )	2023-10-06 14:42:44 +02:00
manager_load_test.go	Add bayesian bucket type (#2290 )	2023-06-21 15:08:27 +02:00
manager_run.go	Dont create 3 maps just pass the same one to expr (#2421 )	2023-10-13 22:35:30 +01:00
manager_run_test.go	Lint (type inference): remove redundant type declarations (#2111 )	2023-03-09 11:56:02 +01:00
overflow_filter.go	use expr.Function for custom functions instead of passing them in the env (#2133 )	2023-03-28 10:49:01 +02:00
overflows.go	scenario labels to map string interface (#2201 )	2023-07-24 15:19:28 +02:00
processor.go	Add conditional bucket (#1962 )	2023-01-06 09:26:16 +01:00
queue.go	improve tests in pkg/leakybuckets (#171 )	2020-08-05 11:39:54 +02:00
README.md	update leakybucket readme (#2298 )	2023-06-22 15:35:01 +02:00
reset_filter.go	use expr.Function for custom functions instead of passing them in the env (#2133 )	2023-03-28 10:49:01 +02:00
timemachine.go	conditional overflow doesn't overflow on capacity (#2412 )	2023-08-08 16:12:50 +01:00
trigger.go	properly update the time structure within event (#2122 )	2023-03-16 16:25:50 +01:00
uniq.go	use expr.Function for custom functions instead of passing them in the env (#2133 )	2023-03-28 10:49:01 +02:00

README.md

Leakybuckets

Bucket concepts

Leaky buckets are used for decision making. Enriched events that meet a bucket's conditions are poured into it. When a bucket is full, it raises a new event, and the bucket is then destroyed. There are many types of buckets, and new bucket designs are welcome.

A single bucket configuration usually leads to the creation of many buckets. They are differentiated by a field called stackkey: when two events arrive with the same stackkey value, they go into the same bucket.

The purpose of these buckets is to detect clients that exceed a certain rate of attempts to do something (ssh connection, http authentication failure, etc.). Thus, the most commonly used stackkey field is source_ip.
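The routing by stackkey described above can be sketched as follows. This is a minimal illustration, not the package's code: the `Event` and `Holder` types and the `Pour` method are hypothetical names, and each bucket is reduced to a simple event count.

```go
package main

import "fmt"

// Event is a hypothetical, minimal event: a flat set of string fields.
type Event map[string]string

// Holder stands in for one bucket configuration. It keeps one bucket
// (reduced here to an event count) per distinct stackkey value.
type Holder struct {
	Stackkey string         // name of the event field used to pick a bucket
	buckets  map[string]int // events poured so far, per stackkey value
}

// Pour routes the event to the bucket matching its stackkey value,
// creating that bucket on first sight, and returns the bucket's new count.
func (h *Holder) Pour(ev Event) int {
	if h.buckets == nil {
		h.buckets = make(map[string]int)
	}
	key := ev[h.Stackkey]
	h.buckets[key]++
	return h.buckets[key]
}

func main() {
	h := &Holder{Stackkey: "source_ip"}
	for _, ip := range []string{"192.0.2.1", "192.0.2.1", "198.51.100.7"} {
		n := h.Pour(Event{"source_ip": ip})
		fmt.Printf("%s -> %d event(s) in its bucket\n", ip, n)
	}
	fmt.Println("buckets created:", len(h.buckets))
}
```

Two events sharing a source_ip land in the same bucket, while a new source_ip gets a bucket of its own.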

Standard leaky buckets

Default buckets have two main configuration options:

capacity: number of events the bucket can hold (an int). When the bucket already holds capacity events and a new event is poured, the bucket raises a new event. We call this type of event an overflow.
leakspeed: duration needed for an event to leak. When an event leaks, it disappears from the bucket.
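To make the interplay of capacity and leakspeed concrete, here is a small sketch. It is an assumption-laden model rather than the package's implementation: the `Bucket` type, its `Pour` method and the `at` helper are hypothetical, time is passed in explicitly so the behaviour is deterministic, and Leakspeed is assumed to be strictly positive.

```go
package main

import (
	"fmt"
	"time"
)

// Bucket is a hypothetical, simplified leaky bucket.
type Bucket struct {
	Capacity  int           // number of events the bucket can hold
	Leakspeed time.Duration // time needed for one event to leak (assumed > 0)
	level     int           // events currently held
	lastLeak  time.Time     // instant up to which leaks have been applied
}

// Pour adds one event at time now and reports whether the bucket overflows.
func (b *Bucket) Pour(now time.Time) bool {
	if b.level > 0 {
		// One event leaks per full Leakspeed interval elapsed.
		leaked := int(now.Sub(b.lastLeak) / b.Leakspeed)
		if leaked >= b.level {
			b.level = 0
		} else {
			b.level -= leaked
			b.lastLeak = b.lastLeak.Add(time.Duration(leaked) * b.Leakspeed)
		}
	}
	if b.level == 0 {
		b.lastLeak = now // an empty bucket has nothing to leak yet
	}
	if b.level >= b.Capacity {
		return true // capacity already reached: this event causes an overflow
	}
	b.level++
	return false
}

var start = time.Date(2023, 1, 1, 0, 0, 0, 0, time.UTC)

// at returns the instant sec seconds after an arbitrary start time.
func at(sec int) time.Time { return start.Add(time.Duration(sec) * time.Second) }

func main() {
	b := &Bucket{Capacity: 5, Leakspeed: 10 * time.Second}
	for i := 0; i < 5; i++ {
		b.Pour(at(0)) // fill the bucket
	}
	fmt.Println(b.Pour(at(10))) // false: one event leaked after 10s, so there is room
	fmt.Println(b.Pour(at(10))) // true: the bucket is full again, so it overflows
}
```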

Trigger

A Trigger is a special type of bucket with a capacity of zero. Thus, when an event is poured into a trigger, it always raises an overflow.

Uniq

A Uniq is a bucket working like the standard leaky bucket except for one thing: a filter returns a property for each event and only one occurrence of this property is allowed in the bucket, thus the bucket is called uniq.
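The deduplication rule can be sketched as below. The `UniqBucket` type and its `Pour` method are hypothetical, leaking is left out for brevity, and the filter is a plain Go function standing in for the configured expression.

```go
package main

import "fmt"

// Event is a hypothetical, minimal event: a flat set of string fields.
type Event map[string]string

// UniqBucket is a hypothetical bucket that accepts each property value once.
type UniqBucket struct {
	Capacity   int
	UniqFilter func(Event) string // returns the property that must be unique
	seen       map[string]bool    // properties already present in the bucket
}

// Pour reports whether the event was accepted (its property was not yet
// in the bucket) and whether accepting it made the bucket overflow.
func (b *UniqBucket) Pour(ev Event) (accepted, overflow bool) {
	if b.seen == nil {
		b.seen = make(map[string]bool)
	}
	prop := b.UniqFilter(ev)
	if b.seen[prop] {
		return false, false // duplicate property: the event is dismissed
	}
	if len(b.seen) >= b.Capacity {
		return true, true // new property, but the bucket is already full
	}
	b.seen[prop] = true
	return true, false
}

func main() {
	b := &UniqBucket{
		Capacity:   2,
		UniqFilter: func(ev Event) string { return ev["user"] },
	}
	for _, user := range []string{"alice", "alice", "bob", "carol"} {
		accepted, overflow := b.Pour(Event{"user": user})
		fmt.Printf("%s: accepted=%t overflow=%t\n", user, accepted, overflow)
	}
}
```

In this sketch, repeated attempts with the same user never fill the bucket; only distinct users count toward the capacity.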

Counter

A Counter is a special type of bucket with an infinite capacity and an infinite leakspeed (it neither overflows nor leaks). Nevertheless, an event is raised after a fixed duration, set by the duration option.

Bayesian

A Bayesian is a special bucket that runs Bayesian inference instead of counting events. Each condition must have its likelihoods specified in the YAML file under prob_given_benign and prob_given_evil. The bucket keeps evaluating events until the posterior rises above the threshold (triggering the overflow) or the duration (specified by leakspeed) expires.
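As an illustration of the inference step, the sketch below applies Bayes' rule to one condition at a time. The `Condition` type and `Update` function are hypothetical names. Using the complementary likelihoods when a condition does not match is an assumption of this sketch, not something stated above, and the function does not guard against a zero denominator.

```go
package main

import "fmt"

// Condition is a hypothetical pair of likelihoods for one condition.
type Condition struct {
	ProbGivenEvil   float64 // P(condition is true | IP is malicious)
	ProbGivenBenign float64 // P(condition is true | IP is benign)
}

// Update returns the posterior probability that the IP is malicious,
// given the prior and whether the condition matched.
// Assumption: a non-matching condition contributes 1-p for both likelihoods.
func Update(prior float64, c Condition, matched bool) float64 {
	pEvil, pBenign := c.ProbGivenEvil, c.ProbGivenBenign
	if !matched {
		pEvil, pBenign = 1-pEvil, 1-pBenign
	}
	// Bayes' rule: P(evil | obs) = P(obs | evil) P(evil) / P(obs).
	return prior * pEvil / (prior*pEvil + (1-prior)*pBenign)
}

func main() {
	posterior := 0.5  // plays the role of bayesian_prior
	threshold := 0.95 // plays the role of bayesian_threshold
	c := Condition{ProbGivenEvil: 0.9, ProbGivenBenign: 0.1}
	for i := 1; i <= 3; i++ {
		posterior = Update(posterior, c, true)
		fmt.Printf("event %d: posterior=%.4f overflow=%t\n", i, posterior, posterior > threshold)
	}
}
```

Each matching event feeds the previous posterior back in as the new prior, so evidence accumulates until the threshold is crossed.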

Available configuration options for buckets

Fields for standard buckets

type: mandatory field. Must be one of "leaky", "trigger", "uniq", "counter" or "bayesian"
name: mandatory field, but the value is totally open. Nevertheless, this value will tag the events raised by the bucket.
filter: mandatory field. It's a filter that is run to decide whether an event matches the bucket or not. The filter has to return a boolean. As a filter implementation we use https://github.com/antonmedv/expr
capacity: [mandatory for now, shouldn't be mandatory in the final version] the size of the bucket. When an event is poured into a bucket that already holds capacity events, the bucket overflows.
leakspeed: a time duration (it must be parsable by https://golang.org/pkg/time/#ParseDuration). After each interval, an event is leaked from the bucket.
stackkey: mandatory field. This field is used to differentiate on which instance of the bucket the matching events will be poured. When an unknown stackkey is seen in an event, a new bucket is created.
on_overflow: optional field that tells what to do when the bucket returns the overflow event. As of today, the possibilities are "ban,1h", "Reprocess" or "Delete". Reprocess is used to send the raised event back to the event pool to be matched against buckets.
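Since leakspeed must be accepted by Go's time.ParseDuration, a quick way to check a value before putting it in a configuration is sketched below. The `parseInterval` helper is a hypothetical wrapper added only for this example; time.ParseDuration itself is the standard library function linked above.

```go
package main

import (
	"fmt"
	"time"
)

// parseInterval is a hypothetical helper: it parses a duration string
// and prefixes any error with the name of the configuration field.
func parseInterval(field, value string) (time.Duration, error) {
	d, err := time.ParseDuration(value)
	if err != nil {
		return 0, fmt.Errorf("%s: %w", field, err)
	}
	return d, nil
}

func main() {
	// Valid values carry a unit suffix such as "s", "m" or "h".
	for _, v := range []string{"10s", "5m", "1h30m", "10 seconds", "10"} {
		d, err := parseInterval("leakspeed", v)
		if err != nil {
			fmt.Println("rejected:", err)
			continue
		}
		fmt.Printf("%q parses to %v\n", v, d)
	}
}
```

Values such as "10 seconds" or a bare "10" are rejected because time.ParseDuration requires a recognised unit suffix.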

Fields for special buckets

Uniq

uniq_filter: an expression that must comply with the syntax defined in https://github.com/antonmedv/expr and must return a string. All strings returned by this filter in the same bucket have to be different. Thus, if a string is seen twice, the event is dismissed.

Trigger

Capacity and leakspeed are not relevant for this kind of bucket.

Counter

duration: the Counter will be destroyed after this interval has elapsed since its creation. The duration must be parsable by https://golang.org/pkg/time/#ParseDuration. This kind of bucket is usually used with an infinite leakspeed and an infinite capacity [capacity set to -1 for now].

Bayesian

bayesian_prior: The prior to start with
bayesian_threshold: The threshold for the posterior to trigger the overflow.
bayesian_conditions: List of Bayesian conditions with likelihoods

Bayesian Conditions are built from:

condition: The expr for this specific condition to be true
prob_given_evil: The likelihood that an IP satisfies the condition given that it is a malicious IP
prob_given_benign: The likelihood an IP satisfies the condition given the fact that it is a benign IP
guillotine: Bool to stop the condition from getting evaluated if it has evaluated to true once. This should be used if evaluating the condition is computationally expensive.

Examples

# ssh bruteforce
- type: leaky
  name: ssh_bruteforce
  filter: "Meta.log_type == 'ssh_failed-auth'"
  leakspeed: "10s"
  capacity: 5
  stackkey: "source_ip"
  on_overflow: ban,1h

# reporting of src_ip,dest_port seen
- type: counter
  name: counter
  filter: "Meta.service == 'tcp' && Event.new_connection == 'true'"
  distinct: "Meta.source_ip + ':' + Meta.dest_port"
  duration: 5m
  capacity: -1

- type: trigger
  name: "New connection"
  filter: "Meta.service == 'tcp' && Event.new_connection == 'true'"
  on_overflow: Reprocess

Note on leakybuckets implementation

[The implementation is not settled enough to describe in much detail here, but:]

The bucket code is triggered by runPour in pour.go, by calling the leaky.PourItemToHolders function. There is one struct called buckets which is for now a map[string]interface{} that holds all buckets. The key of this map is derived from the filter configured for the bucket and its stackkey. This looks complicated, but it allows us to use only one struct. This is done in buckets.go.

On top of that the implementation defines only the standard leaky bucket. A goroutine is launched for every bucket (bucket.go). This goroutine manages the life of the bucket.
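The one-goroutine-per-bucket pattern can be sketched as follows. This is an illustrative model only: `runBucket` and `pourAll` are hypothetical names, events are plain strings, and details of the real bucket lifecycle are not reproduced.

```go
package main

import (
	"fmt"
	"time"
)

// runBucket is a hypothetical bucket goroutine. It receives events on in,
// leaks one event per leakspeed tick, and sends the held events plus the
// overflowing one on out before terminating.
func runBucket(capacity int, leakspeed time.Duration, in <-chan string, out chan<- []string) {
	defer close(out)
	ticker := time.NewTicker(leakspeed) // panics if leakspeed <= 0
	defer ticker.Stop()
	var held []string
	for {
		select {
		case ev, ok := <-in:
			if !ok {
				return // no more events: the bucket dies without overflowing
			}
			if len(held) >= capacity {
				out <- append(held, ev) // overflow
				return                  // the bucket is destroyed afterwards
			}
			held = append(held, ev)
		case <-ticker.C:
			if len(held) > 0 {
				held = held[1:] // the oldest event leaks
			}
		}
	}
}

// pourAll starts one bucket goroutine, pours all events into it and
// returns the overflow, or nil if the bucket never overflowed.
func pourAll(capacity int, leakspeed time.Duration, events []string) []string {
	in := make(chan string, len(events)) // buffered so sends never block
	out := make(chan []string)
	go runBucket(capacity, leakspeed, in, out)
	for _, ev := range events {
		in <- ev
	}
	close(in)
	return <-out
}

func main() {
	// The leakspeed is long enough that nothing leaks during this run.
	fmt.Println(pourAll(2, time.Hour, []string{"a", "b", "c"}))
	fmt.Println(pourAll(2, time.Hour, []string{"a", "b"}))
}
```

Keeping all bucket state inside one goroutine means pours and leaks are serialized by the select loop, so the state needs no lock.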

For special buckets, hooks are defined at initialization time in manager.go. Hooks are called when relevant by the bucket goroutine when events are poured and/or when a bucket overflows.
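One possible shape for such hooks is sketched below. The `Hooks` and `Bucket` types and their fields are hypothetical and do not mirror the package's actual interfaces. The sketch only shows hooks being attached at initialization and called on pour and on overflow.

```go
package main

import "fmt"

// Hooks is a hypothetical set of callbacks attached to a bucket.
type Hooks struct {
	OnPour     func(ev string) bool  // return false to dismiss the event
	OnOverflow func(events []string) // called once when the bucket overflows
}

// Bucket is a hypothetical standard bucket extended through hooks.
type Bucket struct {
	Capacity int
	Hooks    []Hooks
	held     []string
}

// Pour runs the pour hooks, stores the event, and runs the overflow
// hooks when the bucket is already full. It reports whether it overflowed.
func (b *Bucket) Pour(ev string) bool {
	for _, h := range b.Hooks {
		if h.OnPour != nil && !h.OnPour(ev) {
			return false // a hook dismissed the event
		}
	}
	if len(b.held) < b.Capacity {
		b.held = append(b.held, ev)
		return false
	}
	overflow := append(b.held, ev)
	for _, h := range b.Hooks {
		if h.OnOverflow != nil {
			h.OnOverflow(overflow)
		}
	}
	return true
}

func main() {
	b := &Bucket{Capacity: 1, Hooks: []Hooks{{
		OnPour:     func(ev string) bool { return ev != "" }, // drop empty events
		OnOverflow: func(events []string) { fmt.Println("overflow with", events) },
	}}}
	for _, ev := range []string{"", "a", "b"} {
		fmt.Printf("pour %q -> overflow=%t\n", ev, b.Pour(ev))
	}
}
```

With this arrangement the standard bucket logic stays in one place, and a special bucket type only needs to contribute the hooks that change its behaviour.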