Doc : fix whitelists documentation + document `data` for parsers/scenarios + document expr helpers + link taxonomy (#126)

Thibault "bui" Koechlin 5 years ago
parent
commit
a0c1ca49d0

+ 24 - 0
docs/references/parsers.md

@@ -282,6 +282,30 @@ statics:
     expression: evt.Meta.target_field + ' this_is' + ' a dynamic expression'
 ```
 
+### data
+
+```
+data:
+  - source_url: https://URL/TO/FILE
+    dest_file: LOCAL_FILENAME
+    [type: regexp]
+```
+
+`data` allows the user to specify an external source of data.
+This section is only relevant when `cscli` is used to install the parser from the hub, as it will download the `source_url` and store it in `dest_file`. When the parser is not installed from the hub, {{crowdsec.name}} won't download the URL, but the file must exist for the parser to be loaded correctly.
+
+If `type` is set to `regexp`, the content of the file must be one valid (re2) regular expression per line.
+Those regexps will be compiled and kept in cache.
+
+
+```yaml
+name: crowdsecurity/cdn-whitelist
+...
+data:
+  - source_url: https://www.cloudflare.com/ips-v4
+    dest_file: cloudflare_ips.txt
+```
+
 
 ## Parser concepts
 

+ 25 - 0
docs/references/scenarios.md

@@ -347,3 +347,28 @@ overflow_filter: any(queue.Queue, { .Enriched.IsInEU  == "true" })
 If this expression is present and returns false, the overflow will be discarded.
 
 
+### data
+
+```
+data:
+  - source_url: https://URL/TO/FILE
+    dest_file: LOCAL_FILENAME
+    [type: regexp]
+```
+
+`data` allows the user to specify an external source of data.
+This section is only relevant when `cscli` is used to install the scenario from the hub, as it will download the `source_url` and store it in `dest_file`. When the scenario is not installed from the hub, {{crowdsec.name}} won't download the URL, but the file must exist for the scenario to be loaded correctly.
+
+If `type` is set to `regexp`, the content of the file must be one valid (re2) regular expression per line.
+Those regexps will be compiled and kept in cache.
+
+
+```yaml
+name: crowdsecurity/cdn-whitelist
+...
+data:
+  - source_url: https://www.cloudflare.com/ips-v4
+    dest_file: cloudflare_ips.txt
+```
+
+
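
A minimal sketch of what "one valid (re2) regular expression per line, compiled and kept in cache" could look like, in Go. This is not the project's actual code: the `cache` map, `loadRegexps`, `regexpInFile`, the sample patterns, and the choice to skip empty lines are all assumptions made for illustration. Go's standard `regexp` package uses RE2 syntax.

```go
package main

import (
	"bufio"
	"fmt"
	"regexp"
	"strings"
)

// cache maps a data file name to its compiled patterns (hypothetical structure).
var cache = map[string][]*regexp.Regexp{}

// loadRegexps compiles one pattern per non-empty line of content and
// stores the result under name. Skipping empty lines is an assumption.
func loadRegexps(name, content string) error {
	var compiled []*regexp.Regexp
	sc := bufio.NewScanner(strings.NewReader(content))
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		if line == "" {
			continue
		}
		re, err := regexp.Compile(line)
		if err != nil {
			return fmt.Errorf("%s: invalid regexp %q: %w", name, line, err)
		}
		compiled = append(compiled, re)
	}
	if err := sc.Err(); err != nil {
		return err
	}
	cache[name] = compiled
	return nil
}

// regexpInFile reports whether s is matched by any pattern cached under name.
func regexpInFile(s, name string) bool {
	for _, re := range cache[name] {
		if re.MatchString(s) {
			return true
		}
	}
	return false
}

func main() {
	// In-memory stand-in for the downloaded dest_file (sample patterns are made up).
	content := "\\.googlebot\\.com\\.$\n\\.search\\.msn\\.com\\.$\n"
	if err := loadRegexps("my_legit_seo_whitelists.txt", content); err != nil {
		panic(err)
	}
	fmt.Println(regexpInFile("crawl-66-249-66-1.googlebot.com.", "my_legit_seo_whitelists.txt"))
	fmt.Println(regexpInFile("evil.example.org.", "my_legit_seo_whitelists.txt"))
}
```

Compiling once at load time means an invalid pattern is reported when the file is loaded rather than on the first event that reaches it.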

+ 52 - 0
docs/write_configurations/expressions.md

@@ -0,0 +1,52 @@
+# Expressions
+
+> {{expr.htmlname}} : Expression evaluation engine for Go: fast, non-Turing complete, dynamic typing, static typing
+
+
+Several places of {{crowdsec.name}}'s configuration use {{expr.htmlname}} :
+
+ - {{filter.Htmlname}} that are used to determine event eligibility in {{parsers.htmlname}} and {{scenarios.htmlname}} or `profiles`
+ - {{statics.Htmlname}} use expr in the `expression` directive, to compute complex values
+ - {{whitelists.Htmlname}} rely on the `expression` directive to allow more complex whitelist filters
+
+To learn more about {{expr.htmlname}}, [check the github page of the project](https://github.com/antonmedv/expr/blob/master/docs/Language-Definition.md).
+
+To make its use in {{crowdsec.name}} more efficient, we added a few helpers that are documented below.
+
+## Atof(string) float64
+
+Parses a string representation of a float number to an actual float number (binding on `strconv.ParseFloat`)
+
+> Atof(evt.Parsed.tcp_port)
+
+
+## JsonExtract(JsonBlob, FieldName) string
+
+Extracts the `FieldName` from the `JsonBlob` and returns it as a string (binding on [jsonparser](https://github.com/buger/jsonparser/)).
+
+> JsonExtract(evt.Parsed.some_json_blob, "foo.bar[0].one_item")
+
+## File(FileName) []string
+
+Returns the content of `FileName` as an array of strings, while providing a caching mechanism.
+
+> evt.Parsed.some_field in File('some_patterns.txt')
+> any(File('rdns_seo_bots.txt'), { evt.Enriched.reverse_dns endsWith #})
+
+## RegexpInFile(StringToMatch, FileName) bool
+
+Returns `true` if the `StringToMatch` is matched by one of the expressions contained in `FileName` (uses RE2 regexp engine).
+
+> RegexpInFile( evt.Enriched.reverse_dns, 'my_legit_seo_whitelists.txt')
+
+## Upper(string) string
+
+Returns the uppercase version of the string
+
+> Upper("yop")
+
+## IpInRange(IPStr, RangeStr) bool
+
+Returns true if the IP `IPStr` is contained in the IP range `RangeStr` (uses `net.ParseCIDR`)
+
+> IpInRange("1.2.3.4", "1.2.3.0/24")
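
As a rough illustration of the helpers documented in `expressions.md`, here is a minimal Go sketch of `Atof`, `Upper` and `IpInRange`. The doc states that `Atof` binds on `strconv.ParseFloat` and that `IpInRange` uses `net.ParseCIDR`. Everything else is an assumption of this sketch: the lowercase function names, returning `0` or `false` on invalid input, and the use of `strings.ToUpper` for `Upper`.

```go
package main

import (
	"fmt"
	"net"
	"strconv"
	"strings"
)

// atof parses a string into a float64 with strconv.ParseFloat.
// Returning 0 on a parse error is an assumption of this sketch.
func atof(s string) float64 {
	f, err := strconv.ParseFloat(s, 64)
	if err != nil {
		return 0
	}
	return f
}

// upper returns the uppercase version of s (strings.ToUpper is assumed).
func upper(s string) string {
	return strings.ToUpper(s)
}

// ipInRange reports whether ipStr belongs to the CIDR range rangeStr.
// Invalid input yields false (assumption).
func ipInRange(ipStr, rangeStr string) bool {
	ip := net.ParseIP(ipStr)
	_, ipNet, err := net.ParseCIDR(rangeStr)
	if ip == nil || err != nil {
		return false
	}
	return ipNet.Contains(ip)
}

func main() {
	fmt.Println(atof("8080"))
	fmt.Println(upper("yop"))
	fmt.Println(ipInRange("1.2.3.4", "1.2.3.0/24"))
	fmt.Println(ipInRange("1.2.4.4", "1.2.3.0/24"))
}
```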

+ 94 - 32
docs/write_configurations/whitelist.md

@@ -1,15 +1,28 @@
-## Where are whitelists
+# What are whitelists
 
-Whitelists are, as for most configuration, YAML files, and allow you to "discard" signals based on :
+Whitelists are special parsers that allow you to "discard" events, and can exist at two different steps :
 
- - ip adress or the fact that it belongs to a specific range
- - a {{expr.name}} expression
+ - *Parser whitelists* : Allow you to discard an event at parse time, so that it never hits the buckets.
+ - *PostOverflow whitelists* : Those are whitelists that are checked *after* the overflow happens. They are usually best suited to whitelisting processes that can be expensive (such as performing reverse DNS on an IP, or performing a `whois` of an IP).
 
-Here is an example :
+!!! info
+    While the whitelists are the same for parser or postoverflows, beware that field names might change.
+    Source ip is usually in `evt.Meta.source_ip` when it's a log, but `evt.Overflow.Source_ip` when it's an overflow
+
+
+The whitelist can be based on several criteria :
+
+ - specific ip address : if the event/overflow IP is the same, event is whitelisted
+ - ip ranges : if the event/overflow IP belongs to this range, event is whitelisted
+ - a list of {{expr.htmlname}} expressions : if any expression returns true, event is whitelisted
+
+Here is an example showcasing configuration :
 
 ```yaml
 name: crowdsecurity/my-whitelists
 description: "Whitelist events from my ipv4 addresses"
+#it's a normal parser, so we can restrict its scope with filter
+filter: "1 == 1"
 whitelist:
   reason: "my ipv4 ranges"
   ip: 
@@ -19,67 +32,75 @@ whitelist:
     - "10.0.0.0/8"
     - "172.16.0.0/12"
   expression:
-    - "'mycorp.com' in evt.Meta.source_ip_rdns"
+  #beware, this one will work *only* if you enabled the reverse dns (crowdsecurity/rdns) enrichment postoverflow parser
+    - evt.Enriched.reverse_dns endsWith ".mycoolorg.com."
+  #this one will work *only* if you enabled the geoip (crowdsecurity/geoip-enrich) enrichment parser
+    - evt.Enriched.IsoCode == 'FR'
 ```
 
-## Hands on
 
-Let's assume we have a setup with a `crowdsecurity/base-http-scenarios` scenario enabled and no whitelists.
+# Whitelists in parsing
+
+When a whitelist is present in parsing `/etc/crowdsec/config/parsers/...`, it is checked before the event is poured into any bucket, and matching events are discarded. These whitelists intentionally generate no logs and are useful to discard noisy false positive sources.
+
+## Whitelist by ip
+
+Let's assume we have a setup with a `crowdsecurity/nginx` collection enabled and no whitelists.
 
 Thus, if I "attack" myself :
 
 ```bash
-nikto -host 127.0.0.1
+nikto -host myfqdn.com
 ```
 
 my own IP will be flagged as being an attacker :
 
 ```bash
 $ tail -f /var/log/crowdsec.log 
-time="07-05-2020 09:23:03" level=warning msg="127.0.0.1 triggered a 4h0m0s ip ban remediation for [crowdsecurity/http-scan-uniques_404]" bucket_id=old-surf event_time="2020-05-07 09:23:03.322277347 +0200 CEST m=+57172.732939890" scenario=crowdsecurity/http-scan-uniques_404 source_ip=127.0.0.1
-time="07-05-2020 09:23:03" level=warning msg="127.0.0.1 triggered a 4h0m0s ip ban remediation for [crowdsecurity/http-crawl-non_statics]" bucket_id=lingering-sun event_time="2020-05-07 09:23:03.345341864 +0200 CEST m=+57172.756004380" scenario=crowdsecurity/http-crawl-non_statics source_ip=127.0.0.1
+time="07-07-2020 16:13:16" level=warning msg="80.x.x.x triggered a 4h0m0s ip ban remediation for [crowdsecurity/http-bad-user-agent]" bucket_id=cool-smoke event_time="2020-07-07 16:13:16.579581642 +0200 CEST m=+358819.413561109" scenario=crowdsecurity/http-bad-user-agent source_ip=80.x.x.x
+time="07-07-2020 16:13:16" level=warning msg="80.x.x.x triggered a 4h0m0s ip ban remediation for [crowdsecurity/http-probing]" bucket_id=green-silence event_time="2020-07-07 16:13:16.737579458 +0200 CEST m=+358819.571558901" scenario=crowdsecurity/http-probing source_ip=80.x.x.x
+time="07-07-2020 16:13:17" level=warning msg="80.x.x.x triggered a 4h0m0s ip ban remediation for [crowdsecurity/http-crawl-non_statics]" bucket_id=purple-snowflake event_time="2020-07-07 16:13:17.353641625 +0200 CEST m=+358820.187621068" scenario=crowdsecurity/http-crawl-non_statics source_ip=80.x.x.x
+time="07-07-2020 16:13:18" level=warning msg="80.x.x.x triggered a 4h0m0s ip ban remediation for [crowdsecurity/http-sensitive-files]" bucket_id=small-hill event_time="2020-07-07 16:13:18.005919055 +0200 CEST m=+358820.839898498" scenario=crowdsecurity/http-sensitive-files source_ip=80.x.x.x
 ^C
 $ {{cli.bin}} ban list
-1 local decisions:
-+--------+-----------+-------------------------------------+------+--------+---------+----+--------+------------+
-| SOURCE |    IP     |               REASON                | BANS | ACTION | COUNTRY | AS | EVENTS | EXPIRATION |
-+--------+-----------+-------------------------------------+------+--------+---------+----+--------+------------+
-| local  | 127.0.0.1 | crowdsecurity/http-scan-uniques_404 |    2 | ban    |         | 0  |     47 | 3h55m57s   |
-+--------+-----------+-------------------------------------+------+--------+---------+----+--------+------------+
+4 local decisions:
++--------+---------------+-----------------------------------+------+--------+---------+---------------------------+--------+------------+
+| SOURCE |      IP       |              REASON               | BANS | ACTION | COUNTRY |            AS             | EVENTS | EXPIRATION |
++--------+---------------+-----------------------------------+------+--------+---------+---------------------------+--------+------------+
+| local  | 80.x.x.x      | crowdsecurity/http-bad-user-agent |    4 | ban    | FR      | 21502 SFR SA              |     60 | 3h59m3s    |
+...
 
 ```
 
-## Create the whitelist by IP
 
-Let's create a `/etc/crowdsec/crowdsec/parsers/s02-enrich/whitelists.yaml` file with the following content :
+### Create the whitelist by IP
+
+Let's create a `/etc/crowdsec/crowdsec/parsers/s02-enrich/mywhitelists.yaml` file with the following content :
 
 ```yaml
 name: crowdsecurity/whitelists
-description: "Whitelist events from private ipv4 addresses"
+description: "Whitelist events from my ip addresses"
 whitelist:
-  reason: "private ipv4 ranges"
-  ip: 
-    - "127.0.0.1"
-
+  reason: "my ip ranges"
+  ip:
+    - "80.x.x.x"
 ```
 
-and restart {{crowdsec.name}} : `sudo systemctl restart {{crowdsec.name}}`
+and restart {{crowdsec.name}} : `sudo systemctl restart crowdsec`
 
-## Test the whitelist
+### Test the whitelist
 
 Thus, if we restart our attack :
 
 ```bash
-nikto -host 127.0.0.1
+nikto -host myfqdn.com
 ```
 
-And we don't get bans, instead :
+And we don't get bans :
 
 ```bash
 $ tail -f /var/log/crowdsec.log  
 ...
-time="07-05-2020 09:30:13" level=info msg="Event from [127.0.0.1] is whitelisted by Ips !" filter= name=lively-firefly stage=s02-enrich
-...
 ^C
 $ {{cli.bin}} ban list
 No local decisions.
@@ -87,11 +108,12 @@ And 21 records from API, 15 distinct AS, 12 distinct countries
 
 ```
 
+Here, we don't get *any* logs, as the event has been discarded at parsing time.
 
 
 ## Create whitelist by expression
 
-Now, let's make something more tricky : let's whitelist a **specific** user-agent (of course, it's just an example, don't do this at home !).
+Now, let's make something more tricky : let's whitelist a **specific** user-agent (of course, it's just an example, don't do this at home !). The [hub's taxonomy](https://hub.crowdsec.net/fields) will help us find which data is present in which field.
 
 Let's change our whitelist to :
 
@@ -109,7 +131,7 @@ again, let's restart {{crowdsec.name}} !
 For the record, I edited nikto's configuration to use 'MySecretUserAgent' as user-agent, and thus :
 
 ```bash
-nikto -host 127.0.0.1
+nikto -host myfqdn.com
 ```
 
 ```bash
@@ -120,3 +142,43 @@ time="07-05-2020 09:39:09" level=info msg="Event is whitelisted by Expr !" filte
 ```
 
 
+# Whitelist in PostOverflows 
+
+Whitelists in PostOverflows are applied *after* the bucket overflow happens.
+It has the advantage of being triggered only once we are about to take a decision about an IP or range, and thus happens a lot less often.
+
+A good example is the [crowdsecurity/whitelist-good-actors](https://hub.crowdsec.net/author/crowdsecurity/collections/whitelist-good-actors) collection.
+
+But let's craft ours based on our previous example !
+First of all, install the [crowdsecurity/rdns postoverflow](https://hub.crowdsec.net/author/crowdsecurity/configurations/rdns) : it will be in charge of enriching overflows with reverse dns information of the offending IP.
+
+Let's put the following file in `/etc/crowdsec/config/postoverflows/s01-whitelists/mywhitelists.yaml` :
+
+```yaml
+name: me/my_cool_whitelist
+description: lets whitelist our own reverse dns
+whitelist:
+  reason: dont ban my ISP
+  expression:
+  #this is the reverse of my ip, you can get it by performing a "host" command on your public IP for example
+    - evt.Enriched.reverse_dns endsWith '.asnieres.rev.numericable.fr.'
+```
+
+After reloading {{crowdsec.name}}, and launching (again!) nikto :
+
+```bash
+nikto -host myfqdn.com
+```
+
+
+```bash
+$ tail -f /var/log/crowdsec.log
+time="07-07-2020 17:11:09" level=info msg="Ban for 80.x.x.x whitelisted, reason [dont ban my ISP]" id=cold-sunset name=me/my_cool_whitelist stage=s01
+time="07-07-2020 17:11:09" level=info msg="node warning : no remediation" bucket_id=blue-cloud event_time="2020-07-07 17:11:09.175068053 +0200 CEST m=+2308.040825320" scenario=crowdsecurity/http-probing source_ip=80.x.x.x
+time="07-07-2020 17:11:09" level=info msg="Processing Overflow with no decisions 80.x.x.x performed 'crowdsecurity/http-probing' (11 events over 313.983994ms) at 2020-07-07 17:11:09.175068053 +0200 CEST m=+2308.040825320" bucket_id=blue-cloud event_time="2020-07-07 17:11:09.175068053 +0200 CEST m=+2308.040825320" scenario=crowdsecurity/http-probing source_ip=80.x.x.x
+...
+
+```
+
+This time, we can see that logs are being produced when the event is discarded.
+

+ 8 - 1
mkdocs.yml

@@ -17,6 +17,7 @@ nav:
   - Cheat Sheets:
     - Ban Management: cheat_sheets/ban-mgmt.md
     - Configuration Management: cheat_sheets/config-mgmt.md
+    - Hub's taxonomy: https://hub.crowdsec.net/fields
   - Observability:
     - Overview: observability/overview.md
     - Logs: observability/logs.md
@@ -31,7 +32,8 @@ nav:
     - Acquisition: write_configurations/acquisition.md
     - Parsers: write_configurations/parsers.md
     - Scenarios: write_configurations/scenarios.md
-    - Whitelist: write_configurations/whitelist.md
+    - Whitelists: write_configurations/whitelist.md
+    - Expressions: write_configurations/expressions.md
   - Blockers:
     - Overview : blockers/index.md
     - Nginx:
@@ -204,6 +206,11 @@ extra:
         Name: Overflow
         htmlname: "[overflow](/getting_started/glossary/#overflow-or-signaloccurence)"
         Htmlname: "[Overflow](/getting_started/glossary/#overflow-or-signaloccurence)"
+    whitelists:
+        name: whitelists
+        Name: Whitelists
+        htmlname: "[whitelists](/write_configurations/whitelist/)"
+        Htmlname: "[Whitelists](/write_configurations/whitelist/)"
     signal:
         name: signal
         Name: Signal

+ 1 - 1
pkg/outputs/ouputs.go

@@ -176,7 +176,7 @@ func (o *Output) ProcessOutput(sig types.SignalOccurence, profiles []types.Profi
 			return err
 		}
 		if warn != nil {
-			logger.Infof("node warning : %s", warn)
+			logger.Debugf("node warning : %s", warn)
 		}
 		if ordr != nil {
 			bans, err := types.OrderToApplications(ordr)

+ 1 - 1
pkg/parser/enrich_dns.go

@@ -18,7 +18,7 @@ func reverse_dns(field string, p *types.Event, ctx interface{}) (map[string]stri
 	}
 	rets, err := net.LookupAddr(field)
 	if err != nil {
-		log.Infof("failed to resolve '%s'", field)
+		log.Debugf("failed to resolve '%s'", field)
 		return nil, nil
 	}
 	//When using the host C library resolver, at most one result will be returned. To bypass the host resolver, use a custom Resolver.