Explorar o código

feat(config): custom weight thresholds via CEL (#688)

* feat(config): add Thresholds to the top level config file

Signed-off-by: Xe Iaso <me@xeiaso.net>

* chore(config): make String() on ExpressionOrList join the component expressions

Signed-off-by: Xe Iaso <me@xeiaso.net>

* test(config): ensure unparseable json fails

Signed-off-by: Xe Iaso <me@xeiaso.net>

* fix(config): if no thresholds are set, use the default thresholds

Signed-off-by: Xe Iaso <me@xeiaso.net>

* feat(policy): half implement thresholds

Signed-off-by: Xe Iaso <me@xeiaso.net>

* chore(policy): continue wiring things up

Signed-off-by: Xe Iaso <me@xeiaso.net>

* feat(lib): wire up thresholds

Signed-off-by: Xe Iaso <me@xeiaso.net>

* test(lib): handle behavior from legacy configurations

Signed-off-by: Xe Iaso <me@xeiaso.net>

* docs: document thresholds

Signed-off-by: Xe Iaso <me@xeiaso.net>

* docs: update CHANGELOG, refer to threshold configuration

Signed-off-by: Xe Iaso <me@xeiaso.net>

* fix(lib): fix build

Signed-off-by: Xe Iaso <me@xeiaso.net>

* chore(lib): fix U1000

Signed-off-by: Xe Iaso <me@xeiaso.net>

---------

Signed-off-by: Xe Iaso <me@xeiaso.net>
Signed-off-by: Jason Cameron <git@jasoncameron.dev>
Co-authored-by: Jason Cameron <git@jasoncameron.dev>
Xe Iaso hai 5 días
pai
achega
226cf36bf7

+ 54 - 0
data/botPolicies.yaml

@@ -91,3 +91,57 @@ dnsbl: false
 status_codes:
   CHALLENGE: 200
   DENY: 200
+
+# The weight thresholds for when to trigger individual challenges. Any
+# CHALLENGE will take precedence over this.
+#
+# A threshold has four configuration options:
+#
+#   - name: the name that is reported down the stack and used for metrics
+#   - expression: A CEL expression with the request weight in the variable
+#     weight
+#   - action: the Anubis action to apply, similar to in a bot policy
+#   - challenge: which challenge to send to the user, similar to in a bot policy
+#
+# See https://anubis.techaro.lol/docs/admin/configuration/thresholds for more
+# information.
+thresholds:
+  # By default Anubis ships with the following thresholds:
+  - name: minimal-suspicion # This client is likely fine, its soul is lighter than a feather
+    expression: weight < 0 # a feather weighs zero units
+    action: ALLOW # Allow the traffic through
+  # For clients that had some weight reduced through custom rules, give them a
+  # lightweight challenge.
+  - name: mild-suspicion
+    expression:
+      all:
+        - weight >= 0
+        - weight < 10
+    action: CHALLENGE
+    challenge:
+      # https://anubis.techaro.lol/docs/admin/configuration/challenges/metarefresh
+      algorithm: metarefresh
+      difficulty: 1
+      report_as: 1
+  # For clients that are browser-like but have either gained points from custom rules or
+  # report as a standard browser.
+  - name: moderate-suspicion
+    expression:
+      all:
+        - weight >= 10
+        - weight < 20
+    action: CHALLENGE
+    challenge:
+      # https://anubis.techaro.lol/docs/admin/configuration/challenges/proof-of-work
+      algorithm: fast
+      difficulty: 2 # two leading zeros, very fast for most clients
+      report_as: 2
+  # For clients that are browser like and have gained many points from custom rules
+  - name: extreme-suspicion
+    expression: weight >= 20
+    action: CHALLENGE
+    challenge:
+      # https://anubis.techaro.lol/docs/admin/configuration/challenges/proof-of-work
+      algorithm: fast
+      difficulty: 4
+      report_as: 4

+ 1 - 0
docs/docs/CHANGELOG.md

@@ -27,6 +27,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 - Add `robots2policy` CLI utility to convert robots.txt files to Anubis challenge policies using CEL expressions ([#409](https://github.com/TecharoHQ/anubis/issues/409))
 - Implement GeoIP and ASN based checks via [Thoth](https://anubis.techaro.lol/docs/admin/thoth) ([#206](https://github.com/TecharoHQ/anubis/issues/206))
 - Replace internal SHA256 hashing with xxhash for 4-6x performance improvement in policy evaluation and cache operations
+- Add [custom weight thresholds](./admin/configuration/thresholds.mdx) via CEL ([#688](https://github.com/TecharoHQ/anubis/pull/688))
 
 ## v1.19.1: Jenomis cen Lexentale - Echo 1
 

+ 140 - 0
docs/docs/admin/configuration/thresholds.mdx

@@ -0,0 +1,140 @@
+# Weight Threshold Configuration
+
+Anubis offers the ability to assign "weight" to requests. This is a custom level of suspicion that rules can add to or remove from. For example, here's how you assign 10 weight points to anything that might be a browser:
+
+```yaml
+# botPolicies.yaml
+
+bots:
+  - name: generic-browser
+    user_agent_regex: >-
+      Mozilla|Opera
+    action: WEIGH
+    weight:
+      adjust: 10
+```
+
+Thresholds let you take this per-request weight value and take actions in response to it. Thresholds are defined alongside your bot configuration in `botPolicies.yaml`.
+
+:::note
+
+Thresholds DO NOT apply when a request matches a bot rule with the CHALLENGE action. Thresholds only apply when requests don't match any terminal bot rules.
+
+:::
+
+```yaml
+# botPolicies.yaml
+
+bots: ...
+
+thresholds:
+  - name: minimal-suspicion
+    expression: weight < 0
+    action: ALLOW
+
+  - name: mild-suspicion
+    expression:
+      all:
+        - weight >= 0
+        - weight < 10
+    action: CHALLENGE
+    challenge:
+      algorithm: metarefresh
+      difficulty: 1
+      report_as: 1
+
+  - name: moderate-suspicion
+    expression:
+      all:
+        - weight >= 10
+        - weight < 20
+    action: CHALLENGE
+    challenge:
+      algorithm: fast
+      difficulty: 2
+      report_as: 2
+
+  - name: extreme-suspicion
+    expression: weight >= 20
+    action: CHALLENGE
+    challenge:
+      algorithm: fast
+      difficulty: 4
+      report_as: 4
+```
+
+This defines a suite of 4 thresholds:
+
+1. If the request weight is less than zero, allow it through.
+2. If the request weight is greater than or equal to zero, but less than ten: give it [a very lightweight challenge](./challenges/metarefresh.mdx).
+3. If the request weight is greater than or equal to ten, but less than twenty: give it [a slightly heavier challenge](./challenges/proof-of-work.mdx).
+4. Otherwise, give it [the heaviest challenge](./challenges/proof-of-work.mdx).
+
+Thresholds can be configured with the following options:
+
+<table>
+  <thead>
+  <tr>
+    <th>Name</th>
+    <th>Description</th>
+    <th>Example</th>
+    </tr>
+  </thead>
+  <tbody>
+    <tr>
+      <td>`name`</td>
+      <td>The human-readable name for this threshold.</td>
+      <td>
+
+```yaml
+name: extreme-suspicion
+```
+
+      </td>
+    </tr>
+    <tr>
+    <td>`expression`</td>
+    <td>A [CEL](https://cel.dev/) expression taking the request weight and returning true or false</td>
+    <td>
+
+To check if the request weight is less than zero:
+
+```yaml
+expression: weight < 0
+```
+
+To check if it's between 0 and 10 (inclusive):
+
+```yaml
+expression:
+  all:
+    - weight >= 0
+    - weight < 10
+```
+
+    </td>
+    </tr>
+    <tr>
+    <td>`action`</td>
+    <td>The Anubis action to apply: `ALLOW`, `CHALLENGE`, or `DENY`</td>
+    <td>
+
+```yaml
+action: ALLOW
+```
+
+If you set the CHALLENGE action, you must set challenge details:
+
+```yaml
+action: CHALLENGE
+challenge:
+  algorithm: metarefresh
+  difficulty: 1
+  report_as: 1
+```
+
+    </td>
+    </tr>
+
+  </tbody>
+</table>

+ 2 - 8
docs/docs/admin/policies.mdx

@@ -261,17 +261,11 @@ Anubis rules can also add or remove "weight" from requests, allowing administrat
     adjust: -5
 ```
 
-This would remove five weight points from the request, making Anubis present the [Meta Refresh challenge](./configuration/challenges/metarefresh.mdx).
+This would remove five weight points from the request, which would make Anubis present the [Meta Refresh challenge](./configuration/challenges/metarefresh.mdx) in the default configuration.
 
 ### Weight Thresholds
 
-Weight thresholds and challenge associations will be configurable with CEL expressions in the configuration file in an upcoming patch, for now here's how Anubis configures the weight thresholds:
-
-|                                      Weight Expression | Action                                                                                                                                 |
-| -----------------------------------------------------: | :------------------------------------------------------------------------------------------------------------------------------------- |
-|                   `weight < 0` (weight is less than 0) | Allow the request through.                                                                                                             |
-|                 `weight < 10` (weight is less than 10) | Challenge the client with the [Meta Refresh challenge](./configuration/challenges/metarefresh.mdx) at the default difficulty level.    |
-| `weight >= 10` (weight is greater than or equal to 10) | Challenge the client with the [Proof of Work challenge](./configuration/challenges/proof-of-work.mdx) at the default difficulty level. |
+For more information on configuring weight thresholds, see [Weight Threshold Configuration](./configuration/thresholds.mdx)
 
 ### Advice
 

+ 20 - 34
lib/anubis.go

@@ -15,6 +15,7 @@ import (
 	"time"
 
 	"github.com/golang-jwt/jwt/v5"
+	"github.com/google/cel-go/common/types"
 
 	"github.com/prometheus/client_golang/prometheus"
 	"github.com/prometheus/client_golang/prometheus/promauto"
@@ -411,12 +412,6 @@ func cr(name string, rule config.Rule, weight int) policy.CheckResult {
 	}
 }
 
-var (
-	weightOkayStatic    = policy.NewStaticHashChecker("weight/okay")
-	weightMildSusStatic = policy.NewStaticHashChecker("weight/mild-suspicion")
-	weightVerySusStatic = policy.NewStaticHashChecker("weight/extreme-suspicion")
-)
-
 // Check evaluates the list of rules, and returns the result
 func (s *Server) check(r *http.Request) (policy.CheckResult, *policy.Bot, error) {
 	host := r.Header.Get("X-Real-Ip")
@@ -448,34 +443,25 @@ func (s *Server) check(r *http.Request) (policy.CheckResult, *policy.Bot, error)
 		}
 	}
 
-	switch {
-	case weight <= 0:
-		return cr("weight/okay", config.RuleAllow, weight), &policy.Bot{
-			Challenge: &config.ChallengeRules{
-				Difficulty: s.policy.DefaultDifficulty,
-				ReportAs:   s.policy.DefaultDifficulty,
-				Algorithm:  config.DefaultAlgorithm,
-			},
-			Rules: weightOkayStatic,
-		}, nil
-	case weight > 0 && weight < 10:
-		return cr("weight/mild-suspicion", config.RuleChallenge, weight), &policy.Bot{
-			Challenge: &config.ChallengeRules{
-				Difficulty: s.policy.DefaultDifficulty,
-				ReportAs:   s.policy.DefaultDifficulty,
-				Algorithm:  "metarefresh",
-			},
-			Rules: weightMildSusStatic,
-		}, nil
-	case weight >= 10:
-		return cr("weight/extreme-suspicion", config.RuleChallenge, weight), &policy.Bot{
-			Challenge: &config.ChallengeRules{
-				Difficulty: s.policy.DefaultDifficulty,
-				ReportAs:   s.policy.DefaultDifficulty,
-				Algorithm:  "fast",
-			},
-			Rules: weightVerySusStatic,
-		}, nil
+	for _, t := range s.policy.Thresholds {
+		result, _, err := t.Program.ContextEval(r.Context(), &policy.ThresholdRequest{Weight: weight})
+		if err != nil {
+			slog.Error("error when evaluating threshold expression", "expression", t.Expression.String(), "err", err)
+			continue
+		}
+
+		var matches bool
+
+		if val, ok := result.(types.Bool); ok {
+			matches = bool(val)
+		}
+
+		if matches {
+			return cr("threshold/"+t.Name, t.Action, weight), &policy.Bot{
+				Challenge: t.Challenge,
+				Rules:     &checker.List{},
+			}, nil
+		}
 	}
 
 	return cr("default/allow", config.RuleAllow, weight), &policy.Bot{

+ 3 - 32
lib/policy/celchecker.go

@@ -17,47 +17,18 @@ type CELChecker struct {
 }
 
 func NewCELChecker(cfg *config.ExpressionOrList) (*CELChecker, error) {
-	env, err := expressions.NewEnvironment()
+	env, err := expressions.BotEnvironment()
 	if err != nil {
 		return nil, err
 	}
 
-	var src string
-	var ast *cel.Ast
-
-	if cfg.Expression != "" {
-		src = cfg.Expression
-		var iss *cel.Issues
-		intermediate, iss := env.Compile(src)
-		if iss != nil {
-			return nil, iss.Err()
-		}
-
-		ast, iss = env.Check(intermediate)
-		if iss != nil {
-			return nil, iss.Err()
-		}
-	}
-
-	if len(cfg.All) != 0 {
-		ast, err = expressions.Join(env, expressions.JoinAnd, cfg.All...)
-	}
-
-	if len(cfg.Any) != 0 {
-		ast, err = expressions.Join(env, expressions.JoinOr, cfg.Any...)
-	}
-
-	if err != nil {
-		return nil, err
-	}
-
-	program, err := expressions.Compile(env, ast)
+	program, err := expressions.Compile(env, cfg.String())
 	if err != nil {
 		return nil, fmt.Errorf("can't compile CEL program: %w", err)
 	}
 
 	return &CELChecker{
-		src:     src,
+		src:     cfg.String(),
 		program: program,
 	}, nil
 }

+ 47 - 9
lib/policy/config/config.go

@@ -43,6 +43,15 @@ const (
 	RuleBenchmark Rule = "DEBUG_BENCHMARK"
 )
 
+func (r Rule) Valid() error {
+	switch r {
+	case RuleAllow, RuleDeny, RuleChallenge, RuleWeigh, RuleBenchmark:
+		return nil
+	default:
+		return ErrUnknownAction
+	}
+}
+
 const DefaultAlgorithm = "fast"
 
 type BotConfig struct {
@@ -184,13 +193,18 @@ type ChallengeRules struct {
 }
 
 var (
-	ErrChallengeDifficultyTooLow  = errors.New("config.Bot.ChallengeRules: difficulty is too low (must be >= 1)")
-	ErrChallengeDifficultyTooHigh = errors.New("config.Bot.ChallengeRules: difficulty is too high (must be <= 64)")
+	ErrChallengeDifficultyTooLow  = errors.New("config.ChallengeRules: difficulty is too low (must be >= 1)")
+	ErrChallengeDifficultyTooHigh = errors.New("config.ChallengeRules: difficulty is too high (must be <= 64)")
+	ErrChallengeMustHaveAlgorithm = errors.New("config.ChallengeRules: must have algorithm name set")
 )
 
 func (cr ChallengeRules) Valid() error {
 	var errs []error
 
+	if cr.Algorithm == "" {
+		errs = append(errs, ErrChallengeMustHaveAlgorithm)
+	}
+
 	if cr.Difficulty < 1 {
 		errs = append(errs, fmt.Errorf("%w, got: %d", ErrChallengeDifficultyTooLow, cr.Difficulty))
 	}
@@ -312,18 +326,19 @@ type fileConfig struct {
 	Bots        []BotOrImport `json:"bots"`
 	DNSBL       bool          `json:"dnsbl"`
 	StatusCodes StatusCodes   `json:"status_codes"`
+	Thresholds  []Threshold   `json:"threshold"`
 }
 
-func (c fileConfig) Valid() error {
+func (c *fileConfig) Valid() error {
 	var errs []error
 
 	if len(c.Bots) == 0 {
 		errs = append(errs, ErrNoBotRulesDefined)
 	}
 
-	for _, b := range c.Bots {
+	for i, b := range c.Bots {
 		if err := b.Valid(); err != nil {
-			errs = append(errs, err)
+			errs = append(errs, fmt.Errorf("bot %d: %w", i, err))
 		}
 	}
 
@@ -331,6 +346,16 @@ func (c fileConfig) Valid() error {
 		errs = append(errs, err)
 	}
 
+	if len(c.Thresholds) == 0 {
+		errs = append(errs, ErrNoThresholdRulesDefined)
+	}
+
+	for i, t := range c.Thresholds {
+		if err := t.Valid(); err != nil {
+			errs = append(errs, fmt.Errorf("threshold %d: %w", i, err))
+		}
+	}
+
 	if len(errs) != 0 {
 		return fmt.Errorf("config is not valid:\n%w", errors.Join(errs...))
 	}
@@ -339,11 +364,14 @@ func (c fileConfig) Valid() error {
 }
 
 func Load(fin io.Reader, fname string) (*Config, error) {
-	var c fileConfig
-	c.StatusCodes = StatusCodes{
-		Challenge: http.StatusOK,
-		Deny:      http.StatusOK,
+	c := &fileConfig{
+		StatusCodes: StatusCodes{
+			Challenge: http.StatusOK,
+			Deny:      http.StatusOK,
+		},
+		Thresholds: DefaultThresholds,
 	}
+
 	if err := yaml.NewYAMLToJSONDecoder(fin).Decode(&c); err != nil {
 		return nil, fmt.Errorf("can't parse policy config YAML %s: %w", fname, err)
 	}
@@ -379,6 +407,15 @@ func Load(fin io.Reader, fname string) (*Config, error) {
 		}
 	}
 
+	for _, t := range c.Thresholds {
+		if err := t.Valid(); err != nil {
+			validationErrs = append(validationErrs, err)
+			continue
+		}
+
+		result.Thresholds = append(result.Thresholds, t)
+	}
+
 	if len(validationErrs) > 0 {
 		return nil, fmt.Errorf("errors validating policy config %s: %w", fname, errors.Join(validationErrs...))
 	}
@@ -388,6 +425,7 @@ func Load(fin io.Reader, fname string) (*Config, error) {
 
 type Config struct {
 	Bots        []BotConfig
+	Thresholds  []Threshold
 	DNSBL       bool
 	StatusCodes StatusCodes
 }

+ 2 - 7
lib/policy/config/config_test.go

@@ -8,7 +8,6 @@ import (
 	"testing"
 
 	"github.com/TecharoHQ/anubis/data"
-	"k8s.io/apimachinery/pkg/util/yaml"
 )
 
 func p[V any](v V) *V { return &v }
@@ -313,12 +312,8 @@ func TestConfigValidBad(t *testing.T) {
 			}
 			defer fin.Close()
 
-			var c fileConfig
-			if err := yaml.NewYAMLToJSONDecoder(fin).Decode(&c); err != nil {
-				t.Fatalf("can't decode file: %v", err)
-			}
-
-			if err := c.Valid(); err == nil {
+			_, err = Load(fin, filepath.Join("testdata", "bad", st.Name()))
+			if err == nil {
 				t.Fatal("validation should have failed but didn't somehow")
 			} else {
 				t.Log(err)

+ 28 - 0
lib/policy/config/expressionorlist.go

@@ -3,7 +3,9 @@ package config
 import (
 	"encoding/json"
 	"errors"
+	"fmt"
 	"slices"
+	"strings"
 )
 
 var (
@@ -18,6 +20,32 @@ type ExpressionOrList struct {
 	Any        []string `json:"any,omitempty" yaml:"any,omitempty"`
 }
 
+func (eol ExpressionOrList) String() string {
+	switch {
+	case len(eol.Expression) != 0:
+		return eol.Expression
+	case len(eol.All) != 0:
+		var sb strings.Builder
+		for i, pred := range eol.All {
+			if i != 0 {
+				fmt.Fprintf(&sb, " && ")
+			}
+			fmt.Fprintf(&sb, "( %s )", pred)
+		}
+		return sb.String()
+	case len(eol.Any) != 0:
+		var sb strings.Builder
+		for i, pred := range eol.Any {
+			if i != 0 {
+				fmt.Fprintf(&sb, " || ")
+			}
+			fmt.Fprintf(&sb, "( %s )", pred)
+		}
+		return sb.String()
+	}
+	panic("this should not happen")
+}
+
 func (eol ExpressionOrList) Equal(rhs *ExpressionOrList) bool {
 	if eol.Expression != rhs.Expression {
 		return false

+ 51 - 0
lib/policy/config/expressionorlist_test.go

@@ -213,3 +213,54 @@ func TestExpressionOrListUnmarshalJSON(t *testing.T) {
 		})
 	}
 }
+
+func TestExpressionOrListString(t *testing.T) {
+	for _, tt := range []struct {
+		name string
+		in   ExpressionOrList
+		out  string
+	}{
+		{
+			name: "single expression",
+			in: ExpressionOrList{
+				Expression: "true",
+			},
+			out: "true",
+		},
+		{
+			name: "all",
+			in: ExpressionOrList{
+				All: []string{"true"},
+			},
+			out: "( true )",
+		},
+		{
+			name: "all with &&",
+			in: ExpressionOrList{
+				All: []string{"true", "true"},
+			},
+			out: "( true ) && ( true )",
+		},
+		{
+			name: "any",
+			in: ExpressionOrList{
+				All: []string{"true"},
+			},
+			out: "( true )",
+		},
+		{
+			name: "any with ||",
+			in: ExpressionOrList{
+				Any: []string{"true", "true"},
+			},
+			out: "( true ) || ( true )",
+		},
+	} {
+		t.Run(tt.name, func(t *testing.T) {
+			result := tt.in.String()
+			if result != tt.out {
+				t.Errorf("wanted %q, got: %q", tt.out, result)
+			}
+		})
+	}
+}

+ 1 - 0
lib/policy/config/testdata/bad/unparseable.json

@@ -0,0 +1 @@
+}

+ 1 - 0
lib/policy/config/testdata/bad/unparseable.yaml

@@ -0,0 +1 @@
+}

+ 8 - 0
lib/policy/config/testdata/good/no-thresholds.yaml

@@ -0,0 +1,8 @@
+bots:
+  - name: simple-weight-adjust
+    action: WEIGH
+    user_agent_regex: Mozilla
+    weight:
+      adjust: 5
+
+thresholds: []

+ 38 - 0
lib/policy/config/testdata/good/thresholds.yaml

@@ -0,0 +1,38 @@
+bots:
+  - name: simple-weight-adjust
+    action: WEIGH
+    user_agent_regex: Mozilla
+    weight:
+      adjust: 5
+
+thresholds:
+  - name: minimal-suspicion
+    expression: weight < 0
+    action: ALLOW
+  - name: mild-suspicion
+    expression:
+      all:
+        - weight >= 0
+        - weight < 10
+    action: CHALLENGE
+    challenge:
+      algorithm: metarefresh
+      difficulty: 1
+      report_as: 1
+  - name: moderate-suspicion
+    expression:
+      all:
+        - weight >= 10
+        - weight < 20
+    action: CHALLENGE
+    challenge:
+      algorithm: fast
+      difficulty: 2
+      report_as: 2
+  - name: extreme-suspicion
+    expression: weight >= 20
+    action: CHALLENGE
+    challenge:
+      algorithm: fast
+      difficulty: 4
+      report_as: 4

+ 80 - 0
lib/policy/config/threshold.go

@@ -0,0 +1,80 @@
+package config
+
+import (
+	"errors"
+	"fmt"
+
+	"github.com/TecharoHQ/anubis"
+)
+
+var (
+	ErrNoThresholdRulesDefined             = errors.New("config: no thresholds defined")
+	ErrThresholdMustHaveName               = errors.New("config.Threshold: must set name")
+	ErrThresholdMustHaveExpression         = errors.New("config.Threshold: must set expression")
+	ErrThresholdChallengeMustHaveChallenge = errors.New("config.Threshold: a threshold with the CHALLENGE action must have challenge set")
+	ErrThresholdCannotHaveWeighAction      = errors.New("config.Threshold: a threshold cannot have the WEIGH action")
+
+	DefaultThresholds = []Threshold{
+		{
+			Name: "legacy-anubis-behaviour",
+			Expression: &ExpressionOrList{
+				Expression: "weight > 0",
+			},
+			Action: RuleChallenge,
+			Challenge: &ChallengeRules{
+				Algorithm:  "fast",
+				Difficulty: anubis.DefaultDifficulty,
+				ReportAs:   anubis.DefaultDifficulty,
+			},
+		},
+	}
+)
+
+type Threshold struct {
+	Name       string            `json:"name" yaml:"name"`
+	Expression *ExpressionOrList `json:"expression" yaml:"expression"`
+	Action     Rule              `json:"action" yaml:"action"`
+	Challenge  *ChallengeRules   `json:"challenge" yaml:"challenge"`
+}
+
+func (t Threshold) Valid() error {
+	var errs []error
+
+	if len(t.Name) == 0 {
+		errs = append(errs, ErrThresholdMustHaveName)
+	}
+
+	if t.Expression == nil {
+		errs = append(errs, ErrThresholdMustHaveExpression)
+	}
+
+	if t.Expression != nil {
+		if err := t.Expression.Valid(); err != nil {
+			errs = append(errs, err)
+		}
+	}
+
+	if err := t.Action.Valid(); err != nil {
+		errs = append(errs, err)
+	}
+
+	if t.Action == RuleWeigh {
+		errs = append(errs, ErrThresholdCannotHaveWeighAction)
+	}
+
+	if t.Action == RuleChallenge && t.Challenge == nil {
+		errs = append(errs, ErrThresholdChallengeMustHaveChallenge)
+	}
+
+	if t.Challenge != nil {
+		if err := t.Challenge.Valid(); err != nil {
+			errs = append(errs, err)
+		}
+	}
+
+	if len(errs) != 0 {
+		return fmt.Errorf("config: threshold entry for %q is not valid:\n%w", t.Name, errors.Join(errs...))
+	}
+
+	return nil
+}

+ 92 - 0
lib/policy/config/threshold_test.go

@@ -0,0 +1,92 @@
+package config
+
+import (
+	"errors"
+	"fmt"
+	"testing"
+)
+
+func TestThresholdValid(t *testing.T) {
+	for _, tt := range []struct {
+		name  string
+		input *Threshold
+		err   error
+	}{
+		{
+			name: "basic allow",
+			input: &Threshold{
+				Name:       "basic-allow",
+				Expression: &ExpressionOrList{Expression: "true"},
+				Action:     RuleAllow,
+			},
+			err: nil,
+		},
+		{
+			name: "basic challenge",
+			input: &Threshold{
+				Name:       "basic-challenge",
+				Expression: &ExpressionOrList{Expression: "true"},
+				Action:     RuleChallenge,
+				Challenge: &ChallengeRules{
+					Algorithm:  "fast",
+					Difficulty: 1,
+					ReportAs:   1,
+				},
+			},
+			err: nil,
+		},
+		{
+			name:  "no name",
+			input: &Threshold{},
+			err:   ErrThresholdMustHaveName,
+		},
+		{
+			name:  "no expression",
+			input: &Threshold{},
+			err:   ErrThresholdMustHaveName,
+		},
+		{
+			name: "invalid expression",
+			input: &Threshold{
+				Expression: &ExpressionOrList{},
+			},
+			err: ErrExpressionEmpty,
+		},
+		{
+			name:  "invalid action",
+			input: &Threshold{},
+			err:   ErrUnknownAction,
+		},
+		{
+			name: "challenge action but no challenge",
+			input: &Threshold{
+				Action: RuleChallenge,
+			},
+			err: ErrThresholdChallengeMustHaveChallenge,
+		},
+		{
+			name: "challenge invalid",
+			input: &Threshold{
+				Action:    RuleChallenge,
+				Challenge: &ChallengeRules{Difficulty: 0, ReportAs: 0},
+			},
+			err: ErrChallengeDifficultyTooLow,
+		},
+	} {
+		t.Run(tt.name, func(t *testing.T) {
+			if err := tt.input.Valid(); !errors.Is(err, tt.err) {
+				t.Errorf("threshold is invalid: %v", err)
+			}
+		})
+	}
+}
+
+func TestDefaultThresholdsValid(t *testing.T) {
+	for i, th := range DefaultThresholds {
+		t.Run(fmt.Sprintf("%d %s", i, th.Name), func(t *testing.T) {
+			if err := th.Valid(); err != nil {
+				t.Errorf("threshold invalid: %v", err)
+			}
+		})
+	}
+}

+ 38 - 14
lib/policy/expressions/environment.go

@@ -9,20 +9,12 @@ import (
 	"github.com/google/cel-go/ext"
 )
 
-// NewEnvironment creates a new CEL environment, this is the set of
+// BotEnvironment creates a new CEL environment, this is the set of
 // variables and functions that are passed into the CEL scope so that
 // Anubis can fail loudly and early when something is invalid instead
 // of blowing up at runtime.
-func NewEnvironment() (*cel.Env, error) {
-	return cel.NewEnv(
-		ext.Strings(
-			ext.StringsLocale("en_US"),
-			ext.StringsValidateFormatCalls(true),
-		),
-
-		// default all timestamps to UTC
-		cel.DefaultUTCTimeZone(true),
-
+func BotEnvironment() (*cel.Env, error) {
+	return New(
 		// Variables exposed to CEL programs:
 		cel.Variable("remoteAddress", cel.StringType),
 		cel.Variable("host", cel.StringType),
@@ -31,8 +23,27 @@ func NewEnvironment() (*cel.Env, error) {
 		cel.Variable("path", cel.StringType),
 		cel.Variable("query", cel.MapType(cel.StringType, cel.StringType)),
 		cel.Variable("headers", cel.MapType(cel.StringType, cel.StringType)),
+	)
+}
+
+// NewThreshold creates a new CEL environment for threshold checking.
+func ThresholdEnvironment() (*cel.Env, error) {
+	return New(
+		cel.Variable("weight", cel.IntType),
+	)
+}
 
-		// Functions exposed to CEL programs:
+func New(opts ...cel.EnvOption) (*cel.Env, error) {
+	args := []cel.EnvOption{
+		ext.Strings(
+			ext.StringsLocale("en_US"),
+			ext.StringsValidateFormatCalls(true),
+		),
+
+		// default all timestamps to UTC
+		cel.DefaultUTCTimeZone(true),
+
+		// Functions exposed to all CEL programs:
 		cel.Function("randInt",
 			cel.Overload("randInt_int",
 				[]*cel.Type{cel.IntType},
@@ -47,12 +58,25 @@ func NewEnvironment() (*cel.Env, error) {
 				}),
 			),
 		),
-	)
+	}
+
+	args = append(args, opts...)
+	return cel.NewEnv(args...)
 }
 
 // Compile takes CEL environment and syntax tree then emits an optimized
 // Program for execution.
-func Compile(env *cel.Env, ast *cel.Ast) (cel.Program, error) {
+func Compile(env *cel.Env, src string) (cel.Program, error) {
+	intermediate, iss := env.Compile(src)
+	if iss != nil {
+		return nil, iss.Err()
+	}
+
+	ast, iss := env.Check(intermediate)
+	if iss != nil {
+		return nil, iss.Err()
+	}
+
 	return env.Program(
 		ast,
 		cel.EvalOptions(

+ 0 - 104
lib/policy/expressions/join.go

@@ -1,104 +0,0 @@
-package expressions
-
-import (
-	"errors"
-	"fmt"
-	"strings"
-
-	"github.com/google/cel-go/cel"
-)
-
-// JoinOperator is a type wrapper for and/or operators.
-//
-// This is a separate type so that validation can be done at the type level.
-type JoinOperator string
-
-// Possible values for JoinOperator
-const (
-	JoinAnd JoinOperator = "&&"
-	JoinOr  JoinOperator = "||"
-)
-
-// Valid ensures that JoinOperator is semantically valid.
-func (jo JoinOperator) Valid() error {
-	switch jo {
-	case JoinAnd, JoinOr:
-		return nil
-	default:
-		return ErrWrongJoinOperator
-	}
-}
-
-var (
-	ErrWrongJoinOperator = errors.New("expressions: invalid join operator")
-	ErrNoExpressions     = errors.New("expressions: cannot join zero expressions")
-	ErrCantCompile       = errors.New("expressions: can't compile one expression")
-)
-
-// JoinClauses joins a list of compiled clauses into one big if statement.
-//
-// Imagine the following two clauses:
-//
-//	ball.color == "red"
-//	ball.shape == "round"
-//
-// JoinClauses would emit one "joined" clause such as:
-//
-//	( ball.color == "red" ) && ( ball.shape == "round" )
-func JoinClauses(env *cel.Env, operator JoinOperator, clauses ...*cel.Ast) (*cel.Ast, error) {
-	if err := operator.Valid(); err != nil {
-		return nil, fmt.Errorf("%w: wanted && or ||, got: %q", err, operator)
-	}
-
-	switch len(clauses) {
-	case 0:
-		return nil, ErrNoExpressions
-	case 1:
-		return clauses[0], nil
-	}
-
-	var exprs []string
-	var errs []error
-
-	for _, clause := range clauses {
-		clauseStr, err := cel.AstToString(clause)
-		if err != nil {
-			errs = append(errs, err)
-			continue
-		}
-
-		exprs = append(exprs, "( "+clauseStr+" )")
-	}
-
-	if len(errs) != 0 {
-		return nil, fmt.Errorf("errors while decompiling statements: %w", errors.Join(errs...))
-	}
-
-	statement := strings.Join(exprs, " "+string(operator)+" ")
-	result, iss := env.Compile(statement)
-	if iss != nil {
-		return nil, iss.Err()
-	}
-
-	return result, nil
-}
-
-func Join(env *cel.Env, operator JoinOperator, clauses ...string) (*cel.Ast, error) {
-	var statements []*cel.Ast
-	var errs []error
-
-	for _, clause := range clauses {
-		stmt, iss := env.Compile(clause)
-		if iss != nil && iss.Err() != nil {
-			errs = append(errs, fmt.Errorf("%w: %q gave: %w", ErrCantCompile, clause, iss.Err()))
-			continue
-		}
-		statements = append(statements, stmt)
-	}
-
-	if len(errs) != 0 {
-		return nil, fmt.Errorf("errors while joining clauses: %w", errors.Join(errs...))
-	}
-
-	return JoinClauses(env, operator, statements...)
-}

+ 0 - 90
lib/policy/expressions/join_test.go

@@ -1,90 +0,0 @@
-package expressions
-
-import (
-	"errors"
-	"testing"
-
-	"github.com/google/cel-go/cel"
-)
-
-func TestJoin(t *testing.T) {
-	env, err := NewEnvironment()
-	if err != nil {
-		t.Fatal(err)
-	}
-
-	for _, tt := range []struct {
-		err       error
-		name      string
-		op        JoinOperator
-		resultStr string
-		clauses   []string
-	}{
-		{
-			name:    "no-clauses",
-			clauses: []string{},
-			op:      JoinAnd,
-			err:     ErrNoExpressions,
-		},
-		{
-			name:      "one-clause-identity",
-			clauses:   []string{`remoteAddress == "8.8.8.8"`},
-			op:        JoinAnd,
-			err:       nil,
-			resultStr: `remoteAddress == "8.8.8.8"`,
-		},
-		{
-			name: "multi-clause-and",
-			clauses: []string{
-				`remoteAddress == "8.8.8.8"`,
-				`host == "anubis.techaro.lol"`,
-			},
-			op:        JoinAnd,
-			err:       nil,
-			resultStr: `remoteAddress == "8.8.8.8" && host == "anubis.techaro.lol"`,
-		},
-		{
-			name: "multi-clause-or",
-			clauses: []string{
-				`remoteAddress == "8.8.8.8"`,
-				`host == "anubis.techaro.lol"`,
-			},
-			op:        JoinOr,
-			err:       nil,
-			resultStr: `remoteAddress == "8.8.8.8" || host == "anubis.techaro.lol"`,
-		},
-		{
-			name: "git-user-agent",
-			clauses: []string{
-				`userAgent.startsWith("git/") || userAgent.contains("libgit")`,
-				`"Git-Protocol" in headers && headers["Git-Protocol"] == "version=2"`,
-			},
-			op:  JoinAnd,
-			err: nil,
-			resultStr: `(userAgent.startsWith("git/") || userAgent.contains("libgit")) && "Git-Protocol" in headers &&
-headers["Git-Protocol"] == "version=2"`,
-		},
-	} {
-		t.Run(tt.name, func(t *testing.T) {
-			result, err := Join(env, tt.op, tt.clauses...)
-			if !errors.Is(err, tt.err) {
-				t.Errorf("wanted error %v but got: %v", tt.err, err)
-			}
-
-			if tt.err != nil {
-				return
-			}
-
-			program, err := cel.AstToString(result)
-			if err != nil {
-				t.Fatalf("can't decompile program: %v", err)
-			}
-
-			if tt.resultStr != program {
-				t.Logf("wanted: %s", tt.resultStr)
-				t.Logf("got: %s", program)
-				t.Error("program did not compile as expected")
-			}
-		})
-	}
-}

+ 23 - 0
lib/policy/policy.go

@@ -6,6 +6,7 @@ import (
 	"fmt"
 	"io"
 	"log/slog"
+	"sync/atomic"
 
 	"github.com/TecharoHQ/anubis/internal/thoth"
 	"github.com/TecharoHQ/anubis/lib/policy/checker"
@@ -21,12 +22,14 @@ var (
 	}, []string{"rule", "action"})
 
 	ErrChallengeRuleHasWrongAlgorithm = errors.New("config.Bot.ChallengeRules: algorithm is invalid")
+	warnedAboutThresholds             = &atomic.Bool{}
 )
 
 type ParsedConfig struct {
 	orig *config.Config
 
 	Bots              []Bot
+	Thresholds        []*Threshold
 	DNSBL             bool
 	DefaultDifficulty int
 	StatusCodes       config.StatusCodes
@@ -150,6 +153,26 @@ func ParseConfig(ctx context.Context, fin io.Reader, fname string, defaultDiffic
 		result.Bots = append(result.Bots, parsedBot)
 	}
 
+	for _, t := range c.Thresholds {
+		if t.Name == "legacy-anubis-behaviour" && t.Expression.String() == "true" {
+			if !warnedAboutThresholds.Load() {
+				slog.Warn("configuration file does not contain thresholds, see docs for details on how to upgrade", "fname", fname, "docs_url", "https://anubis.techaro.lol/docs/admin/configuration/thresholds/")
+				warnedAboutThresholds.Store(true)
+			}
+
+			t.Challenge.Difficulty = defaultDifficulty
+			t.Challenge.ReportAs = defaultDifficulty
+		}
+
+		threshold, err := ParsedThresholdFromConfig(t)
+		if err != nil {
+			validationErrs = append(validationErrs, fmt.Errorf("can't compile threshold config for %s: %w", t.Name, err))
+			continue
+		}
+
+		result.Thresholds = append(result.Thresholds, threshold)
+	}
+
 	if len(validationErrs) > 0 {
 		return nil, fmt.Errorf("errors validating policy config JSON %s: %w", fname, errors.Join(validationErrs...))
 	}

+ 47 - 0
lib/policy/thresholds.go

@@ -0,0 +1,47 @@
+package policy
+
+import (
+	"github.com/TecharoHQ/anubis/lib/policy/config"
+	"github.com/TecharoHQ/anubis/lib/policy/expressions"
+	"github.com/google/cel-go/cel"
+)
+
+type Threshold struct {
+	config.Threshold
+	Program cel.Program
+}
+
+func ParsedThresholdFromConfig(t config.Threshold) (*Threshold, error) {
+	result := &Threshold{
+		Threshold: t,
+	}
+
+	env, err := expressions.ThresholdEnvironment()
+	if err != nil {
+		return nil, err
+	}
+
+	program, err := expressions.Compile(env, t.Expression.String())
+	if err != nil {
+		return nil, err
+	}
+
+	result.Program = program
+
+	return result, nil
+}
+
+type ThresholdRequest struct {
+	Weight int
+}
+
+func (tr *ThresholdRequest) Parent() cel.Activation { return nil }
+
+func (tr *ThresholdRequest) ResolveName(name string) (any, bool) {
+	switch name {
+	case "weight":
+		return tr.Weight, true
+	default:
+		return nil, false
+	}
+}

+ 7 - 7
lib/testdata/aggressive_403.yaml

@@ -1,12 +1,12 @@
 bots:
-- name: deny
-  user_agent_regex: DENY
-  action: DENY
+  - name: deny
+    user_agent_regex: DENY
+    action: DENY
 
-- name: challenge
-  user_agent_regex: CHALLENGE
-  action: CHALLENGE
+  - name: challenge
+    user_agent_regex: CHALLENGE
+    action: CHALLENGE
 
 status_codes:
   CHALLENGE: 401
-  DENY: 403
+  DENY: 403