## Understanding parsers

Parsers are configurations that define a transformation on an {{event.htmlname}}. Parsers are expressed as YAML files composed of one or more individual 'parsing' nodes.

An {{event.htmlname}} can be the representation of a log line, or an overflow.

A parser itself can be used to perform various actions, including:

- Parse a string with a regular expression (grok patterns)
- Enrich an event by relying on "external" code (such as the geoip-enrichment parser)
- Process one or more fields of an {{event.name}} with {{expr.htmlname}}

A parser node might look like:

```yaml
#if 'onsuccess' is 'next_stage', the event will make it to the next stage if this node succeeds
onsuccess: next_stage
#the 'debug' (bool) flag enables node-level debug logging for this node
debug: true
#a filter to decide if the Event is eligible for this parser node
filter: "evt.Parsed.program == 'kernel'"
#a unique name to allow easy debug & logging
name: crowdsecurity/demo-iptables
#this is for humans
description: "Parse iptables drop logs"
#we can define named capture groups (a-la-grok)
pattern_syntax:
  MYCAP: ".*"
#an actual grok pattern (regular expression with named capture group)
grok:
  pattern: ^xxheader %{MYCAP:extracted_value} trailing stuff$
  #we define on which field the regular expression must be applied
  apply_on: evt.Parsed.some_field
#statics are transformations that are applied on the event if the node is considered "successful"
statics:
  #to which field the value will be written (here -> evt.Meta.log_type)
  - meta: log_type
    #and here a static value
    value: parsed_testlog
  #another one
  - meta: source_ip
    #here the value stored is the result of a dynamic expression
    expression: "evt.Parsed.src_ip"
```

The parser nodes are processed sequentially based on the alphabetical order of {{stages.htmlname}} and subsequent files.
If the node is considered successful (grok is present and returned data, or no grok is present) and `onsuccess` equals `next_stage`, then the {{event.name}} is moved to the next stage.

## Parser trees

A parser node can contain sub-nodes, to provide proper branching. This is useful when you want to apply different parsing based on different criteria, or when you have a set of candidate parsers that you want to apply to an event:

```yaml
#This first node will capture/extract some value
filter: "evt.Line.Labels.type == 'type1'"
name: tests/base-grok-root
pattern_syntax:
  MYCAP: ".*"
grok:
  pattern: ^... %{MYCAP:extracted_value} ...$
  apply_on: Line.Raw
statics:
  - meta: state
    value: root-done
  - meta: state_sub
    expression: evt.Parsed.extracted_value
---
#and this node will apply different patterns to it
filter: "evt.Line.Labels.type == 'type1' && evt.Meta.state == 'root-done'"
name: tests/base-grok-leafs
onsuccess: next_stage
#the sub-nodes will process the result of the master node
nodes:
  - filter: "evt.Parsed.extracted_value == 'VALUE1'"
    debug: true
    statics:
      - meta: final_state
        value: leaf1
  - filter: "evt.Parsed.extracted_value == 'VALUE2'"
    debug: true
    statics:
      - meta: final_state
        value: leaf2
```

The logic is that the `tests/base-grok-root` node will be processed first and will alter the event (here mostly by extracting some text from the `Line.Raw` field into `Parsed` thanks to the `grok` pattern and the `statics` directive).

The event will then continue its life and be parsed by the following `tests/base-grok-leafs` node. This node has `onsuccess` set to `next_stage`, which means that if the node is successful, the event will be moved to the next stage.

This node actually consists of two sub-nodes that have different conditions (branching) to allow differential treatment of said event.

A real-life example can be seen when it comes to parsing HTTP logs.
HTTP ACCESS and ERROR logs often have different formats, and thus our "nginx" parser needs to handle both formats:

```yaml
filter: "evt.Parsed.program == 'nginx'"
onsuccess: next_stage
name: crowdsecurity/nginx-logs
nodes:
  - grok:
      #this is the access log
      name: NGINXACCESS
      apply_on: message
    statics:
      - meta: log_type
        value: http_access-log
      - target: evt.StrTime
        expression: evt.Parsed.time_local
  - grok:
      # and this one the error log
      name: NGINXERROR
      apply_on: message
    statics:
      - meta: log_type
        value: http_error-log
      - target: evt.StrTime
        expression: evt.Parsed.time
# these ones apply for both grok patterns
statics:
  - meta: service
    value: http
  - meta: source_ip
    expression: "evt.Parsed.remote_addr"
  - meta: http_status
    expression: "evt.Parsed.status"
  - meta: http_path
    expression: "evt.Parsed.request"
```

## Parser directives

### debug

```yaml
debug: true|false
```
_default: false_

If set to `true`, enables node-level debugging. It is meant to help in understanding parser node behaviour by providing contextual logging.

### filter

```yaml
filter: expression
```

`filter` must be a valid {{expr.htmlname}} expression that will be evaluated against the {{event.name}}.

If `filter` evaluation returns `true` or `filter` is absent, the node will be processed.

If `filter` returns `false` or a non-boolean, the node won't be processed.

Examples:

- `filter: "evt.Meta.foo == 'test'"`
- `filter: "evt.Meta.bar == 'test' && evt.Meta.foo == 'test2'"`

### grok

```yaml
grok:
  name: NAMED_EXISTING_PATTERN
  apply_on: source_field
```

```yaml
grok:
  pattern: ^a valid RE2 expression with %{CAPTURE:field}$
  apply_on: source_field
```

The `grok` structure in a node represents a regular expression with capture groups (grok pattern) that must be applied on a field of {{event.name}}.

The pattern can:

- be imported by name (if present within the core of {{crowdsec.name}})
- be defined in place

In both cases, the pattern must be a valid RE2 expression.
The field(s) returned by the regular expression are merged into the `Parsed` associative array of the `Event`.

### name

```yaml
name: explicit_string
```

The *mandatory* name of the node. If not present, the node will be skipped at runtime. It is used, for example, in debug logs to help you track things.

### nodes

```yaml
nodes:
  - filter: ...
    grok: ...
```

`nodes` is a list of parser nodes, allowing you to build trees. Each subnode must be valid, and if any of the subnodes succeeds, the whole node is considered successful.

### onsuccess

```yaml
onsuccess: next_stage|continue
```
_default: continue_

If set to `next_stage` and the node is considered successful, the {{event.name}} will be moved directly to the next stage without processing other nodes in the current stage.

### pattern_syntax

```yaml
pattern_syntax:
  CAPTURE_NAME: VALID_RE2_EXPRESSION
```

`pattern_syntax` allows the user to define named capture group expressions for future use in grok patterns. The regexp must be a valid RE2 expression.

```yaml
pattern_syntax:
  MYCAP: ".*"
grok:
  pattern: ^xxheader %{MYCAP:extracted_value} trailing stuff$
  apply_on: Line.Raw
```

### statics

```yaml
statics:
  - target: evt.Meta.target_field
    value: static_value
  - meta: target_field
    expression: evt.Meta.target_field + ' this_is' + ' a dynamic expression'
  - enriched: target_field
    value: static_value
```

`statics` is a list of directives that will be executed when the node is considered successful. Each entry of the list is composed of a target (where to write) and a source (what data to write).
**Target**

The target aims at being any part of the {{event.htmlname}} object, and can be expressed in different ways:

- `meta: `
- `parsed: `
- `enriched: `
- a dynamic target (please note that the **current** event is accessible via the `evt.` variable):
    - `target: evt.Meta.foobar`
    - `target: Meta.foobar`
    - `target: evt.StrTime`

**Source**

The source itself can be either a static value, or an {{expr.htmlname}} result:

```yaml
statics:
  - meta: target_field
    value: static_value
  - meta: target_field
    expression: evt.Meta.another_field
  - meta: target_field
    expression: evt.Meta.target_field + ' this_is' + ' a dynamic expression'
```

## Parser concepts

### Success and failure

A parser is considered "successful" if:

- A grok pattern was present and successfully matched
- No grok pattern was present
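
The two rules above can be illustrated with a pair of nodes. This is only a minimal sketch: the node names, the `myapp` program value, the `MYAPP_USER` capture and the log message format are all hypothetical, chosen to show when each node would count as successful.

```yaml
# Hypothetical names and values, for illustration only.
# No grok here: once the filter lets the node be processed, it is
# considered successful and its statics are applied.
filter: "evt.Parsed.program == 'myapp'"
name: example/tag-only
statics:
  - meta: service
    value: myapp
---
# grok present: the node is successful only if the pattern matches.
# On success the statics are applied and, because of 'onsuccess',
# the event moves on to the next stage.
filter: "evt.Parsed.program == 'myapp'"
name: example/with-grok
onsuccess: next_stage
pattern_syntax:
  MYAPP_USER: "[a-z0-9_]+"
grok:
  pattern: "^login failed for %{MYAPP_USER:username}$"
  apply_on: message
statics:
  - meta: log_type
    value: myapp_failed_auth
  - meta: target_user
    expression: "evt.Parsed.username"
```

Under these assumptions, a message that does not match the pattern leaves the second node unsuccessful, so its `statics` are not applied and `onsuccess` has no effect for that event.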