For example, to check whether the value of a field named `message` contains the string `"abcd"`, use the `contains()` function as follows:

```
'contains(/message, "abcd")'
```
{% include copy.html %}

Alternatively, you can use a literal string as the first argument:

```
'contains("This is a test message", "test")'
```
{% include copy.html %}

In this case, the function returns `true` because the substring `test` is present within the string `This is a test message`.

The `contains()` function performs a case-sensitive search.
{: .note}
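
Because the search is case sensitive, an expression such as `contains(/message, "error")` does not match a message like `ERROR: something bad`. The following Python sketch models that behavior with Python's `in` operator, which is also a case-sensitive substring test. The `contains` helper is hypothetical and only approximates the documented semantics; it is not how Data Prepper evaluates expressions.

```python
# Hypothetical stand-in for Data Prepper's contains(); not the real implementation.
# It models only the documented behavior: a case-sensitive substring search.


def contains(value: str, substring: str) -> bool:
    """Return True if `substring` occurs in `value`, matching case exactly."""
    return substring in value


print(contains("This is a test message", "test"))  # True
print(contains("ERROR: something bad", "error"))   # False: the case differs
print(contains("ERROR: something bad", "ERROR"))   # True
```
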

## Example

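The following pipeline uses `contains()` twice: the `add_entries` processor adds a Boolean `has_test` field that indicates whether `/message` contains `test`, and the `drop_events` processor drops every event whose `/message` does not contain `ERROR`. As a result, only messages containing `ERROR` are forwarded to OpenSearch: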


```yaml
contains-demo-pipeline:
  source:
    http:
      path: /events
      ssl: false

  processor:
    - add_entries:
        entries:
          - key: has_test
            value_expression: 'contains(/message, "test")'
    - drop_events:
        drop_when: 'not contains(/message, "ERROR")'

  sink:
    - opensearch:
        hosts: ["https://opensearch:9200"]
        insecure: true
        username: admin
        password: admin_pass
        index_type: custom
        index: "demo-index-%{yyyy.MM.dd}"
```
{% include copy.html %}

You can test the pipeline using the following command:

```bash
curl -sS -X POST "http://localhost:2021/events" \
  -H "Content-Type: application/json" \
  -d '[
    {"message":"ok hello"},
    {"message":"this has test but ok"},
    {"message":"ERROR: something bad"},
    {"message":"ERROR: unit test failed"}
  ]'
```
{% include copy.html %}

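After the events are processed, the index contains only the two events whose `message` includes `ERROR`. The `has_test` field is `true` only for the message that also contains `test`: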

```json
{
  ...
  "hits": {
    "total": {
      "value": 2,
      "relation": "eq"
    },
    "max_score": 1,
    "hits": [
      {
        "_index": "demo-index-2025.10.21",
        "_id": "5YACB5oBqZitdAAb4n3r",
        "_score": 1,
        "_source": {
          "message": "ERROR: something bad",
          "has_test": false
        }
      },
      {
        "_index": "demo-index-2025.10.21",
        "_id": "5oACB5oBqZitdAAb4n3r",
        "_score": 1,
        "_source": {
          "message": "ERROR: unit test failed",
          "has_test": true
        }
      }
    ]
  }
}
```
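
To predict this result without running Data Prepper, the following Python sketch applies the same two steps, in pipeline order, to the four events sent by the `curl` command: it sets `has_test` with a case-sensitive substring test and then discards events whose `message` does not contain `ERROR`. This is a hypothetical offline simulation; the `run_pipeline` helper and `SAMPLE_EVENTS` list are illustrative, not part of Data Prepper, and only the processor logic is modeled, not indexing.

```python
# Hypothetical offline simulation of contains-demo-pipeline (not Data Prepper code).
# Python's `in` operator stands in for the case-sensitive contains() function.

SAMPLE_EVENTS = [
    {"message": "ok hello"},
    {"message": "this has test but ok"},
    {"message": "ERROR: something bad"},
    {"message": "ERROR: unit test failed"},
]


def run_pipeline(events):
    """Apply the add_entries step, then the drop_events step, to each event."""
    kept = []
    for event in events:
        # add_entries: has_test = contains(/message, "test")
        enriched = {**event, "has_test": "test" in event["message"]}
        # drop_events: drop_when 'not contains(/message, "ERROR")'
        if "ERROR" not in enriched["message"]:
            continue
        kept.append(enriched)
    return kept


for document in run_pipeline(SAMPLE_EVENTS):
    print(document)
```

The sketch prints the same two `_source` documents shown in the search response, with `has_test` set to `False` for `ERROR: something bad` and `True` for `ERROR: unit test failed`.
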