[DOCS] Refactor script processor docs #72691
Merged: jrodewig merged 7 commits into elastic:master from jrodewig:docs__fix-script-processor-docs (May 4, 2021)
Commits (7):
96b8340 [DOCS] Refactor script processor docs (jrodewig)
4786e9e Note that `ctx._source` is not supported (jrodewig)
9a1f779 Address review feedback (jrodewig)
324fcdd Fix typos (jrodewig)
d44570e Fix link (jrodewig)
0d43f48 Fix test (jrodewig)
7827639 Reword (jrodewig)
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -4,99 +4,158 @@ | |
| <titleabbrev>Script</titleabbrev> | ||
| ++++ | ||
|
|
||
| Allows inline and stored scripts to be executed within ingest pipelines. | ||
| Runs an inline or stored <<modules-scripting,script>> on incoming documents. The | ||
| script runs in the {painless}/painless-ingest-processor-context.html[`ingest`] | ||
| context. | ||
|
|
||
| See <<modules-scripting-using, How to use scripts>> to learn more about writing scripts. The Script Processor | ||
| leverages caching of compiled scripts for improved performance. Since the | ||
| script specified within the processor is potentially re-compiled per document, it is important | ||
| to understand how script caching works. To learn more about | ||
| caching see <<scripts-and-search-speed, Script Caching>>. | ||
| The script processor uses the <<scripts-and-search-speed,script cache>> to avoid | ||
| recompiling the script for each incoming document. To improve performance, | ||
| ensure the script cache is properly sized before using a script processor in | ||
| production. | ||
|
|
||
| [[script-options]] | ||
| .Script Options | ||
| .Script options | ||
| [options="header"] | ||
| |====== | ||
| | Name | Required | Default | Description | ||
| | `lang` | no | "painless" | The scripting language | ||
| | `id` | no | - | The stored script id to refer to | ||
| | `source` | no | - | An inline script to be executed | ||
| | `params` | no | - | Script Parameters | ||
| | Name | Required | Default | Description | ||
| | `lang` | no | "painless" | <<scripting-available-languages,Script language>>. | ||
| | `id` | no | - | ID of a <<create-stored-script-api,stored script>>. | ||
| If no `source` is specified, this parameter is required. | ||
| | `source` | no | - | Inline script. | ||
| If no `id` is specified, this parameter is required. | ||
| | `params` | no | - | Object containing parameters for the script. | ||
| include::common-options.asciidoc[] | ||
| |====== | ||
|
|
||
| One of `id` or `source` options must be provided in order to properly reference a script to execute. | ||
| [discrete] | ||
| [[script-processor-access-source-fields]] | ||
| ==== Access source fields | ||
|
|
||
| You can access the current ingest document from within the script context by using the `ctx` variable. | ||
| The script processor parses each incoming document's JSON source fields into a | ||
| set of maps, lists, and primitives. To access these fields with a Painless | ||
| script, use the | ||
| {painless}/painless-operators-reference.html#map-access-operator[map access | ||
| operator]: `ctx['my-field']`. You can also use the shorthand `ctx.<my-field>` | ||
| syntax. | ||
|
|
||
| The following example sets a new field called `field_a_plus_b_times_c` to be the sum of two existing | ||
| numeric fields `field_a` and `field_b` multiplied by the parameter param_c: | ||
| NOTE: The script processor does not support the `ctx['_source']['my-field']` or | ||
| `ctx._source.<my-field>` syntaxes. | ||
|
|
||
| [source,js] | ||
| -------------------------------------------------- | ||
| The following processor uses a Painless script to extract the `tags` field from | ||
| the `env` source field. | ||
|
|
||
| [source,console] | ||
| ---- | ||
| POST _ingest/pipeline/_simulate | ||
|
Contributor
Do we explain what the simulate query is anywhere?
Contributor (Author)
That's covered in the ingest pipeline docs. I don't feel we need to cover that here. |
||
| { | ||
| "script": { | ||
| "lang": "painless", | ||
| "source": "ctx.field_a_plus_b_times_c = (ctx.field_a + ctx.field_b) * params.param_c", | ||
| "params": { | ||
| "param_c": 10 | ||
| "pipeline": { | ||
| "processors": [ | ||
| { | ||
| "script": { | ||
| "description": "Extract 'tags' from 'env' field", | ||
| "lang": "painless", | ||
| "source": """ | ||
| String[] envSplit = ctx['env'].splitOnToken(params['delimiter']); | ||
| ArrayList tags = new ArrayList(); | ||
| tags.add(envSplit[params['position']].trim()); | ||
| ctx['tags'] = tags; | ||
| """, | ||
| "params": { | ||
| "delimiter": "-", | ||
| "position": 1 | ||
| } | ||
| } | ||
| } | ||
| ] | ||
| }, | ||
| "docs": [ | ||
| { | ||
| "_source": { | ||
| "env": "es01-prod" | ||
| } | ||
| } | ||
| } | ||
| ] | ||
| } | ||
| -------------------------------------------------- | ||
| // NOTCONSOLE | ||
| ---- | ||
|
|
||
| It is possible to use the Script Processor to manipulate document metadata like `_index` during | ||
| ingestion. Here is an example of an Ingest Pipeline that renames the index to `my-index` no matter what | ||
| was provided in the original index request: | ||
| The processor produces: | ||
|
|
||
| [source,console] | ||
| -------------------------------------------------- | ||
| PUT _ingest/pipeline/my-index | ||
| [source,console-result] | ||
| ---- | ||
| { | ||
| "description": "use index:my-index", | ||
| "processors": [ | ||
| "docs": [ | ||
| { | ||
| "script": { | ||
| "source": """ | ||
| ctx._index = 'my-index'; | ||
| """ | ||
| "doc": { | ||
| ... | ||
| "_source": { | ||
| "env": "es01-prod", | ||
| "tags": [ | ||
| "prod" | ||
| ] | ||
| } | ||
| } | ||
| } | ||
| ] | ||
| } | ||
| -------------------------------------------------- | ||
| ---- | ||
| // TESTRESPONSE[s/\.\.\./"_index":"_index","_id":"_id","_ingest":{"timestamp":$body.docs.0.doc._ingest.timestamp},/] | ||
|
|
||
| Using the above pipeline, we can attempt to index a document into the `any-index` index. | ||
|
|
||
| [discrete] | ||
| [[script-processor-access-metadata-fields]] | ||
| ==== Access metadata fields | ||
|
|
||
| You can also use a script processor to access metadata fields. The following | ||
| processor uses a Painless script to set an incoming document's `_index`. | ||
|
|
||
| [source,console] | ||
| -------------------------------------------------- | ||
| PUT any-index/_doc/1?pipeline=my-index | ||
| ---- | ||
| POST _ingest/pipeline/_simulate | ||
| { | ||
| "message": "text" | ||
| "pipeline": { | ||
| "processors": [ | ||
| { | ||
| "script": { | ||
| "description": "Set index based on `lang` field and `dataset` param", | ||
| "lang": "painless", | ||
| "source": """ | ||
| ctx['_index'] = ctx['lang'] + '-' + params['dataset']; | ||
| """, | ||
| "params": { | ||
| "dataset": "catalog" | ||
| } | ||
| } | ||
| } | ||
| ] | ||
| }, | ||
| "docs": [ | ||
| { | ||
| "_index": "generic-index", | ||
| "_source": { | ||
| "lang": "fr" | ||
| } | ||
| } | ||
| ] | ||
| } | ||
| -------------------------------------------------- | ||
| // TEST[continued] | ||
| ---- | ||
|
|
||
| The response from the above index request: | ||
| The processor changes the document's `_index` to `fr-catalog` from | ||
| `generic-index`. | ||
|
|
||
| [source,console-result] | ||
| -------------------------------------------------- | ||
| ---- | ||
| { | ||
| "_index": "my-index", | ||
| "_id": "1", | ||
| "_version": 1, | ||
| "result": "created", | ||
| "_shards": { | ||
| "total": 2, | ||
| "successful": 1, | ||
| "failed": 0 | ||
| }, | ||
| "_seq_no": 89, | ||
| "_primary_term": 1, | ||
| "docs": [ | ||
| { | ||
| "doc": { | ||
| ... | ||
| "_index": "fr-catalog", | ||
| "_source": { | ||
| "lang": "fr" | ||
| } | ||
| } | ||
| } | ||
| ] | ||
| } | ||
| -------------------------------------------------- | ||
| // TESTRESPONSE[s/"_seq_no": \d+/"_seq_no" : $body._seq_no/ s/"_primary_term" : 1/"_primary_term" : $body._primary_term/] | ||
|
|
||
| In the above response, you can see that our document was actually indexed into `my-index` instead of | ||
| `any-index`. This type of manipulation is often convenient in pipelines that have various branches of transformation, | ||
| and depending on the progress made, indexed into different indices. | ||
| ---- | ||
| // TESTRESPONSE[s/\.\.\./"_id":"_id","_ingest":{"timestamp":$body.docs.0.doc._ingest.timestamp},/] | ||
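As a rough aid for reading the two Painless scripts in the diff, the sketch below mirrors their logic in Python. It is not how the script processor is implemented. The function names `extract_tags` and `set_index` are hypothetical, and `ctx` is modelled as a plain dict that holds source fields and metadata such as `_index` side by side, which is an assumption made for illustration.

```python
def extract_tags(ctx, params):
    """Mirror of the first script: split `env` and keep one trimmed token."""
    env_split = ctx["env"].split(params["delimiter"])
    # The Painless version builds an ArrayList with a single element.
    ctx["tags"] = [env_split[params["position"]].strip()]
    return ctx


def set_index(ctx, params):
    """Mirror of the second script: build `_index` from `lang` and a param."""
    ctx["_index"] = ctx["lang"] + "-" + params["dataset"]
    return ctx


# Inputs copied from the two simulate requests in the diff.
doc = extract_tags({"env": "es01-prod"}, {"delimiter": "-", "position": 1})
print(doc)  # {'env': 'es01-prod', 'tags': ['prod']}

routed = set_index({"_index": "generic-index", "lang": "fr"}, {"dataset": "catalog"})
print(routed)  # {'_index': 'fr-catalog', 'lang': 'fr'}
```

The printed values match the `tags` and `_index` results shown in the documented responses.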
I think it's confusing to have this table say 'no' to all parameters in an either/or situation. One of id/source is required, and while this is specified as part of the description, is there a way we could change the no to a '*' or some other indicator that at least one of these is required because it's easy to miss on quick glance.
I agree this isn't ideal, but it's consistent with the other ingest processors. Changing this would break the later `include::common-options.asciidoc[]` statement, which is used in all our processor reference docs.

I've opened #72717 to address your concerns as a separate effort.
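To make the either/or rule under discussion concrete, here is a minimal Python sketch of the check the table describes: `id` and `source` are each optional, but at least one must be present. The helper name `check_script_options` and its error text are hypothetical and are not taken from Elasticsearch. What happens when both options are supplied is not covered in this thread, so the sketch does not model it.

```python
def check_script_options(options):
    """Return an error string if neither `id` nor `source` is set, else None."""
    # Hypothetical check; the message below is illustrative only.
    if options.get("id") is None and options.get("source") is None:
        return "one of [id] or [source] is required"
    return None


inline_ok = check_script_options({"source": "ctx['tags'] = []"})
stored_ok = check_script_options({"id": "my-stored-script"})
missing = check_script_options({"lang": "painless", "params": {}})
```

Here `inline_ok` and `stored_ok` are `None`, while `missing` carries the error string, which is the case a "no / no" table is easy to misread.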