[DOCS] Refactor script processor docs #72691
Changes from 2 commits
96b8340
4786e9e
9a1f779
324fcdd
d44570e
0d43f48
7827639
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -4,99 +4,152 @@ | |
| <titleabbrev>Script</titleabbrev> | ||
| ++++ | ||
|
|
||
| Allows inline and stored scripts to be executed within ingest pipelines. | ||
| Runs an inline or stored <<modules-scripting,script>> on incoming documents. The | ||
| script runs in the {painless}/painless-ingest-processor-context.html[`ingest`] | ||
| context. | ||
|
|
||
| See <<modules-scripting-using, How to use scripts>> to learn more about writing scripts. The Script Processor | ||
| leverages caching of compiled scripts for improved performance. Since the | ||
| script specified within the processor is potentially re-compiled per document, it is important | ||
| to understand how script caching works. To learn more about | ||
| caching see <<scripts-and-search-speed, Script Caching>>. | ||
| The script processor uses the <<scripts-and-search-speed,script cache>> to avoid | ||
| recompiling the script for each incoming document. To improve performance, | ||
| ensure the script cache is properly sized before using a script processor in | ||
| production. | ||
|
|
||
| [[script-options]] | ||
| .Script Options | ||
| .Script options | ||
| [options="header"] | ||
| |====== | ||
| | Name | Required | Default | Description | ||
| | `lang` | no | "painless" | The scripting language | ||
| | `id` | no | - | The stored script id to refer to | ||
| | `source` | no | - | An inline script to be executed | ||
| | `params` | no | - | Script Parameters | ||
| | Name | Required | Default | Description | ||
| | `lang` | no | "painless" | <<scripting-available-languages,Script language>>. | ||
| | `id` | no | - | ID of a <<create-stored-script-api,stored script>>. | ||
| If no `source` is specified, this parameter is required. | ||
| | `source` | no | - | Inline script. | ||
| If no `id` is specified, this parameter is required. | ||
| | `params` | no | - | Object containing parameters for the script. | ||
| include::common-options.asciidoc[] | ||
| |====== | ||
|
|
||
| One of `id` or `source` options must be provided in order to properly reference a script to execute. | ||
| [discrete] | ||
| [[script-processor-access-source-fields]] | ||
| ==== Access source fields | ||
|
|
||
| You can access the current ingest document from within the script context by using the `ctx` variable. | ||
| To access an incoming document's source fields with a Painless script, use the | ||
| `ctx.<field>` syntax. The `ctx._source.<field>` and `ctx['_source.<field>']` | ||
|
||
| syntaxes are not supported. | ||
|
|
||
| The following example sets a new field called `field_a_plus_b_times_c` to be the sum of two existing | ||
| numeric fields `field_a` and `field_b` multiplied by the parameter param_c: | ||
| The following processor uses a Painless script to extract the `tags` field from | ||
| the `env` source field. | ||
|
|
||
| [source,js] | ||
| -------------------------------------------------- | ||
| [source,console] | ||
| ---- | ||
| POST _ingest/pipeline/_simulate | ||
|
Contributor
Do we explain what the simulate query is anywhere?
Contributor
Author
That's covered in the ingest pipeline docs. I don't feel we need to cover that here. |
||
| { | ||
| "script": { | ||
| "lang": "painless", | ||
| "source": "ctx.field_a_plus_b_times_c = (ctx.field_a + ctx.field_b) * params.param_c", | ||
| "params": { | ||
| "param_c": 10 | ||
| "pipeline": { | ||
| "processors": [ | ||
| { | ||
| "script": { | ||
| "description": "Extract 'tags' from 'env' field", | ||
| "lang": "painless", | ||
| "source": """ | ||
| String[] envSplit = ctx?.env.splitOnToken(params.delimiter); | ||
|
||
| ArrayList tags = new ArrayList(); | ||
| tags.add(envSplit[params.position].trim()); | ||
| ctx.tags = tags; | ||
| """, | ||
| "params": { | ||
| "delimiter": "-", | ||
| "position": 1 | ||
| } | ||
| } | ||
| } | ||
| ] | ||
| }, | ||
| "docs": [ | ||
| { | ||
| "_source": { | ||
| "env": "es01-prod" | ||
| } | ||
| } | ||
| } | ||
| ] | ||
| } | ||
| -------------------------------------------------- | ||
| // NOTCONSOLE | ||
| ---- | ||
|
|
||
| It is possible to use the Script Processor to manipulate document metadata like `_index` during | ||
| ingestion. Here is an example of an Ingest Pipeline that renames the index to `my-index` no matter what | ||
| was provided in the original index request: | ||
| The processor produces: | ||
|
|
||
| [source,console] | ||
| -------------------------------------------------- | ||
| PUT _ingest/pipeline/my-index | ||
| [source,console-result] | ||
| ---- | ||
| { | ||
| "description": "use index:my-index", | ||
| "processors": [ | ||
| "docs": [ | ||
| { | ||
| "script": { | ||
| "source": """ | ||
| ctx._index = 'my-index'; | ||
| """ | ||
| "doc": { | ||
| ... | ||
| "_source": { | ||
| "env": "es01-prod", | ||
| "tags": [ | ||
| "prod" | ||
| ] | ||
| } | ||
| } | ||
| } | ||
| ] | ||
| } | ||
| -------------------------------------------------- | ||
| ---- | ||
| // TESTRESPONSE[s/\.\.\./"_index":"_index","_id":"_id","_ingest":{"timestamp":$body.docs.0.doc._ingest.timestamp},/] | ||
|
|
||
|
|
||
| Using the above pipeline, we can attempt to index a document into the `any-index` index. | ||
| [discrete] | ||
| [[script-processor-access-metadata-fields]] | ||
| ==== Access metadata fields | ||
|
|
||
| You can also use a script processor to access metadata fields. The following | ||
| processor uses a Painless script to set an incoming document's `_index`. | ||
|
|
||
| [source,console] | ||
| -------------------------------------------------- | ||
| PUT any-index/_doc/1?pipeline=my-index | ||
| ---- | ||
| POST _ingest/pipeline/_simulate | ||
| { | ||
| "message": "text" | ||
| "pipeline": { | ||
| "processors": [ | ||
| { | ||
| "script": { | ||
| "description": "Set index based on `lang` field and `dataset` param", | ||
| "lang": "painless", | ||
| "source": """ | ||
| ctx._index = ctx.lang + '-' + params.dataset; | ||
| """, | ||
| "params": { | ||
| "dataset": "catalog" | ||
| } | ||
| } | ||
| } | ||
| ] | ||
| }, | ||
| "docs": [ | ||
| { | ||
| "_index": "generic-index", | ||
| "_source": { | ||
| "lang": "fr" | ||
| } | ||
| } | ||
| ] | ||
| } | ||
| -------------------------------------------------- | ||
| // TEST[continued] | ||
| ---- | ||
|
|
||
| The response from the above index request: | ||
| The processor changes the document's `_index` to `fr-catalog` from | ||
| `generic-index`. | ||
|
|
||
| [source,console-result] | ||
| -------------------------------------------------- | ||
| ---- | ||
| { | ||
| "_index": "my-index", | ||
| "_id": "1", | ||
| "_version": 1, | ||
| "result": "created", | ||
| "_shards": { | ||
| "total": 2, | ||
| "successful": 1, | ||
| "failed": 0 | ||
| }, | ||
| "_seq_no": 89, | ||
| "_primary_term": 1, | ||
| "docs": [ | ||
| { | ||
| "doc": { | ||
| ... | ||
| "_index": "fr-catalog", | ||
| "_source": { | ||
| "lang": "fr" | ||
| } | ||
| } | ||
| } | ||
| ] | ||
| } | ||
| -------------------------------------------------- | ||
| // TESTRESPONSE[s/"_seq_no": \d+/"_seq_no" : $body._seq_no/ s/"_primary_term" : 1/"_primary_term" : $body._primary_term/] | ||
|
|
||
| In the above response, you can see that our document was actually indexed into `my-index` instead of | ||
| `any-index`. This type of manipulation is often convenient in pipelines that have various branches of transformation, | ||
| and depending on the progress made, indexed into different indices. | ||
| ---- | ||
| // TESTRESPONSE[s/\.\.\./"_id":"_id","_ingest":{"timestamp":$body.docs.0.doc._ingest.timestamp},/] | ||
I think it's confusing for this table to say 'no' for every parameter in an either/or situation. One of `id`/`source` is required, and while the description says so, could we change the 'no' to a '*' or some other indicator that at least one of them is required? It's easy to miss at a quick glance.
I agree this isn't ideal, but it's consistent with the other ingest processors. Changing this would break the later `include::common-options.asciidoc[]` statement, which is used in all our processor reference docs.

I've opened #72717 to address your concerns as a separate effort.
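
As a reading aid for the two Painless scripts in this diff, here is a minimal Python sketch of the same logic. It is not part of the PR and it is not Painless: the function names `extract_tags` and `set_index` are hypothetical, the documents are plain dicts standing in for `ctx`, and `str.split`/`str.strip` only approximate Painless's `splitOnToken` and `trim`.

```python
def extract_tags(doc, delimiter, position):
    """Approximate the first script: split `env` and store one token as `tags`."""
    # Painless: String[] envSplit = ctx?.env.splitOnToken(params.delimiter);
    env_split = doc["env"].split(delimiter)
    # Painless: tags.add(envSplit[params.position].trim()); ctx.tags = tags;
    doc["tags"] = [env_split[position].strip()]
    return doc


def set_index(doc, dataset):
    """Approximate the second script: build `_index` from `lang` and a param."""
    # Painless: ctx._index = ctx.lang + '-' + params.dataset;
    # Here `_index` sits next to the source fields, as it does on `ctx`.
    doc["_index"] = doc["lang"] + "-" + dataset
    return doc


if __name__ == "__main__":
    # Inputs copied from the two simulate requests in the diff.
    print(extract_tags({"env": "es01-prod"}, "-", 1))
    print(set_index({"_index": "generic-index", "lang": "fr"}, "catalog"))
```

With the inputs from the simulate requests, the first call yields `tags` of `["prod"]` and the second sets `_index` to `fr-catalog`, matching the responses shown in the new docs.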