-
Notifications
You must be signed in to change notification settings - Fork 8.5k
[alerts] add executionStatus to event log doc for action execute #82401
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
c356721 to
3ca7c23
Compare
eb34d38 to
52bb9c0
Compare
pmuellr
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Noted a couple of anomolies in the current tests
05759b9 to
1822d4d
Compare
resolves elastic#79785 Until now, the execution status was available in the the event log document for the execute action. In this PR we add it. The event log is extended to add the following fields: - `kibana.alerting.status` - from executionStatus.status - `event.reason` - from executionStatus.error.reason The date from the executionStatus and start date in the event log will be set to the same value. Previously, errors encountered while trying to execute an alert executor, eg decrypting the alert, would not end up with an event doc generated. Now they will. In addition, there were a few places where events that could have had the action group in them did not, and one where the instance id was undefined - those were fixed up.
1822d4d to
a4599f9
Compare
|
Pinging @elastic/kibana-alerting-services (Team:Alerting Services) |
|
Wanted to point out some additional issues to open after working with this code; unrelated to this PR directly, but straight-forward and we should clean these up.
|
| expect(call.services).toBeTruthy(); | ||
|
|
||
| const eventLogger = taskRunnerFactoryInitializerParams.eventLogger; | ||
| expect(eventLogger.logEvent).toHaveBeenCalledTimes(1); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the changes at the top of this file are a dog's breakfast, sorry. The order the events get logged has changed, where the execute event is written AFTER all the other events now, but post-dated to the start of the execution process (and will be <= the other events generated), since we now have to write it AFTER all the execution processing. So, the diff is not very pleasing ...
| executionHandler: ReturnType<typeof createExecutionHandler>, | ||
| spaceId: string | ||
| spaceId: string, | ||
| event: Event |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We now pass a partially populated event in to these "inner" functions of the execution logic, as we write the event at the "top level" of the execution logic, instead of in this "inner" one.
| alertRawInstances, | ||
| (rawAlertInstance) => new AlertInstance(rawAlertInstance) | ||
| ); | ||
| const originalAlertInstances = cloneDeep(alertInstances); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we need the last set of alert instances (from the previous execution), so we can get the action group for "resolved" events, not just the original instance ids. Also, need to clone this, as alertInstances is mutated during the processing.
| "end": { | ||
| "type": "date" | ||
| }, | ||
| "outcome": { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
these are the two fields from the alert execution status, that weren't already populated somehow in the event, so added as new fields, in what I think are appropriate places - reason is a pre-existing. ECS field, but status is a new field in the custom ECS kibana.alertingfield
| event: { provider: 'test-provider', action: 'a' }, | ||
| }); | ||
|
|
||
| const ignoredTimestamp = '1999-01-01T00:00:00Z'; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because we are now "post-dating" the execute event log docs, we need to allow the timestamp to be overridable when we merge fixed and initial event fields - previously you could NOT override it. That logic is in the file below, x-pack/plugins/event_log/server/event_logger.ts
| @@ -0,0 +1,84 @@ | |||
| /* | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In order to test a decrypt error status in the event log, we need to run with security on - all the other eventLog-specific tests are in the spaces_only tests, so we reference some of the utilities used there via import.
| expect(event?.kibana?.alerting?.action_group_id).to.be(actionGroupId); | ||
| } | ||
|
|
||
| function validateEvent(event: IValidatedEvent, params: ValidateEventLogParams): void { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
for this diff section, best viewed with ignore whitespace, as I lifted some functions out of a function scope, to make them top-level, so they could be exported, so they could be reused by the security_and_spaces tests
|
I've added a number of comments to the PR as guide posts - lots of stuff that seems a bit obscure in this PR :-) |
ymao1
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Verified that for event log where action = execute, there is now a status field with executionStatus and that if the status is error, a reason is populated.
dhurley14
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM! This is awesome thanks very much! Tested it out locally I'm hoping to switch over from using our rule status saved object to just reading errors through the getAlertInstanceSummary on the alertsClient. Looks like that will work. This will really help detections clean things up in our own codebase.
YulNaumenko
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
|
@elasticmachine merge upstream |
💚 Build SucceededMetrics [docs]
History
To update your PR or re-run it, just comment with: |
…stic#82401) resolves elastic#79785 Until now, the execution status was available in the the event log document for the execute action. In this PR we add it. The event log is extended to add the following fields: - `kibana.alerting.status` - from executionStatus.status - `event.reason` - from executionStatus.error.reason The date from the executionStatus and start date in the event log will be set to the same value. Previously, errors encountered while trying to execute an alert executor, eg decrypting the alert, would not end up with an event doc generated. Now they will. In addition, there were a few places where events that could have had the action group in them did not, and one where the instance id was undefined - those were fixed up.
* master: [Advanced Settings] Introducing telemetry (elastic#82860) [alerts] add executionStatus to event log doc for action execute (elastic#82401) Add additional sources routes (elastic#83227) [ML] Persisted URL state for the "Anomaly detection jobs" page (elastic#83149) [Logs UI] Add pagination to the log stream shared component (elastic#81193) [Index Management] Add an index template link to data stream details (elastic#82592) Add maps_oss folder to code_owners (elastic#83204) fix truncation issue (elastic#83000) [Ingest Manger] Move asset getters out of registry (elastic#83214)
) (#83289) resolves #79785 Until now, the execution status was available in the the event log document for the execute action. In this PR we add it. The event log is extended to add the following fields: - `kibana.alerting.status` - from executionStatus.status - `event.reason` - from executionStatus.error.reason The date from the executionStatus and start date in the event log will be set to the same value. Previously, errors encountered while trying to execute an alert executor, eg decrypting the alert, would not end up with an event doc generated. Now they will. In addition, there were a few places where events that could have had the action group in them did not, and one where the instance id was undefined - those were fixed up.
… alerts/action-groups-as-conditions * origin/alerts/stack-alerts-public: (91 commits) removed import from plugin code as it causes FTR to fail [Advanced Settings] Introducing telemetry (elastic#82860) [alerts] add executionStatus to event log doc for action execute (elastic#82401) Add additional sources routes (elastic#83227) [ML] Persisted URL state for the "Anomaly detection jobs" page (elastic#83149) [Logs UI] Add pagination to the log stream shared component (elastic#81193) [Index Management] Add an index template link to data stream details (elastic#82592) Add maps_oss folder to code_owners (elastic#83204) fix truncation issue (elastic#83000) [Ingest Manger] Move asset getters out of registry (elastic#83214) make defaulted field non maybe Remove unused asciidoc file (elastic#83228) [Lens] Remove background from lens embeddable (elastic#83061) [Discover] Unskip flaky tests based on discover fixture index pattern (elastic#82991) Removing unnecessary trailing slash in CODEOWNERS Trying to fix CODEOWNERS again, where was a non-existent team prior (elastic#83236) Trying to fix CODEOWERS, missing a starting slash (elastic#83233) skip flaky suite (elastic#83231) Add enzyme rerender test helper (elastic#83208) Move Elasticsearch type definitions out of APM (elastic#83081) ...
Summary
resolves #79785
Until now, the execution status was not available in the the event
log document for the execute action. In this PR we add it.
The event log is extended to add the following fields:
kibana.alerting.status- from executionStatus.statusevent.reason- from executionStatus.error.reasonThe date from the executionStatus and start date in the event
log will be set to the same value.
Previously, errors encountered before trying to execute an
alert executor, eg decrypting the alert, would not end up
with an event doc generated. Now they will.
Checklist