-
Notifications
You must be signed in to change notification settings - Fork 25.6k
Description
Original comment by @sophiec20:
Found in 6.3.0 { "build" : { "hash" : "a47564f", "date" : "2018-04-09T21:45:51.347364Z" },
Showcase server, 2 node cluster on gcp
Some jobs failed to auto-close due to a timeout when trying to remove the datafeed persistent task.
System was running 6 concurrent jobs, all on gallery2018 data. These jobs were started 5s apart using the same script. All configs were identifal, except for the function used. The datafeed start request contained the same start and end dates Datafeed started (from: 1970-01-02T10:00:00.000Z to: 2018-12-31T00:00:00.000Z). The 6 jobs had the following configs:
min(bytes) partition=status influencers=clientip,status bucket=1h *
max(bytes) partition=status influencers=clientip,status bucket=1h *
mean(bytes) partition=status influencers=clientip,status bucket=1h *
median(bytes) partition=status influencers=clientip,status bucket=1h
varp(bytes) partition=status influencers=clientip,status bucket=1h
sum(bytes) partition=status influencers=clientip,status bucket=1h
An end date was specified, so I would expect the jobs to auto-close after they had finished processing the data in the index.
All jobs completed, however 3 of the jobs (see *) remained open. Their datafeeds were stopped. The errors are logged below.
[2018-04-11T23:30:34,129][INFO ][o.e.x.m.j.p.DataCountsReporter] [node2] [ga-mean-bytes-p-status-clientip-status-1h] 900000 records written to autodetect; missingFieldCount=0, invalidDateCount=0, outOfOrderCount=0
[2018-04-11T23:30:54,324][INFO ][o.e.x.m.j.p.DataCountsReporter] [node2] [ga-min-bytes-p-status-clientip-status-1h] 1000000 records written to autodetect; missingFieldCount=0, invalidDateCount=0, outOfOrderCount=0
[2018-04-11T23:31:01,912][INFO ][o.e.x.m.j.p.DataCountsReporter] [node2] [ga-sum-bytes-p-status-clientip-status-1h] 700000 records written to autodetect; missingFieldCount=0, invalidDateCount=0, outOfOrderCount=0
[2018-04-11T23:31:16,142][INFO ][o.e.x.m.d.DatafeedJob ] [ga-min-bytes-p-status-clientip-status-1h] Lookback has finished
[2018-04-11T23:31:16,142][INFO ][o.e.x.m.d.DatafeedManager] [no_realtime] attempt to stop datafeed [datafeed-ga-min-bytes-p-status-clientip-status-1h] for job [ga-min-bytes-p-status-clientip-status-1h]
[2018-04-11T23:31:16,143][INFO ][o.e.x.m.d.DatafeedManager] [no_realtime] try lock [20s] to stop datafeed [datafeed-ga-min-bytes-p-status-clientip-status-1h] for job [ga-min-bytes-p-status-clientip-status-1h]...
[2018-04-11T23:31:16,143][INFO ][o.e.x.m.d.DatafeedManager] [no_realtime] stopping datafeed [datafeed-ga-min-bytes-p-status-clientip-status-1h] for job [ga-min-bytes-p-status-clientip-status-1h], acquired [true]...
[2018-04-11T23:31:16,143][INFO ][o.e.x.m.d.DatafeedManager] [no_realtime] datafeed [datafeed-ga-min-bytes-p-status-clientip-status-1h] for job [ga-min-bytes-p-status-clientip-status-1h] has been stopped
[2018-04-11T23:31:16,736][INFO ][o.e.x.m.j.p.DataCountsReporter] [node2] [ga-mean-bytes-p-status-clientip-status-1h] 1000000 records written to autodetect; missingFieldCount=0, invalidDateCount=0, outOfOrderCount=0
[2018-04-11T23:31:31,956][INFO ][o.e.x.m.d.DatafeedJob ] [ga-mean-bytes-p-status-clientip-status-1h] Lookback has finished
[2018-04-11T23:31:31,956][INFO ][o.e.x.m.d.DatafeedManager] [no_realtime] attempt to stop datafeed [datafeed-ga-mean-bytes-p-status-clientip-status-1h] for job [ga-mean-bytes-p-status-clientip-status-1h]
[2018-04-11T23:31:31,956][INFO ][o.e.x.m.d.DatafeedManager] [no_realtime] try lock [20s] to stop datafeed [datafeed-ga-mean-bytes-p-status-clientip-status-1h] for job [ga-mean-bytes-p-status-clientip-status-1h]...
[2018-04-11T23:31:31,956][INFO ][o.e.x.m.d.DatafeedManager] [no_realtime] stopping datafeed [datafeed-ga-mean-bytes-p-status-clientip-status-1h] for job [ga-mean-bytes-p-status-clientip-status-1h], acquired [true]...
[2018-04-11T23:31:31,957][INFO ][o.e.x.m.d.DatafeedManager] [no_realtime] datafeed [datafeed-ga-mean-bytes-p-status-clientip-status-1h] for job [ga-mean-bytes-p-status-clientip-status-1h] has been stopped
[2018-04-11T23:31:36,150][ERROR][o.e.x.m.d.DatafeedManager] Failed to remove datafeed persistent task - will not auto close job [ga-min-bytes-p-status-clientip-status-1h]
java.lang.IllegalStateException: timed out after 20s
at org.elasticsearch.persistent.PersistentTasksService$WaitForPersistentTaskStatusListener.onTimeout(PersistentTasksService.java:199) [elasticsearch-6.3.0-SNAPSHOT.jar:6.3.0-SNAPSHOT]
at org.elasticsearch.persistent.PersistentTasksService$1.onTimeout(PersistentTasksService.java:164) [elasticsearch-6.3.0-SNAPSHOT.jar:6.3.0-SNAPSHOT]
at org.elasticsearch.cluster.ClusterStateObserver$ContextPreservingListener.onTimeout(ClusterStateObserver.java:317) [elasticsearch-6.3.0-SNAPSHOT.jar:6.3.0-SNAPSHOT]
at org.elasticsearch.cluster.ClusterStateObserver$ObserverClusterStateListener.onTimeout(ClusterStateObserver.java:244) [elasticsearch-6.3.0-SNAPSHOT.jar:6.3.0-SNAPSHOT]
at org.elasticsearch.cluster.service.ClusterApplierService$NotifyTimeout.run(ClusterApplierService.java:576) [elasticsearch-6.3.0-SNAPSHOT.jar:6.3.0-SNAPSHOT]
at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:573) [elasticsearch-6.3.0-SNAPSHOT.jar:6.3.0-SNAPSHOT]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_151]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_151]
at java.lang.Thread.run(Thread.java:748) [?:1.8.0_151]
[2018-04-11T23:31:36,154][INFO ][o.e.x.m.j.p.DataCountsReporter] [node2] [ga-sum-bytes-p-status-clientip-status-1h] 800000 records written to autodetect; missingFieldCount=0, invalidDateCount=0, outOfOrderCount=0
[2018-04-11T23:31:51,958][ERROR][o.e.x.m.d.DatafeedManager] Failed to remove datafeed persistent task - will not auto close job [ga-mean-bytes-p-status-clientip-status-1h]
java.lang.IllegalStateException: timed out after 20s
at org.elasticsearch.persistent.PersistentTasksService$WaitForPersistentTaskStatusListener.onTimeout(PersistentTasksService.java:199) [elasticsearch-6.3.0-SNAPSHOT.jar:6.3.0-SNAPSHOT]
at org.elasticsearch.persistent.PersistentTasksService$1.onTimeout(PersistentTasksService.java:164) [elasticsearch-6.3.0-SNAPSHOT.jar:6.3.0-SNAPSHOT]
at org.elasticsearch.cluster.ClusterStateObserver$ContextPreservingListener.onTimeout(ClusterStateObserver.java:317) [elasticsearch-6.3.0-SNAPSHOT.jar:6.3.0-SNAPSHOT]
at org.elasticsearch.cluster.ClusterStateObserver$ObserverClusterStateListener.onTimeout(ClusterStateObserver.java:244) [elasticsearch-6.3.0-SNAPSHOT.jar:6.3.0-SNAPSHOT]
at org.elasticsearch.cluster.service.ClusterApplierService$NotifyTimeout.run(ClusterApplierService.java:576) [elasticsearch-6.3.0-SNAPSHOT.jar:6.3.0-SNAPSHOT]
at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:573) [elasticsearch-6.3.0-SNAPSHOT.jar:6.3.0-SNAPSHOT]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_151]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_151]
at java.lang.Thread.run(Thread.java:748) [?:1.8.0_151]
[2018-04-11T23:32:07,180][INFO ][o.e.x.m.j.p.DataCountsReporter] [node2] [ga-sum-bytes-p-status-clientip-status-1h] 900000 records written to autodetect; missingFieldCount=0, invalidDateCount=0, outOfOrderCount=0
[2018-04-11T23:32:34,798][INFO ][o.e.x.m.j.p.DataCountsReporter] [node2] [ga-sum-bytes-p-status-clientip-status-1h] 1000000 records written to autodetect; missingFieldCount=0, invalidDateCount=0, outOfOrderCount=0
[2018-04-11T23:32:44,739][INFO ][o.e.x.m.d.DatafeedJob ] [ga-sum-bytes-p-status-clientip-status-1h] Lookback has finished
[2018-04-11T23:32:44,739][INFO ][o.e.x.m.d.DatafeedManager] [no_realtime] attempt to stop datafeed [datafeed-ga-sum-bytes-p-status-clientip-status-1h] for job [ga-sum-bytes-p-status-clientip-status-1h]
[2018-04-11T23:32:44,739][INFO ][o.e.x.m.d.DatafeedManager] [no_realtime] try lock [20s] to stop datafeed [datafeed-ga-sum-bytes-p-status-clientip-status-1h] for job [ga-sum-bytes-p-status-clientip-status-1h]...
[2018-04-11T23:32:44,739][INFO ][o.e.x.m.d.DatafeedManager] [no_realtime] stopping datafeed [datafeed-ga-sum-bytes-p-status-clientip-status-1h] for job [ga-sum-bytes-p-status-clientip-status-1h], acquired [true]...
[2018-04-11T23:32:44,739][INFO ][o.e.x.m.d.DatafeedManager] [no_realtime] datafeed [datafeed-ga-sum-bytes-p-status-clientip-status-1h] for job [ga-sum-bytes-p-status-clientip-status-1h] has been stopped
[2018-04-11T23:33:04,477][INFO ][o.e.x.m.j.p.a.AutodetectProcessManager] [node2] Closing job [ga-sum-bytes-p-status-clientip-status-1h], because [close job (api)]
[2018-04-11T23:33:04,488][INFO ][o.e.x.m.j.p.l.CppLogMessageHandler] [ga-sum-bytes-p-status-clientip-status-1h] [autodetect/10435] EMAIL REDACTED Handled 1023038 records
[2018-04-11T23:33:04,488][INFO ][o.e.x.m.j.p.l.CppLogMessageHandler] [ga-sum-bytes-p-status-clientip-status-1h] [autodetect/10435] EMAIL REDACTED Pruning all models
[2018-04-11T23:33:04,589][INFO ][o.e.x.m.j.p.a.NativeAutodetectProcess] [ga-sum-bytes-p-status-clientip-status-1h] State output finished
[2018-04-11T23:33:04,897][INFO ][o.e.x.m.j.p.a.o.AutoDetectResultProcessor] [ga-sum-bytes-p-status-clientip-status-1h] 7259 buckets parsed from autodetect output
[2018-04-11T23:33:09,161][INFO ][o.e.x.m.j.p.a.AutodetectCommunicator] [ga-sum-bytes-p-status-clientip-status-1h] job closed
The Job messages looked fine, although it was missing the final Job is closing message.
