-
Notifications
You must be signed in to change notification settings - Fork 185
Description
Using jdbc plugin 4.3.13
Using logstash in docker: docker.elastic.co/logstash/logstash:7.1.1
In theory clean_run setting of the jdbc plugin decides:
Whether the previous run state should be preserved
(https://www.elastic.co/guide/en/logstash/current/plugins-inputs-jdbc.html#plugins-inputs-jdbc-clean_run)
Again, in theory if we set clean_run: true, the sql_last_value should become '1970-01-01 00:00:00'
If clean_run is set to true, this value (sql_last_value) will be ignored and sql_last_value will be set to Jan 1, 1970
(https://www.elastic.co/guide/en/logstash/current/plugins-inputs-jdbc.html#_state)
However, in practice, the clean_run: true in the example below still causes the pipeline to keep the last run state under the :sql_last_value parameter (the date of last run in our case).
:sql_last_value is set to '1970-01-01 00:00:00' only when the pipeline starts for the first time.
2nd and next runs don't start from '1970-01-01 00:00:00', the sql query still contains value of the last run.
Logs:
1st run is OK:
app_1 | [2020-02-19T11:33:01,735][WARN ][logstash.inputs.jdbc ] (0.103191s)
app_1 | SELECT
app_1 | rg.id,
app_1 | rg.name,
app_1 | r.name AS retailer_name,
app_1 | r.trkref
app_1 | FROM retailer_groups rg
app_1 | JOIN retailers r ON rg.id = r.retailer_group_id
app_1 | WHERE r.review_mark_enabled = 1 and rg.updated_at >= '1970-01-01 00:00:00'
2nd run NOT ok:
app_1 | [2020-02-19T11:34:01,038][WARN ][logstash.inputs.jdbc ] (0.049631s)
app_1 | SELECT
app_1 | rg.id,
app_1 | rg.name,
app_1 | r.name AS retailer_name,
app_1 | r.trkref
app_1 | FROM retailer_groups rg
app_1 | JOIN retailers r ON rg.id = r.retailer_group_id
app_1 | WHERE r.review_mark_enabled = 1 and rg.updated_at >= '2020-02-19 11:33:01'
3rd run and any later on NOT ok:
app_1 | [2020-02-19T11:35:01,072][WARN ][logstash.inputs.jdbc ] (0.049421s)
app_1 | SELECT
app_1 | rg.id,
app_1 | rg.name,
app_1 | r.name AS retailer_name,
app_1 | r.trkref
app_1 | FROM retailer_groups rg
app_1 | JOIN retailers r ON rg.id = r.retailer_group_id
app_1 | WHERE r.review_mark_enabled = 1 and rg.updated_at >= '2020-02-19 11:34:00'
You can reproduce the behaviour with this input:
input{
jdbc {
jdbc_driver_library => "${MYSQL_DRIVER}"
jdbc_driver_class => "com.mysql.jdbc.Driver"
jdbc_connection_string => "..."
jdbc_user => "..."
jdbc_password => "..."
schedule => "*/1 * * * *"
clean_run => true
statement => "
SELECT
rg.id,
rg.name,
r.name AS retailer_name,
r.trkref
FROM retailer_groups rg
JOIN retailers r ON rg.id = r.retailer_group_id
WHERE r.review_mark_enabled = 1 and rg.updated_at >= :sql_last_value
"
}
}
In order to start the pipeline from "0" ('1970-01-01 00:00:00'), we need to redeploy the logstash.
Could anyone explain this to me? Either I miss some understanding or the plugin doesn't work as expected - a bug.
From my understanding the "clean_run: true" should cause the :sql_last_value param to be "zero" each time the pipeline starts, in the example above it keeps the state...