Skip to content

clean_run: true does not work for sql statement #373

@palin

Description

@palin

Using jdbc plugin 4.3.13
Using logstash in docker: docker.elastic.co/logstash/logstash:7.1.1

In theory clean_run setting of the jdbc plugin decides:

Whether the previous run state should be preserved
(https://www.elastic.co/guide/en/logstash/current/plugins-inputs-jdbc.html#plugins-inputs-jdbc-clean_run)

Again, in theory if we set clean_run: true, the sql_last_value should become '1970-01-01 00:00:00'

If clean_run is set to true, this value (sql_last_value) will be ignored and sql_last_value will be set to Jan 1, 1970
(https://www.elastic.co/guide/en/logstash/current/plugins-inputs-jdbc.html#_state)

However, in practice, the clean_run: true in the example below still causes the pipeline to keep the last run state under the :sql_last_value parameter (the date of last run in our case).

:sql_last_value is set to '1970-01-01 00:00:00' only when the pipeline starts for the first time.
2nd and next runs don't start from '1970-01-01 00:00:00', the sql query still contains value of the last run.

Logs:

1st run is OK:

app_1 | [2020-02-19T11:33:01,735][WARN ][logstash.inputs.jdbc     ] (0.103191s)
app_1 |       SELECT
app_1 |         rg.id,
app_1 |         rg.name,
app_1 |         r.name AS retailer_name,
app_1 |         r.trkref
app_1 |       FROM retailer_groups rg
app_1 |       JOIN retailers r ON rg.id = r.retailer_group_id
app_1 |       WHERE r.review_mark_enabled = 1 and rg.updated_at >= '1970-01-01 00:00:00'

2nd run NOT ok:

app_1 | [2020-02-19T11:34:01,038][WARN ][logstash.inputs.jdbc     ] (0.049631s)
app_1 |       SELECT
app_1 |         rg.id,
app_1 |         rg.name,
app_1 |         r.name AS retailer_name,
app_1 |         r.trkref
app_1 |       FROM retailer_groups rg
app_1 |       JOIN retailers r ON rg.id = r.retailer_group_id
app_1 |       WHERE r.review_mark_enabled = 1 and rg.updated_at >= '2020-02-19 11:33:01'

3rd run and any later on NOT ok:

app_1 | [2020-02-19T11:35:01,072][WARN ][logstash.inputs.jdbc     ] (0.049421s)
app_1 |       SELECT
app_1 |         rg.id,
app_1 |         rg.name,
app_1 |         r.name AS retailer_name,
app_1 |         r.trkref
app_1 |       FROM retailer_groups rg
app_1 |       JOIN retailers r ON rg.id = r.retailer_group_id
app_1 |       WHERE r.review_mark_enabled = 1 and rg.updated_at >= '2020-02-19 11:34:00'

You can reproduce the behaviour with this input:

input{
  jdbc {
    jdbc_driver_library => "${MYSQL_DRIVER}"
    jdbc_driver_class => "com.mysql.jdbc.Driver"
    jdbc_connection_string => "..."
    jdbc_user => "..."
    jdbc_password => "..."
    schedule => "*/1 * * * *"
    clean_run => true
    statement => "
      SELECT
        rg.id,
        rg.name,
        r.name AS retailer_name,
        r.trkref
      FROM retailer_groups rg
      JOIN retailers r ON rg.id = r.retailer_group_id
      WHERE r.review_mark_enabled = 1 and rg.updated_at >= :sql_last_value
    "
  }
}

In order to start the pipeline from "0" ('1970-01-01 00:00:00'), we need to redeploy the logstash.

Could anyone explain this to me? Either I miss some understanding or the plugin doesn't work as expected - a bug.
From my understanding the "clean_run: true" should cause the :sql_last_value param to be "zero" each time the pipeline starts, in the example above it keeps the state...

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions