@clandry94 clandry94 commented Feb 8, 2019

This implements the feature requested in issue #36712.

Right now, the stats API only reports refresh metrics for internal refreshes. This isn't very useful, and is somewhat misleading for cluster administrators, since internal refreshes do not indicate that documents have become available for search.

In this PR I added a new metric for collecting external refreshes as they occur and exposing them through the stats API. Now, calling an endpoint for stats will yield external refresh metrics as well.

GET /test/_stats/refresh
{
  "_shards" : {
    "total" : 2,
    "successful" : 1,
    "failed" : 0
  },
  "_all" : {
    "primaries" : {
      "refresh" : {
        "total" : 7,
        "total_time_in_millis" : 157,
        "external_total" : 5,
        "external_total_time_in_millis" : 157,
        "listeners" : 0
      }
    },
    "total" : {
      "refresh" : {
        "total" : 7,
        "total_time_in_millis" : 157,
        "external_total" : 5,
        "external_total_time_in_millis" : 157,
        "listeners" : 0
      }
    }
  },
  "indices" : {
    "test" : {
      "uuid" : "4UxNKfIjSCGvGrBEU1l7tg",
      "primaries" : {
        "refresh" : {
          "total" : 7,
          "total_time_in_millis" : 157,
          "external_total" : 5,
          "external_total_time_in_millis" : 157,
          "listeners" : 0
        }
      },
      "total" : {
        "refresh" : {
          "total" : 7,
          "total_time_in_millis" : 157,
          "external_total" : 5,
          "external_total_time_in_millis" : 157,
          "listeners" : 0
        }
      }
    }
  }
}
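
For anyone consuming these fields from a monitoring script, here is a minimal sketch of how the new counters could be read. It parses a trimmed copy of the response above rather than calling a live cluster. The helper name summarize_refresh and the avg_external_millis key are made up for illustration, and the fallback for missing fields assumes that nodes without this change simply omit them.

```python
import json

# Trimmed copy of the "_all.primaries" section of the response shown above.
SAMPLE_RESPONSE = """
{
  "_all": {
    "primaries": {
      "refresh": {
        "total": 7,
        "total_time_in_millis": 157,
        "external_total": 5,
        "external_total_time_in_millis": 157,
        "listeners": 0
      }
    }
  }
}
"""


def summarize_refresh(stats: dict) -> dict:
    """Extract refresh counters from a parsed _stats/refresh response.

    Hypothetical helper, not part of any Elasticsearch client.
    """
    refresh = stats["_all"]["primaries"]["refresh"]
    # Assumption: older nodes omit the external_* fields, so use .get().
    external_total = refresh.get("external_total")
    external_millis = refresh.get("external_total_time_in_millis")

    avg_external_millis = None
    if external_total and external_millis is not None:
        # Average wall time per external refresh, in milliseconds.
        avg_external_millis = external_millis / external_total

    return {
        "total": refresh["total"],
        "external_total": external_total,
        "avg_external_millis": avg_external_millis,
    }


summary = summarize_refresh(json.loads(SAMPLE_RESPONSE))
print(summary)
```

With the sample above, external_total is 5 out of 7 total refreshes, which is the distinction this PR is meant to expose.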

Also, how is the naming? I'm not sure external_total and external_total_time_in_millis are the best way to express this, but they get the point across. Also, should total and total_time_in_millis be changed to internal_total and so on?

cc @s1monw

@matriv matriv added the :Distributed Indexing/Engine Anything around managing Lucene and the Translog in an open shard. label Feb 11, 2019
@elasticmachine
Collaborator

Pinging @elastic/es-distributed

Contributor

@s1monw s1monw left a comment

LGTM, I left some nit-picks. I am sorry for the delay on this. This looks pretty good, thank you!

Contributor

extra newline

Contributor

extra newline

@s1monw
Contributor

s1monw commented Feb 20, 2019

@elasticmachine ok to test

@s1monw s1monw self-assigned this Feb 20, 2019
@s1monw
Contributor

s1monw commented Feb 20, 2019

@clandry94 can you merge current master into your branch, please?

@clandry94 clandry94 force-pushed the external_refresh_metrics branch from 7e4f177 to 82695e9 Compare February 20, 2019 13:51
@s1monw
Contributor

s1monw commented Feb 20, 2019

@clandry94 there are some test failures, would you mind taking a look?

@clandry94
Author

Sure, I'll resolve them

@clandry94 clandry94 force-pushed the external_refresh_metrics branch from 82695e9 to f70f6cc Compare February 21, 2019 12:24
@s1monw
Contributor

s1monw commented Feb 21, 2019

@clandry94 you don't have to force-push for every change; it's easier to review if you don't. You can still just do a git merge master if you want to catch up. We will do a squash and merge once we are ready here.

@clandry94
Author

@s1monw in the elasticsearch-ci/2 IndexStatsMonitoringDocTests.testToXContent is failing, but for reasons that seem unrelated to the changes I made. Specifically,
expected:
{"query_total":18,"query_time_in_millis":19}
and actual:
{"query_total":19,"query_time_in_millis":20}

The actual values seem to be consistent over multiple tests, so should the expected values be changed to match that?

@s1monw
Contributor

s1monw commented Feb 28, 2019

Sorry for the late reply.

> The actual values seem to be consistent over multiple tests, so should the expected values be changed to match that?

I think this change is the reason for this off-by-one: https://github.com/elastic/elasticsearch/pull/38643/files#diff-07c9dc27cfd5ff0ef6a21088ddf8b8d6R327

@s1monw
Contributor

s1monw commented Feb 28, 2019

You also need to merge with master, since we updated to a new and incompatible Lucene snapshot.

@clandry94 clandry94 force-pushed the external_refresh_metrics branch from 1fb6074 to 95f1751 Compare March 7, 2019 18:37
@dnhatn
Member

dnhatn commented Mar 15, 2019

@clandry94 I tried to get this ready, but I do not have permission to push to your branch. Could you please merge the master branch into your branch and apply this patch 38643.txt? We need to handle BWC. Thank you!

@clandry94
Author

Hey @dnhatn, merged master and applied your patch. Thanks for the help 😁

@dnhatn
Member

dnhatn commented Mar 16, 2019

run elasticsearch-ci/packaging-sample

@dnhatn
Member

dnhatn commented Mar 17, 2019

Hey @clandry94, can you please merge master into your branch? BWC was disabled in the revision that you merged. Thank you!

@dnhatn
Member

dnhatn commented Mar 18, 2019

Thanks @clandry94 for working on this.

@dnhatn dnhatn merged commit 4d73485 into elastic:master Mar 18, 2019
dnhatn added a commit that referenced this pull request Mar 19, 2019
Having these new fields to the basic cat shards yaml test does not add
much while they are causing a BWC issue (i.e., we need to skip this yaml
test until 8.0).

Relates #38643
dnhatn added a commit that referenced this pull request Mar 19, 2019
BWC is failing with this change.
This reverts commit 4d73485.
dnhatn pushed a commit to dnhatn/elasticsearch that referenced this pull request Mar 23, 2019
Right now, the stats API only provides refresh metrics regarding
internal refreshes. This isn't very useful and somewhat misleading for
cluster administrators since the internal refreshes are not indicative
of documents being available for search.

In this PR I added a new metric for collecting external refreshes as
they occur and exposing them through the stats API. Now, calling an
endpoint for stats will yield external refresh metrics as well.

Relates elastic#36712
dnhatn added a commit that referenced this pull request Mar 25, 2019
dnhatn pushed a commit that referenced this pull request Mar 25, 2019

Labels

:Distributed Indexing/Engine Anything around managing Lucene and the Translog in an open shard. >enhancement v7.2.0 v8.0.0-alpha1

6 participants