[Feature Request] Field level statistics of lucene index files #12113

@rishabhmaurya

Is your feature request related to a problem? Please describe

There seems to be no way to get statistics, such as disk consumption, for individual Lucene segment files on a per-field basis. A similar capability already exists at a coarser granularity: the index stats API (https://opensearch.org/docs/latest/api-reference/index-apis/stats/) can aggregate segment file sizes at the shard, index, and cluster level. For example:

/<index_name>/_stats/segments?level=shards&include_segment_file_sizes&pretty
  "file_sizes" : {
    "nvm" : {
      "size_in_bytes" : 5486,
      "description" : "Norms"
    },
    "fnm" : {
      "size_in_bytes" : 127478,
      "description" : "Fields"
    },
    "kdd" : {
      "size_in_bytes" : 1119640726,
      "description" : "Others"
    },
    "tmd" : {
      "size_in_bytes" : 59232,
      "description" : "Others"
    },
    "fdm" : {
      "size_in_bytes" : 9175,
      "description" : "Others"
    },
    "kdi" : {
      "size_in_bytes" : 2926068,
      "description" : "Others"
    },
    "dvd" : {
      "size_in_bytes" : 1687766934,
      "description" : "DocValues"
    },
    "kdm" : {
      "size_in_bytes" : 10398,
      "description" : "Others"
    },
    "pos" : {
      "size_in_bytes" : 6051226,
      "description" : "Positions"
    },
    "si" : {
      "size_in_bytes" : 1176,
      "description" : "Segment Info"
    },
    "fdt" : {
      "size_in_bytes" : 9388206796,
      "description" : "Field Data"
    },
    "doc" : {
      "size_in_bytes" : 2964755700,
      "description" : "Frequencies"
    },
    "tim" : {
      "size_in_bytes" : 483838256,
      "description" : "Term Dictionary"
    },
    "dvm" : {
      "size_in_bytes" : 117015,
      "description" : "DocValues"
    },
    "tip" : {
      "size_in_bytes" : 20778232,
      "description" : "Term Index"
    },
    "fdx" : {
      "size_in_bytes" : 519052,
      "description" : "Field Index"
    },
    "nvd" : {
      "size_in_bytes" : 1534,
      "description" : "Norms"
    }
  }
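
To show what the current output can and cannot tell you, here is a minimal Python sketch that groups the `file_sizes` entries by their `description` and sums the bytes. The function name `summarize_file_sizes` is made up for illustration, and the sample is a subset of the response above. The result is still per file type, not per field, which is the gap this request is about.

```python
from collections import defaultdict


def summarize_file_sizes(file_sizes):
    """Sum size_in_bytes per description for a `file_sizes` mapping."""
    totals = defaultdict(int)
    for entry in file_sizes.values():
        totals[entry["description"]] += entry["size_in_bytes"]
    return dict(totals)


# Subset of the `file_sizes` object shown above.
sample = {
    "nvm": {"size_in_bytes": 5486, "description": "Norms"},
    "nvd": {"size_in_bytes": 1534, "description": "Norms"},
    "dvd": {"size_in_bytes": 1687766934, "description": "DocValues"},
    "dvm": {"size_in_bytes": 117015, "description": "DocValues"},
    "fdt": {"size_in_bytes": 9388206796, "description": "Field Data"},
}

summary = summarize_file_sizes(sample)

# Print the largest categories first.
for description, size in sorted(summary.items(), key=lambda kv: -kv[1]):
    print(f"{description}: {size} bytes")
```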

Describe the solution you'd like

Introduce a similar API, or a query parameter on the existing index stats API, that provides this information at the field level, so that it can be aggregated per field at the shard, index, and cluster level.
This would help in understanding usage statistics per field. Currently the only way to get this information is to write a script that reads the Lucene index files and computes it.
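
As a rough illustration of the aggregation being asked for, the sketch below rolls per-shard, per-field byte counts up to index level. Everything here is an assumption: no such per-field output exists today, the input shape `{shard_id: {field: bytes}}` is hypothetical, and the field names and numbers are invented for the example.

```python
from collections import Counter


def roll_up_field_sizes(per_shard):
    """Aggregate {shard_id: {field: bytes}} into index-level {field: bytes}."""
    totals = Counter()
    for field_sizes in per_shard.values():
        # Counter.update adds the values of matching keys.
        totals.update(field_sizes)
    return dict(totals)


# Hypothetical per-shard, per-field sizes; invented numbers, not real output.
hypothetical_shards = {
    "0": {"title": 1200, "body": 54000},
    "1": {"title": 900, "body": 61000, "timestamp": 300},
}

index_level = roll_up_field_sizes(hypothetical_shards)
print(index_level)
```

The same roll-up could be repeated across indices to get a cluster-level view per field.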

Related component

Other

Describe alternatives you've considered

No response

Additional context

No response
