Open
Labels
Indexing, Indexing:Performance, Other, enhancement, good first issue, lucene
Description
Is your feature request related to a problem? Please describe
It does not seem possible to get statistics, such as the disk consumption of individual Lucene segment files, per field. A similar API already exists: the index stats API (https://opensearch.org/docs/latest/api-reference/index-apis/stats/) can aggregate segment file sizes at the shard, index, and cluster level. For example:
/<index_name>/_stats/segments?level=shards&include_segment_file_sizes&pretty
"file_sizes" : {
"nvm" : {
"size_in_bytes" : 5486,
"description" : "Norms"
},
"fnm" : {
"size_in_bytes" : 127478,
"description" : "Fields"
},
"kdd" : {
"size_in_bytes" : 1119640726,
"description" : "Others"
},
"tmd" : {
"size_in_bytes" : 59232,
"description" : "Others"
},
"fdm" : {
"size_in_bytes" : 9175,
"description" : "Others"
},
"kdi" : {
"size_in_bytes" : 2926068,
"description" : "Others"
},
"dvd" : {
"size_in_bytes" : 1687766934,
"description" : "DocValues"
},
"kdm" : {
"size_in_bytes" : 10398,
"description" : "Others"
},
"pos" : {
"size_in_bytes" : 6051226,
"description" : "Positions"
},
"si" : {
"size_in_bytes" : 1176,
"description" : "Segment Info"
},
"fdt" : {
"size_in_bytes" : 9388206796,
"description" : "Field Data"
},
"doc" : {
"size_in_bytes" : 2964755700,
"description" : "Frequencies"
},
"tim" : {
"size_in_bytes" : 483838256,
"description" : "Term Dictionary"
},
"dvm" : {
"size_in_bytes" : 117015,
"description" : "DocValues"
},
"tip" : {
"size_in_bytes" : 20778232,
"description" : "Term Index"
},
"fdx" : {
"size_in_bytes" : 519052,
"description" : "Field Index"
},
"nvd" : {
"size_in_bytes" : 1534,
"description" : "Norms"
}
}
Describe the solution you'd like
Introduce a similar API, or a query parameter on the existing index stats API, that provides this information at the field level, so that it can be aggregated per field at the shard, index, and cluster level.
This would be useful for understanding usage statistics per field. Today the only option is to write a script that reads the Lucene index files and computes this information.
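To make the requested aggregation concrete, here is a minimal sketch of how per-field file sizes could be rolled up from the shard level to the index level. The input shape (a mapping from field name to file extension to size in bytes, one per shard), the field names, and the function names are all assumptions for illustration. No such response exists in OpenSearch today.

```python
from collections import defaultdict

# Hypothetical per-shard data: {field name: {file extension: size in bytes}}.
# The shape, field names, and numbers are made up for illustration only.
shard_stats = [
    {"title": {"tim": 1200, "doc": 800}, "price": {"dvd": 500, "kdd": 300}},
    {"title": {"tim": 1000, "doc": 700}, "price": {"dvd": 450}},
]


def aggregate_field_file_sizes(shards):
    """Sum per-field, per-extension byte counts across all shards."""
    totals = defaultdict(lambda: defaultdict(int))
    for shard in shards:
        for field, files in shard.items():
            for extension, size in files.items():
                totals[field][extension] += size
    # Convert to plain dicts so the result is easy to print or serialize.
    return {field: dict(files) for field, files in totals.items()}


def total_per_field(aggregated):
    """Collapse the per-extension breakdown into one byte count per field."""
    return {field: sum(files.values()) for field, files in aggregated.items()}


index_level = aggregate_field_file_sizes(shard_stats)
print(index_level)
print(total_per_field(index_level))
```

The same summation could be repeated over indices to reach a cluster-level view, mirroring how the existing `file_sizes` output is aggregated by level.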
Related component
Other
Describe alternatives you've considered
No response
Additional context
No response