Conversation

@naved001
Collaborator

This refactors merge.py a bit by moving some things into functions. It also adds some basic tests, and this time I switched to pytest.

I ended up working on this because I realized I was adding more stuff in https://github.com/CCI-MOC/openshift-usage-scripts/pull/164/files and there were no tests.

so it's easier to read what's going on
Add tests for some of the important functions in merge.py.

Since I wrote the tests for pytest, I switched to using pytest as the
runner for the rest of the tests.
Collaborator

@QuanMPhm QuanMPhm left a comment

I have a few small questions before I approve

Comment on lines +111 to +112
if cluster_name is None:
    cluster_name = metrics_from_file.get("cluster_name")
Collaborator

Is there any concern that cluster_name is not being checked? What if the provided files are from different clusters? This behavior seems to predate the refactoring, but I wanted to ask just in case.

Collaborator Author

I don't think we checked that cluster_name could be different in different files, so this behavior is unchanged. But it doesn't hurt to add that additional check. I'll add that in a different PR.
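
The additional check mentioned above could look roughly like the following. This is only a sketch, not the code from the follow-up PR: `resolve_cluster_name` is a hypothetical helper, and it assumes each metrics file has already been parsed into a dict that may carry a "cluster_name" key.

```python
from typing import Any, Iterable, Mapping, Optional


def resolve_cluster_name(
    metrics_files: Iterable[Mapping[str, Any]],
) -> Optional[str]:
    """Return the cluster_name shared by all files, or raise on a mismatch.

    Hypothetical helper: each element is the parsed JSON of one metrics file.
    """
    cluster_name: Optional[str] = None
    for metrics_from_file in metrics_files:
        name = metrics_from_file.get("cluster_name")
        if name is None:
            continue  # this file has no cluster_name, so there is nothing to compare
        if cluster_name is None:
            cluster_name = name  # first name seen becomes the reference
        elif name != cluster_name:
            raise ValueError(
                f"metrics files come from different clusters: "
                f"{cluster_name!r} and {name!r}"
            )
    return cluster_name
```

Raising on the first mismatch keeps the failure loud; whether a warning would be preferable is a design choice for the follow-up PR.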

gpu_a100sxm4=rates_data.get_value_at(
cpu=rates_data.get_value_at("CPU SU Rate", report_month, Decimal), # type: ignore
gpu_a100=rates_data.get_value_at("GPUA100 SU Rate", report_month, Decimal), # type: ignore
gpu_a100sxm4=rates_data.get_value_at( # type: ignore
Collaborator

Are you using a linter in addition to the ruff check that we run in CI? I didn't get any pre-commit errors after removing these comments.

Collaborator Author

I think it was VS Code yelling at me, so I added these, but I am going to remove them.

Comment on lines +87 to +95
with open(file, "r") as jsonfile:
    metrics_from_file = json.load(jsonfile)
    cpu_request_metrics = metrics_from_file["cpu_metrics"]
    memory_request_metrics = metrics_from_file["memory_metrics"]
    gpu_request_metrics = metrics_from_file.get("gpu_metrics", None)
    processor.merge_metrics("cpu_request", cpu_request_metrics)
    processor.merge_metrics("memory_request", memory_request_metrics)
    if gpu_request_metrics is not None:
        processor.merge_metrics("gpu_request", gpu_request_metrics)
Collaborator

Minor suggestion, but I think this is more concise:

Suggested change
- with open(file, "r") as jsonfile:
-     metrics_from_file = json.load(jsonfile)
-     cpu_request_metrics = metrics_from_file["cpu_metrics"]
-     memory_request_metrics = metrics_from_file["memory_metrics"]
-     gpu_request_metrics = metrics_from_file.get("gpu_metrics", None)
-     processor.merge_metrics("cpu_request", cpu_request_metrics)
-     processor.merge_metrics("memory_request", memory_request_metrics)
-     if gpu_request_metrics is not None:
-         processor.merge_metrics("gpu_request", gpu_request_metrics)
+ for resource in ["cpu_metrics", "memory_metrics", "gpu_metrics"]:
+     if resource == "gpu_metrics":
+         if gpu_request_metrics := metrics_from_file.get(resource):
+             processor.merge_metrics(resource, gpu_request_metrics)
+     else:
+         request_metrics = metrics_from_file[resource]
+         processor.merge_metrics(resource, request_metrics)

If cpu_metrics and memory_metrics are always present, is it fine to make the loop even simpler?

for resource in ["cpu_metrics", "memory_metrics", "gpu_metrics"]:
    if request_metrics := metrics_from_file.get(resource):
        processor.merge_metrics(resource, request_metrics)

Collaborator Author

good suggestion

Collaborator Author

Uh, while this is a good suggestion, it would break things because of the old, clumsy naming. The files store the data under cpu_metrics, but the processor calls it cpu_request, so our neat little loop won't work. I am going to leave this as is.

Collaborator

Ah sorry, the strings looked similar and I thought they were the same
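
For reference, the naming mismatch discussed in this thread could be bridged with an explicit mapping from file keys to the names the processor expects. The sketch below is illustrative only: `FILE_KEY_TO_METRIC_NAME`, `OPTIONAL_FILE_KEYS`, and `merge_file_metrics` are hypothetical names that do not exist in merge.py, and `processor.merge_metrics` is assumed to take a name and a metrics object, as in the snippet under review.

```python
from typing import Any, Mapping

# Key used in the metrics files -> name the processor expects.
# Both sets of names are taken from the snippet under review.
FILE_KEY_TO_METRIC_NAME = {
    "cpu_metrics": "cpu_request",
    "memory_metrics": "memory_request",
    "gpu_metrics": "gpu_request",
}

# Keys that may legitimately be absent from a file.
OPTIONAL_FILE_KEYS = {"gpu_metrics"}


def merge_file_metrics(processor: Any, metrics_from_file: Mapping[str, Any]) -> None:
    """Merge every metric group from one parsed file into the processor."""
    for file_key, metric_name in FILE_KEY_TO_METRIC_NAME.items():
        if file_key in OPTIONAL_FILE_KEYS:
            metrics = metrics_from_file.get(file_key)
            if metrics is None:
                continue  # optional group missing: skip it, as the original code does
        else:
            # Required group: a missing key still raises KeyError, as before.
            metrics = metrics_from_file[file_key]
        processor.merge_metrics(metric_name, metrics)
```

This keeps the original behavior (CPU and memory required, GPU optional) while making the two naming schemes explicit in one place.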

return processor


def load_metadata(files: List[str]) -> MetricsMetadata:
Collaborator Author

I think I could load data and metadata in a single loop instead of loading the files twice. For that reason I don't like what I've done. I am going to refactor it again later.
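
One possible shape for that later refactor, sketched under assumptions: parse each file once and hand the parsed dicts to both the data and the metadata loaders. `read_metrics_files` is a hypothetical helper, not a function in merge.py, and how the existing loaders would consume its result is left open.

```python
import json
from typing import Any, Dict, List


def read_metrics_files(files: List[str]) -> List[Dict[str, Any]]:
    """Parse every metrics file exactly once (hypothetical helper)."""
    parsed: List[Dict[str, Any]] = []
    for file in files:
        with open(file, "r") as jsonfile:
            parsed.append(json.load(jsonfile))
    return parsed
```

Both loaders could then iterate over the returned list instead of reopening the files.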

@naved001 naved001 marked this pull request as draft November 19, 2025 21:59