Stella pointed out how they do consistency calculations/checks in NeoX:
https://github.com/EleutherAI/gpt-neox/blob/main/megatron/neox_arguments/arguments.py
It'd be good for someone to study what they added on top of the base Megatron-LM and replicate anything that can help our work, since a few good checks can save days of running a model under a wrong setup while thinking it's doing something else.
I haven't studied what they did, so I don't have any specific suggestions here.
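That said, purely as a generic illustration of the kind of check meant here, below is a minimal sketch. It is not taken from the NeoX or Megatron-LM code; `TrainConfig`, its field names, and `check_consistency` are all hypothetical, and the three rules shown (device count vs. parallelism degrees, global batch size arithmetic, hidden size vs. attention heads) are just common examples of settings that have to agree with each other.

```python
from dataclasses import dataclass


@dataclass
class TrainConfig:
    # All field names are hypothetical, chosen for illustration only.
    world_size: int
    tensor_parallel_size: int
    pipeline_parallel_size: int
    micro_batch_size: int
    gradient_accumulation_steps: int
    global_batch_size: int
    hidden_size: int
    num_attention_heads: int


def check_consistency(cfg: TrainConfig) -> int:
    """Raise ValueError on an inconsistent setup; return the data-parallel size."""
    model_parallel = cfg.tensor_parallel_size * cfg.pipeline_parallel_size
    if cfg.world_size % model_parallel != 0:
        raise ValueError(
            f"world_size={cfg.world_size} is not divisible by "
            f"tensor_parallel_size * pipeline_parallel_size = {model_parallel}"
        )
    data_parallel = cfg.world_size // model_parallel

    # The global batch size must match what the other settings imply.
    expected = (
        cfg.micro_batch_size * cfg.gradient_accumulation_steps * data_parallel
    )
    if cfg.global_batch_size != expected:
        raise ValueError(
            f"global_batch_size={cfg.global_batch_size}, but micro_batch_size * "
            f"gradient_accumulation_steps * data_parallel = {expected}"
        )

    # Each attention head needs an integer number of hidden dimensions.
    if cfg.hidden_size % cfg.num_attention_heads != 0:
        raise ValueError(
            f"hidden_size={cfg.hidden_size} is not divisible by "
            f"num_attention_heads={cfg.num_attention_heads}"
        )
    return data_parallel
```

Running something like this once at startup, before any model is built, would make a mismatched setup fail immediately instead of training quietly with settings other than the intended ones.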
Thank you.