Reward balancing feature support #15
Conversation
hey @brettskymind this is cool! I guess it ties back into our discussion about metrics vs. reward terms. If I have to specify …
Yes, that’s my intention here; you can see it reflected in the latest commits on that nativerl PR.
I agree they should be able to define them together in Python (and maybe override in the webapp). Here I assumed users just call simulation.train() and never enter a webapp UI. Currently, can users access the reward terms UI when uploading Pathmind Simulations? If they go that route, maybe we can augment the dict returned by get_reward() with the newly defined terms, append corresponding alphas, and set the original alphas to zero.
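A minimal sketch of what that augmentation could look like, purely illustrative (the helper name augment_reward, the new_terms mapping, and the alphas list are assumptions, not part of the current SDK):

```python
# Hypothetical sketch: merge newly defined terms into the reward dict
# and extend the alpha weights accordingly. Names are illustrative only.

def augment_reward(get_reward, new_terms, alphas):
    """Wrap get_reward so its dict also contains the newly defined terms.

    get_reward: original function returning a dict of reward terms
    new_terms:  dict mapping term name -> callable(state) -> float
    alphas:     list of weights for the original terms
    """
    def wrapped(state):
        terms = dict(get_reward(state))   # original reward terms
        for name, fn in new_terms.items():
            terms[name] = fn(state)       # append newly defined terms
        return terms

    # set original alphas to zero, append weights for the new terms
    new_alphas = [0.0] * len(alphas) + [1.0] * len(new_terms)
    return wrapped, new_alphas
```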
ok, got it. that makes sense to me then.
I don't think they can, as of now. But I was thinking of our discussion on Tuesday where we argued that reward terms, e.g. with before and after, have proven to be useful again and again. So I was just wondering if it made sense to bring advanced features like setting alphas to the Python side of things. Do we e.g. plan to set alphas in AnyLogic as well? It'd be great to keep everything as consistent as possible.
@maxpumperla So there are already some differences with AnyLogic simulations; for example, they don’t have a train() method. I was imagining the webapp would update get_reward() if the user changes the default terms, then call train() while feeding in the alphas gathered from the webapp UI. If they never defined alphas before the webapp, like the AL users, that’d be fine. I thought we’d keep the basic requirements in parallel with the AL model helper inputs, but extend to optional advanced features without requiring the Python user to use a webapp UI to activate them. This returns to the question of where we want the user workflow to end before they hit “train”. Do Python users just want a web-based summary of experiments they kicked off from the command line? Or do they want to upload a model and construct experiments iteratively, as AL users do? And do we want the option for both?
@brettskymind those are all really good points you're raising, and frankly I don't know what users will want. But given that we're changing the interface here only minimally (hopefully just 1-2 optional args), I'd be happy to have this change in, especially if it helps you move along quicker with your own development. Our own DevX is important, too. To be clear, I'm personally pro doing things in Python (and I know these are quite broad questions that only tangentially relate to your PR, but we have to discuss this somewhere). This all might sound minuscule, but such design decisions tend to have a massive impact. If our interface isn't clear, or our concepts don't add up for users, we might lose them in onboarding. So let's be clear about our wording and how we communicate things, too. In any case, let's go ahead with this PR! I'm looking forward to seeing an example of it in action. :D
@maxpumperla @brettskymind I started a related discussion here PathmindAI/pathmind-webapp#3678 (comment) |
To use "reward balancing" we need user input reward term weights. And whether the user wants this balancing to leverage auto-normalization of reward term signals, then this should be indicated. I think it's clean if they are optional arguments to
simulation.train()
. @maxpumperla what do you think?If we test this more, we may reduce the options and always use auto-normalization of the signals if they seek to use the reward balancing feature (i.e. they provide alphas).
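A rough sketch of how those optional arguments could look, under the assumption that they are named alphas and auto_normalize (neither name is fixed by this PR, and this stub stands in for the real SDK method):

```python
# Illustrative sketch of the proposed train() signature; not the actual SDK.
from typing import List, Optional


def train(alphas: Optional[List[float]] = None,
          auto_normalize: bool = False) -> None:
    """Hypothetical reward-balancing arguments.

    alphas:         optional per-term weights matching the dict returned
                    by get_reward(); if omitted, no balancing is applied.
    auto_normalize: whether to normalize each reward term's signal before
                    applying the weights.
    """
    if alphas is None:
        print("training with unweighted reward terms")
    else:
        print(f"training with alphas={alphas}, auto_normalize={auto_normalize}")


# Existing users keep calling train() with no arguments:
train()

# Users who opt into reward balancing provide weights, and optionally
# turn on auto-normalization of the term signals:
train(alphas=[0.5, 0.3, 0.2], auto_normalize=True)
```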
Changes must also be made in nativerl to accommodate Pathmind simulation reward balancing. Currently there is a first attempt at this on the bg_nb branch, PathmindAI/nativerl#437 (see recent commits), which assumes the approval of this PR or something similar.