
Conversation

thetwotravelers

To use "reward balancing" we need user input reward term weights. And whether the user wants this balancing to leverage auto-normalization of reward term signals, then this should be indicated. I think it's clean if they are optional arguments to simulation.train(). @maxpumperla what do you think?

If we test this more, we may reduce the options and always auto-normalize the signals whenever the user opts into the reward balancing feature (i.e. they provide alphas).
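Roughly what I have in mind for the interface (just a sketch; the argument names `reward_weights` and `auto_norm` are placeholders, not a final API):

```python
from typing import List, Optional


class Simulation:
    """Stub that only illustrates the proposed optional train() arguments;
    names and defaults are placeholders, not the final pathmind-api."""

    def train(
        self,
        reward_weights: Optional[List[float]] = None,  # one alpha per reward term
        auto_norm: bool = False,  # auto-normalize reward signals before weighting
    ) -> None:
        if reward_weights is not None:
            # Reward balancing requested: the alphas (and the auto_norm flag)
            # would be forwarded to NativeRL together with the simulation.
            pass
        # ...existing training logic stays unchanged when no weights are given...


sim = Simulation()
sim.train()                                                # today's behavior
sim.train(reward_weights=[1.0, 0.5, 2.0], auto_norm=True)  # reward balancing
```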

Changes must also be made in nativerl to accommodate Pathmind simulation reward balancing. There is currently a first attempt on the bg_nb branch in PathmindAI/nativerl#437 (see recent commits), which assumes the approval of this PR or something similar.

@maxpumperla
Contributor

hey @brettskymind this is cool!

I guess it ties back into our discussion about metrics vs reward terms. If I have to specify reward_weights here, but I only define the reward terms in the web app later on, is that good design? I'd have to know the number of terms in advance, etc. Or do we for this PR assume that the values that come out of get_reward() of a simulation are what gets weighted? It's still unclear to me how we intend to handle this. I'd rather have reward terms and weightings defined together in the web app, or all together in Python.

@thetwotravelers
Author

@maxpumperla

Or do we for this PR assume that the values that come out of get_reward() of a simulation are what gets weighted?

Yes, that's my intention here; you can see it reflected in the latest commits on that nativerl PR.
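In rough pseudo-Python, the idea is just a weighted combination of the get_reward() terms (a sketch only, not the actual NativeRL code; the auto-normalization step is left out here):

```python
def balanced_reward(reward_terms: dict, alphas: dict) -> float:
    """Combine per-term reward signals using user-supplied alphas (weights)."""
    return sum(alphas.get(name, 0.0) * value for name, value in reward_terms.items())


# e.g. if get_reward() returns {"throughput": 12.0, "cost": -3.5}:
print(balanced_reward({"throughput": 12.0, "cost": -3.5},
                      {"throughput": 1.0, "cost": 0.5}))  # -> 10.25
```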

It's still unclear to me how we intend to handle this. I'd rather have reward terms and weightings defined together in the web app, or all together in Python.

I agree they should be able to define them together in Python (and maybe override in webapp). Here I assumed users just call simulation.train() and never enter a webapp UI.

Currently, can users access the reward terms UI when uploading Pathmind Simulations? If they go that route, maybe we can augment the dict returned by get_reward() with the newly defined terms, append corresponding alphas, and set original alphas to zero.
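Something along these lines, purely illustrative (the function and term names are made up):

```python
def augment_reward_terms(original_terms: dict, original_alphas: dict,
                         webapp_terms: dict, webapp_alphas: dict):
    """Append web-app-defined terms and alphas; zero out the original alphas
    so only the newly defined terms contribute to the balanced reward."""
    terms = {**original_terms, **webapp_terms}
    alphas = {name: 0.0 for name in original_terms}
    alphas.update(webapp_alphas)
    return terms, alphas


# e.g. the original get_reward() dict plus one term defined in the web app:
terms, alphas = augment_reward_terms(
    {"throughput": 12.0}, {"throughput": 1.0},
    {"cost": -3.5}, {"cost": 0.5},
)
# terms  == {"throughput": 12.0, "cost": -3.5}
# alphas == {"throughput": 0.0, "cost": 0.5}
```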

@maxpumperla
Contributor

ok, got it. that makes sense to me then.

Currently, can users access the reward terms UI when uploading Pathmind Simulations? If they go that route, maybe we can augment the dict returned by get_reward() with the newly defined terms, append corresponding alphas, and set original alphas to zero.

I don't think they can, as of now. But I was thinking of our discussion on Tuesday where we argued that reward terms, e.g. with before and after, have proven to be useful again and again. So I was just wondering if it made sense to move advanced features like setting alphas to the Python side of things. Do we e.g. plan to set alphas in AnyLogic as well? It'd be great to keep everything as consistent as possible.

@thetwotravelers
Author

thetwotravelers commented Oct 21, 2021

@maxpumperla
Part of this push to have them defined via a Pathmind Simulation is to get a Python test model I can use to validate the above NativeRL PR. Maybe I shouldn't change the Pathmind-api just to validate new features. Should we instead deploy that to the test environment and validate there? Slin was hoping for a Python simulation test, which leads me here.

So there are already some differences with AnyLogic simulations; for example, they don't have a train() method. I was imagining the webapp would update get_reward() if the user changes the default terms, then call train() while feeding in the alphas gathered in the webapp UI. If they never defined alphas before reaching the webapp, like the AL users, that'd be fine.
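Sketching that hand-off (all names illustrative, assuming the optional reward_weights argument proposed above):

```python
from typing import Optional


def launch_training(simulation, ui_alphas: Optional[dict] = None) -> None:
    """Hypothetical web-app-side helper: pass UI-gathered alphas through to
    train() if present; otherwise fall back to plain training (AL-style)."""
    if ui_alphas is None:
        simulation.train()
    else:
        simulation.train(reward_weights=list(ui_alphas.values()))
```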

I thought we'd keep the basic requirements in parallel with the AL model helper inputs, but extend to optional advanced features without requiring the Python user to use a webapp UI to activate them. This returns to the question of where we want the user workflow to end before they hit "train". Do Python users just want a web-based summary of experiments they kicked off from the command line? Or do they want to upload a model and construct experiments, iteratively, as AL users do? And do we want the option for both?

@maxpumperla
Contributor

@brettskymind those are all really good points you're raising, and frankly I don't know what users will want. But given that we're changing the interface here just minimally (hopefully just 1-2 optional args), I'd be happy to have this change in. Especially if it helps you move along quicker with your own development. Our own DevX is important, too.

To be clear, I'm pro doing things in Python personally (and obs.yaml is another example of where things differ). I just don't want us to have conceptual opacity. Like, what really is a "reward term" and how does it differ from what I define here? Why does the web app have these magical before and after and how do they relate to my simulation? And how do we explain the difference between "metrics" and "reward terms" to our (Python) users so that they're most effective?

I know these are quite broad questions, and they only tangentially relate to your PR, but we have to discuss this somewhere. This all might sound minuscule, but such design decisions tend to have a massive impact. If our interface isn't clear, or our concepts don't add up for users, we might lose them during onboarding. So let's be clear about our wording and how we communicate things, too.

In any case, let's go ahead with this PR! I'm looking forward to seeing an example of it in action. :D

thetwotravelers marked this pull request as ready for review October 25, 2021 23:31
@slinlee
Contributor

slinlee commented Nov 2, 2021

@maxpumperla @brettskymind I started a related discussion here PathmindAI/pathmind-webapp#3678 (comment)
