Reward balancing feature support #15
Conversation
hey @brettskymind this is cool! I guess it ties back into our discussion about metrics vs. reward terms. If I have to specify …
Yes, that’s my intention here; you can see it reflected in the latest commits on that nativerl PR.
I agree they should be able to define them together in Python (and maybe override in the webapp). Here I assumed users just call simulation.train() and never enter a webapp UI. Currently, can users access the reward terms UI when uploading Pathmind Simulations? If they go that route, maybe we can augment the dict returned by get_reward() with the newly defined terms, append corresponding alphas, and set the original alphas to zero.
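A minimal sketch of what that augmentation could look like, purely illustrative (the helper name augment_reward, the new_terms mapping, and the alphas list are assumptions, not part of the current SDK):

```python
# Hypothetical sketch: merge newly defined terms into the reward dict
# and extend the alpha weights accordingly. Names are illustrative only.

def augment_reward(get_reward, new_terms, alphas):
    """Wrap get_reward so its dict also contains the newly defined terms.

    get_reward: original function returning a dict of reward terms
    new_terms:  dict mapping term name -> callable(state) -> float
    alphas:     list of weights for the original terms
    """
    def wrapped(state):
        terms = dict(get_reward(state))   # original reward terms
        for name, fn in new_terms.items():
            terms[name] = fn(state)       # append newly defined terms
        return terms

    # set original alphas to zero, append weights for the new terms
    new_alphas = [0.0] * len(alphas) + [1.0] * len(new_terms)
    return wrapped, new_alphas
```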
ok, got it. that makes sense to me then.
I don't think they can, as of now. But I was thinking of our discussion on Tuesday where we argued that reward terms, e.g. with before and after, have proven to be useful again and again. So I was just wondering if it made sense to bring advanced features like setting alphas to the Python side of things. Do we e.g. plan to set alphas in AnyLogic as well? It'd be great to keep everything as consistent as possible.
@maxpumperla So there are already some differences with AnyLogic simulations; for example, they don’t have a train() method. I was imagining the webapp would update get_reward() if the user changes the default terms, then call train() while feeding in the alphas gathered from the webapp UI. If they never defined alphas before the webapp, like the AL users, that’d be fine. I thought we’d keep the basic requirements in parallel with the AL model helper inputs, but extend to optional advanced features without requiring the Python user to use a webapp UI to activate them. This returns to the question of where we want the user workflow to end before they hit “train”. Do Python users just want a web-based summary of experiments they kicked off from the command line? Or do they want to upload a model and construct experiments iteratively, as AL users do? And do we want the option for both?
@brettskymind those are all really good points you're raising, and frankly I don't know what users will want. But given that we're changing the interface here only minimally (hopefully just 1-2 optional args), I'd be happy to have this change in, especially if it helps you move along quicker with your own development. Our own DevX is important, too. To be clear, I'm personally pro doing things in Python (and I know these are quite broad questions that only tangentially relate to your PR, but we have to discuss this somewhere). This all might sound minuscule, but such design decisions tend to have a massive impact. If our interface isn't clear, or our concepts don't add up for users, we might lose them in onboarding. So let's be clear about our wording and how we communicate things, too. In any case, let's go ahead with this PR! I'm looking forward to seeing an example of it in action. :D
@maxpumperla @brettskymind I started a related discussion here PathmindAI/pathmind-webapp#3678 (comment) |
To use "reward balancing" we need user input reward term weights. And whether the user wants this balancing to leverage auto-normalization of reward term signals, then this should be indicated. I think it's clean if they are optional arguments to
simulation.train()
. @maxpumperla what do you think?If we test this more, we may reduce the options and always use auto-normalization of the signals if they seek to use the reward balancing feature (i.e. they provide alphas).
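A rough sketch of how those optional arguments could look, under the assumption that they are named alphas and auto_normalize (neither name is fixed by this PR, and this stub stands in for the real SDK method):

```python
# Illustrative sketch of the proposed train() signature; not the actual SDK.
from typing import List, Optional


def train(alphas: Optional[List[float]] = None,
          auto_normalize: bool = False) -> None:
    """Hypothetical reward-balancing arguments.

    alphas:         optional per-term weights matching the dict returned
                    by get_reward(); if omitted, no balancing is applied.
    auto_normalize: whether to normalize each reward term's signal before
                    applying the weights.
    """
    if alphas is None:
        print("training with unweighted reward terms")
    else:
        print(f"training with alphas={alphas}, auto_normalize={auto_normalize}")


# Existing users keep calling train() with no arguments:
train()

# Users who opt into reward balancing provide weights, and optionally
# turn on auto-normalization of the term signals:
train(alphas=[0.5, 0.3, 0.2], auto_normalize=True)
```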
Changes must also be made in nativerl to accommodate Pathmind simulation reward balancing. Currently there is a first attempt at this on the bg_nb branch, PathmindAI/nativerl#437 (see recent commits), which assumes the approval of this PR or something similar.